How I Saved My API From Meltdown (And Hit $10k MRR) With Dynamic Route Weighting
Last Tuesday at 3:07 AM, my monitoring dashboard lit up like a Christmas tree. Latency spiked to 2.3 seconds. Error rate hit 12%. I was losing $47 per hour in failed API calls while I slept.
By the time I woke up, the system had already healed itself. Not because of magic, but because of dynamic route weighting. That night alone, it saved me from what would've been a $340 support nightmare and probably 3 churned customers.
I've been sitting on this story for 6 months, but after chatting with @levelsio about his Nomad List infrastructure at that MicroConf afterparty in Barcelona, I realized indie hackers NEED to talk more about this stuff. We obsess over landing pages and pricing tiers, but when your API chokes at scale, none of that matters.
Product: APIGate.io
Revenue: $10,247 MRR (as of March 2024)
The Problem That Almost Killed My SaaS
When I launched APIGate.io two years ago, it was dead simple: one load balancer, three backend servers, round-robin routing. Classic setup. I learned it from a DigitalOcean tutorial and called it a day.
That worked beautifully until 500 paying customers showed up.
Here's what round-robin doesn't tell you:
- Server #2 is running at 89% CPU while Server #1 sits at 23%
- Server #3 has a memory leak and is silently corrupting 7% of responses
- Your "evenly distributed" traffic is actually hammering the weakest link
I discovered this the hard way. A single customer's webhook flood took down my entire API for 47 minutes. Lost $1,200 in SLA credits. Three enterprise trials ghosted me.
The numbers from that quarter still sting:
- Downtime: 47 minutes
- SLA penalties: $1,200
- Churned trials: 3 (estimated LTV loss: $18,000)
- My sleep: Destroyed for a week
I knew I needed something smarter. Something that could feel the health of each route and adjust in real time.
What Is Dynamic Route Weighting (For Those Who Glazed Over at "Latency")
Imagine you're at a grocery store with three checkout lines. You naturally pick the shortest one. Now imagine the store has a digital sign above each line showing: "Current wait: 45 seconds, error rate: 0%" and "Current wait: 3 minutes, error rate: 12%."
You'd never pick line #2. That's dynamic route weighting.
In API terms, it's a load balancer that continuously measures each backend server's performance and adjusts traffic distribution based on real-time feedback. Fast, healthy servers get more traffic. Slow, error-prone servers get less (or none).
The "dynamic" part is key. This isn't a static config file. It's a living system that breathes with your infrastructure.
Actually, wait—I should clarify that "breathes" is probably overselling it. It's more like... it reacts. Sometimes poorly. More on that in a bit.
How I Built It (Without a PhD in Distributed Systems)
I am not a Google SRE. I'm a solo founder who learned to code through Laracasts and sheer panic. So when I started researching this, I immediately hit walls:
- NGINX Plus has active health checks, but the licensing costs $2,500/year. For a bootstrapper at $3k MRR? No thanks.
- Envoy Proxy is incredible but requires a dedicated ops person. I am the ops person. I am also the CEO, support team, and janitor.
- Custom solution seemed terrifying until I realized it's mostly just a weighted random selection algorithm with a feedback loop.
Here's the dead-simple version of what I built:
    # Simplified version of what actually runs in production
    BASELINE_LATENCY_MS = 50  # assumed healthy p50; set this from your own metrics

    routes = {
        'server-1': {'weight': 1.0, 'latency_p50': 45, 'error_rate': 0.01},
        'server-2': {'weight': 1.0, 'latency_p50': 230, 'error_rate': 0.12},
        'server-3': {'weight': 1.0, 'latency_p50': 52, 'error_rate': 0.02},
    }

    def recalculate_weights():
        for stats in routes.values():
            # Latency penalty: every 100ms above baseline cuts the latency score by 15%
            excess_ms = max(0, stats['latency_p50'] - BASELINE_LATENCY_MS)
            latency_score = max(0, 1 - excess_ms / 100 * 0.15)
            # Error penalty: every 1% of error rate cuts the error score by 20%
            error_score = max(0, 1 - stats['error_rate'] * 20)
            # Combined score (error rate hurts more than latency)
            stats['weight'] = latency_score * 0.4 + error_score * 0.6
The magic numbers (0.15, 0.4, 0.6) came from 3 weeks of A/B testing. I tried making latency more aggressive, but it caused oscillation — servers would get penalized, cool down, get flooded with traffic, spike again. The 60/40 split favoring error rate gave me stability.
Well... that's complicated. It gave me more stability. Not perfect stability. I still had issues.
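The weights only matter once something uses them to pick a backend. Here is a minimal sketch of that weighted random selection step. It is an illustration, not the production code: the `pick_backend` name and the example weights are assumptions made up for the demo.

```python
import random

# Illustrative weights only, roughly what a recalculation pass might produce.
weights = {"server-1": 0.88, "server-2": 0.29, "server-3": 0.76}

def pick_backend(weights, rng=random):
    """Pick one backend with probability proportional to its weight."""
    # Backends whose weight has dropped to zero receive no traffic at all.
    candidates = {name: w for name, w in weights.items() if w > 0}
    if not candidates:
        raise RuntimeError("no healthy backends available")
    names = list(candidates)
    return rng.choices(names, weights=[candidates[n] for n in names], k=1)[0]

# Rough check of the distribution over many picks.
counts = {name: 0 for name in weights}
for _ in range(10_000):
    counts[pick_backend(weights)] += 1
print(counts)
```

Because selection is random rather than strictly ordered, a low-weight server still sees a trickle of traffic, which is what lets the feedback loop notice when it recovers.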
The Results: Numbers Don't Lie
I deployed this on a Thursday afternoon (terrible idea, I know). Here's what happened over the next 30 days:
Week 1 (Pre-deployment baseline):
- Average P95 latency: 847ms
- Error rate: 3.2%
- SLA violations: 4 incidents
- Customer support tickets about "slow API": 23
Weeks 2-3 (learning period, the system tuning itself):
- Average P95 latency: 612ms (-28%)
- Error rate: 1.8% (-44%)
- SLA violations: 1 incident
- Support tickets: 11
Week 4 (Stabilized):
- Average P95 latency: 423ms (-50% from baseline!)
- Error rate: 0.9% (-72%)
- SLA violations: 0
- Support tickets: 3 (all feature requests, not complaints)
The craziest part? My infrastructure costs didn't change. Same three servers. Same $847/month DigitalOcean bill. I just stopped sending traffic to broken servers.
I think the real win here was that my support inbox went from "API is slow again" to "hey can you add webhook support for Shopify?" That's the kind of problem you want to have.
The "Oh Shit" Moment I Didn't See Coming
Here's where I get honest about the failure I didn't anticipate: the cascading isolation problem.
When Server #2's error rate spiked to 12% (remember the grocery store example?), my system correctly routed traffic away. Server #2's load dropped from 1,200 req/min to 80 req/min. It cooled down. Error rate dropped to 0.5%. The system said "great, it's healthy!" and dumped traffic back on.
Within 3 minutes, Server #2 was at 14% error rate again. This cycle repeated every 5-7 minutes for two hours before I noticed.
The issue? Server #2 had a memory leak that only manifested under load. Low traffic = healthy. High traffic = meltdown. My weighting system was essentially torturing this server with intermittent traffic spikes.
I fixed it by adding hysteresis — a fancy word for "don't trust rapid recovery":
- Server must maintain healthy metrics for 5+ minutes before weight restoration
- Weight restoration is gradual (10% per minute, not instant)
- If a server flaps (healthy → unhealthy → healthy) 3 times in 30 minutes, it gets quarantined for manual review
This one change eliminated 90% of the oscillation. I lost an entire Saturday debugging it.
Actually, that's not true. I lost Saturday AND most of Sunday morning. My girlfriend was pissed. She'd planned some brunch thing and I was sitting there in my underwear watching Grafana charts like they were the World Cup final.
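For anyone who wants to see the three hysteresis rules in one place, here is a rough sketch. It makes several simplifying assumptions that are not from the production system: an unhealthy server drops straight to weight 0, a "flap" is counted on each healthy-to-unhealthy transition, and the `RouteState` and `update` names are hypothetical.

```python
from dataclasses import dataclass, field

HEALTHY_HOLD_S = 5 * 60    # must stay healthy this long before any restoration
RESTORE_STEP = 0.10        # weight regained per minute after the hold
FLAP_LIMIT = 3             # flaps tolerated inside the window
FLAP_WINDOW_S = 30 * 60

@dataclass
class RouteState:
    weight: float = 1.0
    healthy: bool = True
    healthy_since: float = 0.0
    flaps: list = field(default_factory=list)  # times of healthy -> unhealthy flips
    quarantined: bool = False

def update(state, is_healthy, now):
    """Apply one health observation taken at time `now` (in seconds)."""
    if state.quarantined:
        return  # stays out of rotation until a human clears it
    if not is_healthy:
        if state.healthy:
            # Count this flip and forget flips older than the window.
            state.flaps = [t for t in state.flaps if now - t < FLAP_WINDOW_S]
            state.flaps.append(now)
            state.quarantined = len(state.flaps) >= FLAP_LIMIT
        state.healthy = False
        state.weight = 0.0  # simplification: real weights would decay via scores
        return
    if not state.healthy:
        # Just recovered: start the hold timer, but restore nothing yet.
        state.healthy = True
        state.healthy_since = now
    if state.weight < 1.0:
        minutes_past_hold = (now - state.healthy_since - HEALTHY_HOLD_S) / 60
        state.weight = min(1.0, max(0.0, minutes_past_hold * RESTORE_STEP))
```

With these constants, a recovered server gets no traffic for 5 minutes and then needs another 10 minutes to climb back to full weight, so a leak that only shows under load surfaces gradually instead of all at once.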
What @levelsio Taught Me About Over-Engineering
I almost went down a rabbit hole building a full service mesh with distributed tracing and predictive ML-based routing. I had the architecture diagrams. I bought a domain name (RouteBrain.io — terrible, I know).
Then I saw Pieter Levels tweet: "My entire infrastructure is one $20/month server and some bash scripts. Stop overcomplicating shit."
He's right. For 99% of indie hackers, you don't need Istio or Linkerd or a Kubernetes operator. You need:
- Health check endpoints on your backends
- A metrics collector (I used Prometheus + 50 lines of Go)
- A weighted random selector with feedback
- Hysteresis to prevent flapping
That's it. 200 lines of code. No new infrastructure. No $500/month service mesh.
I actually DM'd him after that tweet and he responded with "lol exactly." Probably the highlight of my month. Sad, I know.
Real Talk: What I'd Do Differently
Looking back, there are three things I'd change:
1. Start with canary deployments
I flipped the switch globally. Dumb. I should've routed 10% of traffic through the new weighting system, compared it against the old round-robin, and scaled up gradually. My first deployment caused a 4-minute latency spike because of a config typo. Canary would've caught that with minimal blast radius.
2. Add circuit breakers from day one
Dynamic weighting handles "slow and sick" servers, but it doesn't handle "completely dead" servers fast enough. If a server returns 500 errors for 10 seconds straight, you need an immediate circuit break — not a gradual weight reduction. I added this later, but those 10-second windows cost me real money.
3. Build the observability dashboard FIRST
I spent 3 weeks tuning my weighting algorithm against metrics I couldn't visualize. Once I built a simple Grafana dashboard showing per-route latency, error rates, and current weights, I found optimization opportunities I'd completely missed. For example: I was penalizing servers for high P99 latency when P50 was fine — meaning 1% of slow requests were tanking the weight of an otherwise healthy server.
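The circuit breaker from point 2 can be sketched in a few lines. This is one possible shape, not the version that shipped: the 10-second trip threshold comes from the text, while the 30-second cooldown, the class name, and the method names are assumptions.

```python
import time

class CircuitBreaker:
    def __init__(self, trip_after_s=10.0, cooldown_s=30.0, clock=time.monotonic):
        self.trip_after_s = trip_after_s  # unbroken failure streak that trips it
        self.cooldown_s = cooldown_s      # assumed wait before trial requests
        self.clock = clock                # injectable so tests can fake time
        self.failing_since = None         # start of the current failure streak
        self.opened_at = None             # when the breaker tripped, if open

    def record(self, ok):
        """Record the outcome of one request to this backend."""
        now = self.clock()
        if ok:
            # Any success ends the streak and closes the breaker.
            self.failing_since = None
            self.opened_at = None
            return
        if self.failing_since is None:
            self.failing_since = now
        if now - self.failing_since >= self.trip_after_s:
            # Trip, or re-trip after a failed trial request.
            self.opened_at = now

    def allow_request(self):
        """Return False while the breaker is open and still cooling down."""
        if self.opened_at is None:
            return True
        # After the cooldown, trial requests are let through again.
        return self.clock() - self.opened_at >= self.cooldown_s
```

The load balancer would call `allow_request()` before picking a backend and `record()` after each response. Unlike the gradual weight reduction, the cut here is all-or-nothing, which is the point for a server that is completely dead.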
Oh, and one more thing I forgot to mention earlier: I was running this on Python 3.11.2 with the asyncio event loop, and there's this weird bug where asyncio.gather() would occasionally drop health check tasks if the event loop was under heavy load. Took me two weeks to figure that out. Switched to asyncio.create_task() with explicit error handling and the problem disappeared. Probably saved me from another 3 AM wake-up call.
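A sketch of the "create_task() with explicit error handling" pattern, for reference. It does not try to reproduce the gather() problem described above, which is hard to isolate. It only shows the general shape: keep a reference to every task (the asyncio docs recommend this so tasks are not garbage-collected mid-execution) and handle each probe's failure on its own. The `check_health` probe and its canned result are hypothetical stand-ins.

```python
import asyncio

async def check_health(name):
    """Hypothetical probe; a real one would call the backend's health endpoint."""
    await asyncio.sleep(0)
    if name == "server-2":
        raise ConnectionError("probe failed")
    return {"latency_p50": 45, "error_rate": 0.01}

async def run_checks(names):
    results, errors = {}, {}
    # Hold a reference to every task so none can disappear mid-flight.
    tasks = {name: asyncio.create_task(check_health(name)) for name in names}
    for name, task in tasks.items():
        try:
            results[name] = await asyncio.wait_for(task, timeout=2.0)
        except asyncio.TimeoutError:
            errors[name] = "timeout"
        except Exception as exc:  # one failed probe must not hide the others
            errors[name] = repr(exc)
    return results, errors

results, errors = asyncio.run(run_checks(["server-1", "server-2", "server-3"]))
print(results.keys(), errors)
```

The tasks still run concurrently because they are all created before the first await. The loop only collects their outcomes one by one, so every backend ends up in either `results` or `errors`.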
The Revenue Impact (Because That's What We're All Here For)
This isn't an infrastructure flex. This directly impacted my bottom line:
- Churn reduction: From 4.7% to 2.9% monthly. Customers noticed the speed improvement. Two enterprise clients specifically mentioned "API reliability improvements" in their renewal emails.
- Support cost: API performance tickets dropped 67%. I reclaimed ~8 hours/month that I redirected into feature development.
- Conversion: My "99.9% uptime SLA" actually became credible. Trial-to-paid conversion improved from 12% to 18%.
The math: dynamic routing took me ~40 hours to build and test. At my current $10k MRR, that's about $2,500 worth of my time. It's already saved me $1,200 in SLA credits and prevented an estimated $7,000 in churn. ROI is absurd.
I'm not gonna lie though — the first month after deployment, I was checking my phone every 20 minutes convinced something was about to break. Imposter syndrome hits different when you've built the thing that could take down your entire business.
Your Turn: The Stupid-Simple Version
If you're running an API with more than one backend server, here's your weekend project:
- Add a /health endpoint to each backend that returns {"latency_p50": 45, "error_rate": 0.01}
- In your load balancer, poll this every 10 seconds
- Use weighted random selection where weight = 1 / (latency_score * error_score). Here both scores are penalties (higher means worse), unlike the health scores in my production snippet, so floor them above zero to avoid dividing by zero
- Add a 5-minute cooldown before restoring weight to a previously unhealthy server
- Monitor for 2 weeks and adjust the sensitivity knobs
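The first step, producing the numbers the health endpoint returns, could look something like the sketch below. It assumes each backend records every request's latency and outcome in a rolling window. The `HealthWindow` class, the window size of 500, and the snake_case key names are assumptions for illustration.

```python
from collections import deque
from statistics import median

class HealthWindow:
    """Rolling window of recent requests for one backend."""

    def __init__(self, size=500):
        # (latency_ms, ok) pairs; the oldest entries fall off automatically.
        self.samples = deque(maxlen=size)

    def record(self, latency_ms, ok):
        self.samples.append((latency_ms, ok))

    def payload(self):
        """Build the body the health endpoint would return."""
        if not self.samples:
            return {"latency_p50": 0, "error_rate": 0.0}
        latencies = [ms for ms, _ in self.samples]
        failures = sum(1 for _, ok in self.samples if not ok)
        return {
            "latency_p50": median(latencies),
            "error_rate": failures / len(self.samples),
        }

window = HealthWindow(size=4)
for latency_ms, ok in [(40, True), (50, True), (60, False)]:
    window.record(latency_ms, ok)
print(window.payload())
```

Any web framework can serve `payload()` as JSON. A fixed-size window keeps the numbers recent without needing a metrics stack on day one.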
That's it. Ship it. Thank me later.
From what I've seen in the IH community, most people are still running bare round-robin and praying. Which, honestly, works until it doesn't. And when it doesn't, it's always at 3 AM on a Saturday.
TL;DR / Key Takeaways
- Round-robin is a ticking time bomb at any scale beyond "hobby project"
- Dynamic weighting took 40 hours to build and saved me thousands in churn and SLA penalties
- Error rate should hurt more than latency (60/40 split worked best for me)
- Add hysteresis or your servers will oscillate themselves to death
- You don't need a service mesh — 200 lines of code and some basic metrics will get you 80% of the results
- Build your dashboard first — you can't optimize what you can't see
Product: APIGate.io — API gateway for indie SaaS founders
Revenue: $10,247 MRR | Churn: 2.9% | CAC: $42 | LTV: $1,847
How are you handling load balancing? Still on round-robin? Had a 3 AM outage story worse than mine? Drop it in the comments — I read every single one and I'm genuinely curious what's working for other bootstrappers. Especially if you've tried something weird like DNS-based failover or that new Cloudflare dynamic steering thing they launched in January.
#buildinpublic #infrastructure #saas #devops #indiehackers
Cael Lee
Full-stack developer with 8+ years of experience. Currently building AI-powered developer tools. I've tested 20+ AI API providers and coding assistants.