I Replaced OpenAI With Open-Source LLMs in My $10K MRR SaaS — Here's What Happened to My Costs
One Tuesday at 3 AM, I was staring at my OpenAI invoice with that sinking feeling you get when the numbers don't add up. My API costs had hit $2,847 for the month, nearly 30% of my revenue. I remember thinking: Pieter Levels built Nomad List on a shoestring. Why am I bleeding cash to Sam Altman?
That night, I did something that either makes me brilliant or completely insane. I decided to rip out OpenAI entirely and rebuild my stack on open-source models.
Here's what happened over the next 90 days, the real numbers, and my honest take on whether open-source LLM tooling can actually compete by 2025.
TL;DR for the Skimmers
- Costs dropped 60% ($2,847 → $1,124/month)
- Response times got weird (slower on Mistral, absurdly fast on Groq)
- Quality dipped slightly (8.2 → 7.8 out of 10)
- I spent 120 hours building this ($18K in opportunity cost)
- Was it worth it? Ask me in April 2025
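Those bullets are enough for a rough break-even check: 120 hours at the implied $150/hour, weighed against $1,723/month in savings. Below is a minimal Python sketch. It assumes the savings stay flat month to month and ignores any revenue lost to churn or lower quality. The constant names and the `break_even_months` helper are just for illustration.

```python
# Back-of-envelope break-even for the migration, using the TL;DR figures.
# Assumptions: monthly savings stay constant; churn and quality effects are ignored.

OPENAI_MONTHLY = 2_847        # USD/month before the switch
OPEN_SOURCE_MONTHLY = 1_124   # USD/month after the switch
HOURS_SPENT = 120
OPPORTUNITY_COST = 18_000     # USD, as stated in the TL;DR


def break_even_months(one_time_cost: float, monthly_savings: float) -> float:
    """Months of savings needed to recover a one-time cost."""
    if monthly_savings <= 0:
        raise ValueError("no savings, so no break-even point")
    return one_time_cost / monthly_savings


monthly_savings = OPENAI_MONTHLY - OPEN_SOURCE_MONTHLY         # 1,723
pct_saved = 100 * monthly_savings / OPENAI_MONTHLY             # ~60.5%
implied_hourly_rate = OPPORTUNITY_COST / HOURS_SPENT           # 150
months = break_even_months(OPPORTUNITY_COST, monthly_savings)  # ~10.4

print(f"Saved ${monthly_savings:,}/month ({pct_saved:.1f}%)")
print(f"Implied hourly rate: ${implied_hourly_rate:.0f}")
print(f"Break-even after ~{months:.1f} months")
```

Under those assumptions, the savings cover the opportunity cost after roughly 10.4 months.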
The Breaking Point: When Your "Partner" Becomes Your Biggest Expense
Let me back up. ContentAI helps marketers generate blog posts, social copy, and email sequences. We process about 45,000 API calls per day. When I launched in January 2023, GPT-3.5-turbo was cheap — $0.002 per 1K tokens. My monthly LLM bill was $340.
Fast forward to mid-2024. Customers wanted GPT-4 quality. My costs per request tripled. Then came the rate limits, the random deprecations, the "we're updating our pricing model" emails that made my stomach drop.
I'm bootstrapped. No VC safety net. Every dollar in API costs is a dollar I can't spend on customer acquisition (my CAC is already $34 via content marketing, up from $22 last year).
So I asked the question every indie hacker eventually faces: Can I replace this dependency with something I control?
Actually, wait—I should clarify something. When I say "I asked this question," what I really mean is I complained about it on Twitter for three weeks straight until enough people called me out for not doing anything about it. Shoutout to @levelsio for the tough love.
The Stack I Built (And Almost Abandoned Twice)
I didn't go full cowboy and self-host a 70B-parameter model on a Raspberry Pi. I needed something production-ready. Here's what I pieced together:
Model Layer:
- Mistral-7B (fine-tuned on my customer data) via Together AI for standard tasks
- Llama 3.1 70B via Groq for complex reasoning (their inference speed is absurd — 300 tokens/second)
- Self-hosted vLLM on a dedicated GPU instance for my highest-volume endpoint
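The three-tier split works like a routing table. Below is a minimal Python sketch of the idea. The `Task` categories, the `pick_model` helper, and every provider/model string are hypothetical placeholders. They are not the author's configuration or real model IDs. The sketch only illustrates the structure: call sites name a task, and a single table decides which model serves it.

```python
from dataclasses import dataclass
from enum import Enum


class Task(Enum):
    # Hypothetical task categories mirroring the three tiers in the list above.
    STANDARD = "standard"        # default tier for most requests
    COMPLEX = "complex"          # requests that need stronger reasoning
    HIGH_VOLUME = "high_volume"  # the single busiest endpoint


@dataclass(frozen=True)
class Route:
    provider: str  # illustrative provider prefix
    model: str     # illustrative model name, not a real deployment ID

    @property
    def proxy_name(self) -> str:
        # "provider/model" string in the style LiteLLM uses for routing.
        return f"{self.provider}/{self.model}"


# One place to change when a model is swapped; call sites only name a Task.
ROUTES: dict[Task, Route] = {
    Task.STANDARD: Route("together_ai", "my-finetuned-mistral-7b"),
    Task.COMPLEX: Route("groq", "llama-3.1-70b"),
    Task.HIGH_VOLUME: Route("hosted_vllm", "my-selfhosted-model"),
}


def pick_model(task: Task) -> str:
    """Return the model string the proxy should receive for this task."""
    return ROUTES[task].proxy_name


print(pick_model(Task.COMPLEX))  # groq/llama-3.1-70b
```

With LiteLLM, the returned string is what you would pass as the `model` argument to `litellm.completion`. Moving a tier to a different provider then touches one `ROUTES` entry instead of every call site.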
Tooling Stack:
- LangChain → LlamaIndex (switched after LangChain's abstraction hell)
- Helicone for observability (same tool I used with OpenAI, works with any model)
- BentoML for model packaging and deployment
- LiteLLM as a proxy so I could swap models without changing application code
The first week was brutal. I documented my failures in real time on Twitter, and indie hacker Marc Lou DM'd me saying "bro you're gonna churn half your users."
He wasn't entirely wrong.
I almost quit twice. The first time was at 2 AM, when my fine-tuned Mistral model started outputting what I can only describe as "Shakespearean marketing copy" for a B2B SaaS client. The CTO emailed me asking if this was a feature. It was not a feature.
The second time was when I realized I'd hardcoded the OpenAI model name in 47 different places across my codebase. Forty. Seven. I just sat there laughing at my own stupidity for a solid five minutes.
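Forty-seven hardcoded names are worth auditing before any refactor. Below is a minimal Python sketch of that audit. It assumes the names appear as quoted OpenAI-style IDs (`"gpt-4"`, `'gpt-3.5-turbo'`, and variants) in `.py` files. The regex and the `find_hardcoded_models` helper are hypothetical, not the author's tooling.

```python
import re
from pathlib import Path

# Hypothetical pattern: quoted OpenAI-style model IDs such as "gpt-4" or 'gpt-3.5-turbo'.
MODEL_PATTERN = re.compile(r"""["'](gpt-(?:3\.5|4)[\w.\-]*)["']""")


def find_hardcoded_models(root: str, suffixes=(".py",)) -> list[tuple[str, int, str]]:
    """Return (file, line number, model name) for every quoted model ID under root."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue  # skip directories and non-source files
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for match in MODEL_PATTERN.finditer(line):
                hits.append((str(path), lineno, match.group(1)))
    return hits
```

Calling `find_hardcoded_models("src")` returns `(file, line, model)` tuples. Each hit is a call site that should read from one shared config value instead.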
The Numbers: 90 Days In
Here's the data I promised. No rounding, no "trust me bro."
| Metric | OpenAI Stack (June 2024) | Open-Source Stack (Sept 2024) |
|---|---|---|
| Monthly LLM Cost | $2,847 | $1,124 |
| Avg Response Time | 1.8s | 2.3s (Mistral) / 0.9s (Groq) |
| Output Quality Score* | 8.2/10 | 7.8/10 |
| Customer Churn | 4.1% | 4.7% |
| API Downtime (monthly) | 47 minutes | 112 minutes |
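To compare the two stacks per call, you can divide the monthly figures by the volume mentioned earlier (about 45,000 API calls per day). Below is a minimal Python sketch. It assumes a 30-day month and the same call volume in June and September. The post gives only one volume figure, so treat the results as approximate.

```python
# Normalize the table's monthly figures per call and per minute.
# Assumptions: ~45,000 calls/day in both months and a 30-day month.
CALLS_PER_DAY = 45_000
DAYS_PER_MONTH = 30
CALLS_PER_MONTH = CALLS_PER_DAY * DAYS_PER_MONTH  # 1,350,000
MINUTES_PER_MONTH = DAYS_PER_MONTH * 24 * 60      # 43,200


def cost_per_1k_calls(monthly_cost_usd: float) -> float:
    """Average LLM spend per 1,000 API calls."""
    return 1_000 * monthly_cost_usd / CALLS_PER_MONTH


def uptime_pct(downtime_minutes: float) -> float:
    """Share of the month the API was up, as a percentage."""
    return 100 * (1 - downtime_minutes / MINUTES_PER_MONTH)


# (monthly cost in USD, monthly downtime in minutes), straight from the table
stacks = {
    "OpenAI (June 2024)": (2_847, 47),
    "Open-source (Sept 2024)": (1_124, 112),
}

for name, (cost, downtime) in stacks.items():
    print(f"{name}: ${cost_per_1k_calls(cost):.2f} per 1K calls, "
          f"{uptime_pct(downtime):.2f}% uptime")
```

Under those assumptions, spend drops from about $2.11 to $0.83 per 1,000 calls. Availability slips from about 99.89% to 99.74% of the month.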
Cael Lee
Full-stack developer with 8+ years of experience. Currently building AI-powered developer tools. I've tested 20+ AI API providers and coding assistants.