| Cost per article | $1.42 | $0.38 |
I stared at that cost column for a solid minute. $0.38 vs $1.42. That's a 73% reduction. And the quality difference? Not statistically significant at my sample size.
I literally got up and walked around my flat. Made another coffee. Came back and checked the numbers again.
Still $0.38.
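For anyone who wants to sanity-check the 73% figure, here is a quick sketch of the arithmetic using the two per-article costs from the table. The function name is just illustrative.

```python
def percent_reduction(old_cost: float, new_cost: float) -> float:
    """Return the percentage drop from old_cost to new_cost."""
    if old_cost <= 0:
        raise ValueError("old_cost must be positive")
    return (old_cost - new_cost) / old_cost * 100


old_cost_per_article = 1.42  # per-article cost before the switch (from the table)
new_cost_per_article = 0.38  # per-article cost after the switch (from the table)

reduction = percent_reduction(old_cost_per_article, new_cost_per_article)
print(f"{reduction:.1f}% cheaper per article")  # prints "73.2% cheaper per article"
```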
Week 2: The Optimisation
But I didn't stop there. I noticed o4-mini was actually faster at reasoning tasks, which opened up some interesting optimisation opportunities.
Optimisation #1: Chain-of-Thought Caching
o4-mini's reasoning tokens are visible (unlike some models that hide them). I realised I could cache common reasoning patterns. For example, when structuring a "how-to" article, the model follows similar logical steps 80% of the time.
I built a simple caching layer that stores reasoning chains for common content types. When a new request matches a cached pattern, we skip the reasoning step entirely and jump straight to generation.
Well... "simple" is generous. The first version was a mess. I tried using pgvector with cosine similarity and spent three days tweaking the threshold. At one point I had a bug where it was caching empty reasoning chains and serving blank structures to users. Got a few angry emails that day.
Result: Another 22% cost reduction on cached requests.
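To make the idea concrete, here is a minimal in-memory sketch of a similarity-gated reasoning cache. It is not the production setup described above (that used pgvector); the class name, the toy three-dimensional embeddings, and the plain Python list standing in for the vector store are all assumptions for illustration. The 0.80 default mirrors the looser threshold the post eventually settles on, and `put` refuses empty chains, which is one way to guard against the blank-structure bug.

```python
import math
from dataclasses import dataclass, field
from typing import Optional


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class ReasoningCache:
    """Hypothetical cache: (request embedding, reasoning chain) pairs in a list."""
    threshold: float = 0.80
    entries: list[tuple[list[float], str]] = field(default_factory=list)

    def put(self, embedding: list[float], reasoning: str) -> bool:
        """Store a reasoning chain; reject empty or whitespace-only chains."""
        if not reasoning.strip():
            return False
        self.entries.append((embedding, reasoning))
        return True

    def get(self, embedding: list[float]) -> Optional[str]:
        """Return the best-matching chain if it clears the threshold, else None."""
        best_score, best_reasoning = 0.0, None
        for cached_embedding, reasoning in self.entries:
            score = cosine_similarity(embedding, cached_embedding)
            if score > best_score:
                best_score, best_reasoning = score, reasoning
        return best_reasoning if best_score >= self.threshold else None


cache = ReasoningCache()
cache.put([1.0, 0.0, 0.2], "1. State the goal 2. List prerequisites 3. Steps")
print(cache.get([0.9, 0.1, 0.2]))  # similar request: cache hit
print(cache.get([0.0, 1.0, 0.0]))  # unrelated request: None, so run full reasoning
```

On a miss the caller would run the normal reasoning step and `put` the result, so the cache fills up as traffic comes in.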
Optimisation #2: Dynamic Model Routing
Not every article needs heavy reasoning. A simple product announcement doesn't need the same depth as a technical tutorial. I built a classifier (using a cheap, fast model: Claude Haiku, at $0.25 per 1M input tokens) that scores the complexity of each request.
- Complexity score 1-3: Direct to o4-mini with minimal reasoning effort
- Complexity score 4-7: Standard o4-mini reasoning
- Complexity score 8-10: Full reasoning chain with o4-mini
Result: 35% of requests now use the "lite" reasoning path, costing an average of $0.12 per article.
The classifier itself? It's literally just a prompt that says "rate the complexity of this content request from 1-10" and parses the integer from the response. Took 20 minutes to build. Sometimes the simplest thing works.
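A rough sketch of what that "prompt plus integer parsing" classifier and the three-tier routing could look like. The model call is passed in as a plain function so the example runs without any API key; `ask_model`, the tier labels, and the regex are assumptions for illustration, not the actual production code.

```python
import re
from typing import Callable

CLASSIFIER_PROMPT = "Rate the complexity of this content request from 1-10."


def parse_score(response_text: str) -> int:
    """Pull the first standalone integer in 1..10 out of the model's reply."""
    match = re.search(r"\b(10|[1-9])\b", response_text)
    if match is None:
        raise ValueError(f"no 1-10 score found in: {response_text!r}")
    return int(match.group(1))


def route(score: int) -> str:
    """Map a complexity score to a (hypothetical) processing tier label."""
    if 1 <= score <= 3:
        return "o4-mini:minimal"   # minimal reasoning effort
    if 4 <= score <= 7:
        return "o4-mini:standard"  # standard reasoning
    if 8 <= score <= 10:
        return "o4-mini:full"      # full reasoning chain
    raise ValueError(f"score out of range: {score}")


def classify_and_route(request: str, ask_model: Callable[[str], str]) -> str:
    """Ask the cheap model for a score, then pick the tier."""
    reply = ask_model(f"{CLASSIFIER_PROMPT}\n\n{request}")
    return route(parse_score(reply))


# Stand-in for the real classifier call, so the sketch runs offline.
def fake_model(prompt: str) -> str:
    return "Complexity: 2"


print(classify_and_route("Announce our new pricing page", fake_model))
```

Raising on an unparseable reply is a deliberate choice here: it is safer to fall back to the standard path in the caller than to silently route a request on a garbage score.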
My Current Stack & Costs
After a month of optimisation, here's where I landed:
May 2025 numbers:
- 3,600 articles processed
- Total AI cost: $1,247
- Cost per article: $0.35
- Monthly savings: $3,590
That's $43,000/year back in my pocket. For a bootstrapped founder, that's a part-time hire or a serious marketing budget.
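The per-article and yearly figures follow directly from the monthly numbers above. A quick sketch of the arithmetic, with variable names chosen for illustration:

```python
articles = 3_600            # articles processed in May 2025
total_ai_cost = 1_247.00    # total AI spend for the month, in dollars
monthly_savings = 3_590.00  # monthly savings figure quoted above, in dollars

cost_per_article = total_ai_cost / articles
annual_savings = monthly_savings * 12

print(f"Cost per article: ${cost_per_article:.2f}")  # prints "Cost per article: $0.35"
print(f"Annual savings:   ${annual_savings:,.0f}")   # prints "Annual savings:   $43,080"
```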
My current stack:
- Complexity classifier: Claude Haiku
- Reasoning engine: o4-mini (with caching layer)
- Fallback for edge cases: GPT-4o (about 3% of requests)
I still keep GPT-4o around for the weird stuff. Had a user submit a 4,000-word stream-of-consciousness voice note last week that was basically half personal therapy session and half business idea. The classifier gave it a 9.7. o4-mini handled it fine, but I route those edge cases to 4o just to be safe. At 3% of volume, it barely moves the needle on cost.
What I'd Do Differently
Looking back, I made two mistakes:
1. I waited too long to optimise. I saw my AI costs climbing for three months before taking action. Classic founder move — focusing on growth while ignoring margin erosion. If I'd switched to o4-mini in March, I would've saved an extra $2,400.
I think part of me was afraid of touching the thing that was working. Stupid, I know. But when your product's quality depends on AI output, messing with the model feels like performing open-heart surgery on your only patient.
2. I over-engineered the caching at first. My initial caching system tried to match reasoning patterns with 95% similarity. It was too strict, and the cache hit rate was only 12%. When I loosened it to 80% similarity, the hit rate jumped to 45% with no quality degradation. Sometimes "good enough" is actually good enough.
I spent two weeks building this elaborate vector search system with metadata filtering and A/B testing infrastructure. Felt very clever. Totally unnecessary. The 80% threshold with basic cosine similarity works just as well. Ship the simple version first, kids.
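If you want to pick a threshold with data rather than gut feel, one cheap approach is to replay a sample of past requests against the cache at a few thresholds and compare hit rates. The sketch below does that with toy 2-D vectors; the embeddings, function names, and resulting hit rates are made up for illustration and are not the numbers from this post.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def hit_rate(queries: list[list[float]], cached: list[list[float]], threshold: float) -> float:
    """Fraction of queries with at least one cached vector at or above threshold."""
    if not queries:
        return 0.0
    hits = sum(
        1 for q in queries
        if any(cosine_similarity(q, c) >= threshold for c in cached)
    )
    return hits / len(queries)


# Toy data: one cached pattern and four incoming requests of decreasing similarity.
cached_embeddings = [[1.0, 0.0]]
sample_queries = [[1.0, 0.0], [1.0, 0.5], [1.0, 1.0], [0.0, 1.0]]

for threshold in (0.95, 0.80):
    rate = hit_rate(sample_queries, cached_embeddings, threshold)
    print(f"threshold {threshold:.2f}: hit rate {rate:.0%}")
```

Hit rate is only half the picture: you would still want to spot-check output quality on the extra hits the looser threshold lets through.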
The Bigger Picture
I've been thinking a lot about what Marc Lou said on a podcast recently: "The best AI startups won't be the ones with the best models — they'll be the ones with the best cost structures."
He's right. When everyone has access to the same models, your competitive advantage isn't the AI itself. It's how efficiently you deploy it.
o4-mini isn't the flashiest model. It won't write you a sonnet about your cat or generate a photorealistic image of a dinosaur wearing a hoodie. But for structured reasoning tasks in production? It's a workhorse. And at roughly a quarter of what GPT-4o was costing me per article ($0.38 vs $1.42), it's the kind of margin-improver that bootstrappers dream about.
I've seen some folks on IndieHackers arguing that o4-mini is "too dumb" for production use. Honestly? I think they're probably using it wrong. If you're expecting it to do open-ended creative work, yeah, you'll be disappointed. But for structured reasoning with clear constraints? It punches way above its weight class.
Key Takeaways
- o4-mini matched GPT-4o quality on structured reasoning tasks at 73% lower cost
- Cache common reasoning patterns — most content follows predictable structures
- Route simple requests to "lite" processing — not everything needs deep reasoning
- Start simple, then optimise — my fancy vector search was overkill; basic cosine similarity worked fine
- Don't wait to cut costs — three months of procrastination cost me $2,400
Your Turn
I'm curious — what models are you running in production? Have you tested o4-mini yet? I keep hearing mixed things about its creative writing capabilities, but for reasoning, it's been solid for me.
Drop your experience in the comments. I read every single one (seriously, it's my favourite part of writing here).
And if you're wrestling with AI costs, feel free to share your numbers. We're all figuring this out together. My DMs are open too if you want to compare notes on caching strategies — I'm @emma_builds on Twitter.
Building in public, one cost-optimised API call at a time.
— Emma
p.s. If anyone from OpenAI is reading this, please don't nerf o4-mini's pricing. I just got my margins to a happy place. 🙏
#buildinpublic #ai #bootstrapping #saas #openai #costoptimisation