How I Slashed My AI Bill by 73% (While Actually Improving Output Quality)

Last month, my OpenAI bill hit $4,837. I nearly spat out my coffee.

For context, that's more than my rent in Barcelona. Pieter Levels once tweeted that "AI costs are the new server costs," and honestly? He undersold it. They're worse — because at least server costs are predictable.

I'm bootstrapped. Every dollar matters. I can't just raise a Series A and burn cash like it's Monopoly money. So I went down a rabbit hole that I want to share with you today: deploying o4-mini in production for reasoning-heavy tasks without bankrupting myself.

Actually, wait — I should clarify something before diving in. When I say "reasoning-heavy," I don't mean AGI-level stuff. I'm talking about the kind of structured thinking where the model needs to figure out what goes where, spot logical gaps, and make editorial calls. Not rocket science. But surprisingly expensive when you're doing it 3,400 times a month.

The Problem I Was Solving

Product: ContentEngine AI

Revenue: $10,247 MRR

ContentEngine AI takes messy user input — half-baked blog outlines, rambling voice notes, scattered bullet points — and turns it into publish-ready content. The magic happens in the reasoning layer: figuring out structure, identifying gaps, making editorial decisions.

For months, I ran everything through GPT-4o. It worked beautifully. But at $15 per 1M input tokens and $60 per 1M output tokens, the maths was brutal.

My March numbers looked like this:

3,400 articles processed
Average 8,000 tokens per article (input + output)
Total cost: $4,837
Cost per article: $1.42

With my $29/month pricing tier, I was losing money on power users. Not sustainable.

I remember staring at my Stripe dashboard at 2am thinking... well, this is broken. One user submitted 87 articles in a single month. Great for engagement metrics, terrible for my bank account. They paid me $29. I paid OpenAI $123.54 for their usage alone.

Yeah.

The o4-mini Experiment

When OpenAI released o4-mini in April, I almost ignored it. "Mini" sounds like the cheap version, right? But then I read the benchmarks. The reasoning scores were competitive with full o3 on structured tasks. And the pricing? $1.10 per 1M input tokens, $4.40 per 1M output tokens.

I decided to run a two-week split test. Started around 8th April, I think.

Week 1: The Setup

I routed 50% of articles through my existing GPT-4o pipeline and 50% through o4-mini. Same prompts, same temperature settings (0.3), same everything. The only difference was the model.

I tracked three metrics:

Output quality (rated by users on a 1-5 scale)
Latency (time to generate)
Cost per article

Here's what happened:

Metric	GPT-4o	o4-mini

Avg quality score	4.2/5	4.1/5

Avg latency	3.8s	2.1s

I stared at that cost column for a solid minute. $0.38 vs $1.42. That's a 73% reduction. And the quality difference? Statistically insignificant in my sample size.

I literally got up and walked around my flat. Made another coffee. Came back and checked the numbers again.

Still $0.38.

Week 2: The Optimisation

But I didn't stop there. I noticed o4-mini was actually faster at reasoning tasks, which opened up some interesting optimisation opportunities.

Optimisation #1: Chain-of-Thought Caching

o4-mini's reasoning tokens are visible (unlike some models that hide them). I realised I could cache common reasoning patterns. For example, when structuring a "how-to" article, the model follows similar logical steps 80% of the time.

I built a simple caching layer that stores reasoning chains for common content types. When a new request matches a cached pattern, we skip the reasoning step entirely and jump straight to generation.

Well... "simple" is generous. The first version was a mess. I tried using pgvector with cosine similarity and spent three days tweaking the threshold. At one point I had a bug where it was caching empty reasoning chains and serving blank structures to users. Got a few angry emails that day.

Result: Another 22% cost reduction on cached requests.

Optimisation #2: Dynamic Model Routing

Not every article needs heavy reasoning. A simple product announcement doesn't need the same depth as a technical tutorial. I built a classifier (using a cheap, fast model — Claude Haiku, $0.25/1M tokens) that scores the complexity of each request.

Complexity score 1-3: Direct to o4-mini with minimal reasoning effort
Complexity score 4-7: Standard o4-mini reasoning
Complexity score 8-10: Full reasoning chain with o4-mini

Result: 35% of requests now use the "lite" reasoning path, costing an average of $0.12 per article.

The classifier itself? It's literally just a prompt that says "rate the complexity of this content request from 1-10" and parses the integer from the response. Took 20 minutes to build. Sometimes the simplest thing works.

My Current Stack & Costs

After a month of optimisation, here's where I landed:

May 2025 numbers:

3,600 articles processed
Total AI cost: $1,247
Cost per article: $0.35
Monthly savings: $3,590

That's $43,000/year back in my pocket. For a bootstrapped founder, that's a part-time hire or a serious marketing budget.

My current stack:

Complexity classifier: Claude Haiku
Reasoning engine: o4-mini (with caching layer)
Fallback for edge cases: GPT-4o (about 3% of requests)

I still keep GPT-4o around for the weird stuff. Had a user submit a 4,000-word stream-of-consciousness voice note last week that was basically half personal therapy session and half business idea. The classifier gave it a 9.7. o4-mini handled it fine, but I route those edge cases to 4o just to be safe. At 3% of volume, it barely moves the needle on cost.

What I'd Do Differently

Looking back, I made two mistakes:

1. I waited too long to optimise. I saw my AI costs climbing for three months before taking action. Classic founder move — focusing on growth while ignoring margin erosion. If I'd switched to o4-mini in March, I would've saved an extra $2,400.

I think part of me was afraid of touching the thing that was working. Stupid, I know. But when your product's quality depends on AI output, messing with the model feels like performing open-heart surgery on your only patient.

2. I over-engineered the caching at first. My initial caching system tried to match reasoning patterns with 95% similarity. It was too strict, and the cache hit rate was only 12%. When I loosened it to 80% similarity, the hit rate jumped to 45% with no quality degradation. Sometimes "good enough" is actually good enough.

I spent two weeks building this elaborate vector search system with metadata filtering and A/B testing infrastructure. Felt very clever. Totally unnecessary. The 80% threshold with basic cosine similarity works just as well. Ship the simple version first, kids.

The Bigger Picture

I've been thinking a lot about what Marc Lou said on a podcast recently: "The best AI startups won't be the ones with the best models — they'll be the ones with the best cost structures."

He's right. When everyone has access to the same models, your competitive advantage isn't the AI itself. It's how efficiently you deploy it.

o4-mini isn't the flashiest model. It won't write you a sonnet about your cat or generate a photorealistic image of a dinosaur wearing a hoodie. But for structured reasoning tasks in production? It's a workhorse. And at 1/15th the cost of GPT-4o, it's the kind of margin-improver that bootstrappers dream about.

I've seen some folks on IndieHackers arguing that o4-mini is "too dumb" for production use. Honestly? I think they're probably using it wrong. If you're expecting it to do open-ended creative work, yeah, you'll be disappointed. But for structured reasoning with clear constraints? It punches way above its weight class.

Key Takeaways

o4-mini matched GPT-4o quality on structured reasoning tasks at 73% lower cost
Cache common reasoning patterns — most content follows predictable structures
Route simple requests to "lite" processing — not everything needs deep reasoning
Start simple, then optimise — my fancy vector search was overkill; basic cosine similarity worked fine
Don't wait to cut costs — three months of procrastination cost me $2,400

Your Turn

I'm curious — what models are you running in production? Have you tested o4-mini yet? I keep hearing mixed things about its creative writing capabilities, but for reasoning, it's been solid for me.

Drop your experience in the comments. I read every single one (seriously, it's my favourite part of writing here).

And if you're wrestling with AI costs, feel free to share your numbers. We're all figuring this out together. My DMs are open too if you want to compare notes on caching strategies — I'm @emma_builds on Twitter.

Building in public, one cost-optimised API call at a time.

— Emma

p.s. If anyone from OpenAI is reading this, please don't nerf o4-mini's pricing. I just got my margins to a happy place. 🙏

buildinpublic #ai #bootstrapping #saas #openai #costoptimisation

Cost per article	$1.42	$0.38

How I Slashed My AI Bill by 73% (While Actually Improving Output Quality)

How I Slashed My AI Bill by 73% (While Actually Improving Output Quality)

The Problem I Was Solving

The o4-mini Experiment

Week 1: The Setup

Week 2: The Optimisation

My Current Stack & Costs

What I'd Do Differently

The Bigger Picture

Key Takeaways

Your Turn

buildinpublic #ai #bootstrapping #saas #openai #costoptimisation

Cael Lee

Ready to get started?