The £870 Mistake: Pay-as-You-Go vs Subscription for AI APIs, Solved with Real Numbers
Last month, our GPT-4 bill hit £870. My boss dropped three question marks in the group chat. Not words. Just "???".
Meanwhile, the team next door is on a £200/month subscription plan that throttles them into oblivion during peak hours. I can literally hear them swearing through the wall.
So I did what any developer with a bruised ego and a bruised budget would do — I went down a rabbit hole to answer one question: Should you pay per token or buy a subscription for AI APIs?
Here's the short answer: There's no silver bullet, but most teams are haemorrhaging cash because they picked wrong.
My Billing Nightmare, Charted
Let me show you our spending curve over the past six months:
- March (just launched): ~200 calls/day, £95 bill. Fine.
- April (started promoting): ~800 calls/day, £340. Getting uncomfortable.
- May (ran a viral campaign): peaked at 5,000 calls in a single day, closed at £870.
The problem? We were on pay-as-you-go, and traffic was about as predictable as British weather in April.
Our marketing team decided to run an "AI Year-in-Review" campaign — you know, users upload their data, we call GPT-4 to generate a shiny report, everyone shares it on social media. It went properly viral. 3,000 new users in 24 hours. I got paged at 2 AM, watching costs climb like a taxi meter in central London. It felt like leaving the immersion heater on before going on holiday.
I crunched the numbers later. If we'd been on a subscription plan (£200/month for 50,000 calls, with overage at a tiered rate), that month would've cost us about £400. That's less than half what we actually paid: roughly 54% cheaper.
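If you want to redo this comparison for your own month, it boils down to "base fee plus tiered overage" versus "flat per-call rate". Here's a minimal Python sketch. The £200 base fee and 50,000 included calls are the plan's figures from above; the per-call prices and overage tiers are made-up placeholders, since the plan's actual tiered rates aren't given here.

```python
# Compare one month's bill: flat pay-as-you-go vs subscription with tiered overage.
# ASSUMPTION: all per-call prices below are hypothetical placeholders.


def payg_cost(calls: int, price_per_call: float) -> float:
    """Pay-as-you-go: every call billed at one flat rate."""
    return calls * price_per_call


def subscription_cost(
    calls: int,
    base_fee: float,
    included_calls: int,
    overage_tiers: list[tuple[int, float]],
) -> float:
    """Base fee covers `included_calls`; extra calls are billed tier by tier.

    `overage_tiers` is a list of (calls_in_tier, price_per_call) pairs, applied in order.
    """
    cost = base_fee
    remaining = max(0, calls - included_calls)
    for tier_size, price in overage_tiers:
        billed = min(remaining, tier_size)
        cost += billed * price
        remaining -= billed
    if remaining > 0:
        raise ValueError("overage_tiers don't cover every extra call")
    return cost


if __name__ == "__main__":
    monthly_calls = 60_000                         # hypothetical month
    tiers = [(20_000, 0.010), (1_000_000, 0.008)]  # hypothetical overage tiers
    print(f"Pay-as-you-go: £{payg_cost(monthly_calls, 0.015):,.2f}")
    print(f"Subscription:  £{subscription_cost(monthly_calls, 200, 50_000, tiers):,.2f}")
```

Swap in your provider's real rates and last month's call count, and the crossover point usually becomes obvious fast.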
But here's the plot twist.
I Tested Subscription Plans So You Don't Have To
If you're thinking "right, subscription it is" — hold that thought.
We have another project, an internal analytics tool, that signed up for a yearly subscription at £2,400. Unlimited calls. Sounds like a bargain, yeah?
Three months in, we noticed something odd. API responses were getting slower. And slower.
I pinged their support team. Their exact words: "Subscription users have lower resource priority than pay-as-you-go users during peak loads." Translation: You queue. Big spenders skip the line.
Oh, and that "unlimited calls" promise? There's a hidden QPM (queries per minute) cap. We discovered this when our batch processing job went from 10 minutes to... wait for it... two hours.
Actually, let me correct myself. It wasn't exactly hidden. I later found it on page 47 of their documentation — subscription users default to 60 QPM, and if you want more, you need to submit a ticket. Which takes 3-5 business days. And requires you to explain your use case. It's one of those things where they technically tell you, but they make it just inconvenient enough that you won't bother.
We switched back to pay-as-you-go for that project. Costs about £40 more per month, but at least nobody's throttling us at 3 PM on a Tuesday.
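A QPM cap puts a hard floor under batch runtime: N requests at Q queries per minute can't finish in less than N / Q minutes, however fast each individual call is. For example, if the cap was the only bottleneck, a two-hour run at 60 QPM works out to roughly 7,200 requests. Here's a quick sketch of that arithmetic plus a naive client-side pacer that spaces calls to stay under a cap; `send` is a hypothetical stand-in for whatever function actually makes the API call.

```python
import time
from typing import Callable, Iterable, List


def min_batch_minutes(num_requests: int, qpm_limit: int) -> float:
    """Lower bound on runtime when a queries-per-minute cap is the bottleneck."""
    return num_requests / qpm_limit


def run_paced(
    requests: Iterable[str],
    send: Callable[[str], object],
    qpm_limit: int,
) -> List[object]:
    """Send requests one at a time, spaced evenly so we never exceed qpm_limit.

    `send` is a hypothetical placeholder for the function that calls the API.
    """
    interval = 60.0 / qpm_limit  # seconds between request starts
    results: List[object] = []
    next_start = time.monotonic()
    for req in requests:
        delay = next_start - time.monotonic()
        if delay > 0:
            time.sleep(delay)  # wait until this request's slot opens
        started = time.monotonic()
        results.append(send(req))
        next_start = started + interval
    return results
```

At 60 QPM the pacer sends one request per second, which is exactly why a job that used to take ten minutes can balloon into hours.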
Three Teams, Three Strategies (Real Data)
I talked to a few friends running AI products. Here's what they're doing (anonymised, but the numbers are real):
Case 1: AI Customer Support Startup
- Call volume: 3,000-5,000/day, quite stable
- Models: Mostly GPT-3.5, occasional GPT-4
- Choice: Subscription, £400/month
- If pay-as-you-go: ~£560-720/month
- Savings: ~35%
Case 2: Solo Developer, Writing Tool
- Call volume: Erratic. Some days zero calls, others 200+
- Models: Claude 3 Sonnet
- Choice: Pay-as-you-go
- Average spend: £25-£65/month
- If subscription: Cheapest plan is £120/month
- No contest — pay-as-you-go wins
Case 3: Mid-size SaaS, Code Review
- Call volume: Spikes on workdays, flat on weekends
- Models: Heavy GPT-4 usage
- Choice: Base subscription (covers 60% of volume) + pay-as-you-go overflow
- Average spend: £950/month
- If pure pay-as-you-go: ~£1,430
- If pure subscription (higher tier): £1,200 but throttling risk
- Hybrid is the clear winner
Case 3 is surprisingly common, actually. From what I've seen, most B2B SaaS teams have this exact usage pattern: Monday-to-Friday daytime peaks, then evenings and weekends drop off a cliff. Pure subscription can't cover the peaks, and pure pay-as-you-go means paying full per-call rates even for the steady baseline that a subscription would cover more cheaply.
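For reference, here's the savings arithmetic behind the three cases, using only the monthly figures quoted above (for Case 1 I take both ends of the £560-720 pay-as-you-go estimate).

```python
def savings_pct(chosen: float, alternative: float) -> float:
    """Percentage saved by paying `chosen` instead of `alternative`."""
    return (alternative - chosen) / alternative * 100


# Case 1: £400 subscription vs an estimated £560-720 pay-as-you-go
low, high = savings_pct(400, 560), savings_pct(400, 720)
print(f"Case 1: {low:.0f}% to {high:.0f}% saved")

# Case 3: £950 hybrid vs £1,430 pure pay-as-you-go and £1,200 higher subscription tier
print(f"Case 3 vs pay-as-you-go: {savings_pct(950, 1430):.0f}% saved")
print(f"Case 3 vs subscription:  {savings_pct(950, 1200):.0f}% saved")
```

That gives roughly 29-44% for Case 1 (so the ~35% above sits inside the range) and about 34% for Case 3's hybrid versus pure pay-as-you-go.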
The Decision Framework I Actually Use
After all these scars, here's how I choose billing models now. Three questions:
1. How predictable is your traffic?
- Stable → Subscription
- Spiky → Pay-as-you-go or hybrid
2. How latency-sensitive are you?
- Real-time (chatbots, live tools) → Pay-as-you-go. Don't cheap out here.
- Async (reports, batch jobs) → Subscription is fine, queue away
3. How do you manage budgets?
- Need strict cost caps → Subscription with auto-throttle
- Can handle variable spend → Pay-as-you-go, but set alerts
Seriously. Set the alerts.
After our May disaster, I built a Grafana dashboard with two thresholds: daily spend over £16 triggers a Slack notification, over £40 automatically switches to a backup model. Haven't had a surprise since.
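The Grafana side is just a panel plus alert rules, but the decision logic behind those two thresholds is simple enough to sketch. In this Python version the £16 and £40 thresholds are ours; `notify_slack` and `switch_to_backup_model` are hypothetical hooks standing in for whatever your alerting and routing layers actually do.

```python
from enum import Enum
from typing import Callable

NOTIFY_THRESHOLD_GBP = 16.0  # daily spend that triggers a Slack message
SWITCH_THRESHOLD_GBP = 40.0  # daily spend that forces the backup model


class SpendAction(Enum):
    NONE = "none"
    NOTIFY = "notify"
    SWITCH_MODEL = "switch_model"


def spend_action(daily_spend_gbp: float) -> SpendAction:
    """Map today's spend to an action; the higher threshold takes priority."""
    if daily_spend_gbp > SWITCH_THRESHOLD_GBP:
        return SpendAction.SWITCH_MODEL
    if daily_spend_gbp > NOTIFY_THRESHOLD_GBP:
        return SpendAction.NOTIFY
    return SpendAction.NONE


def handle_spend(
    daily_spend_gbp: float,
    notify_slack: Callable[[str], None],
    switch_to_backup_model: Callable[[], None],
) -> SpendAction:
    """Dispatch to the hypothetical hooks. Spend over £40 is also over £16, so it notifies too."""
    action = spend_action(daily_spend_gbp)
    if action is SpendAction.SWITCH_MODEL:
        notify_slack(f"Daily spend £{daily_spend_gbp:.2f}: switching to backup model")
        switch_to_backup_model()
    elif action is SpendAction.NOTIFY:
        notify_slack(f"Daily spend £{daily_spend_gbp:.2f} is over £{NOTIFY_THRESHOLD_GBP:.0f}")
    return action
```

In practice you'd run this on a schedule and de-duplicate, so the same alert doesn't fire every time the check runs.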
The Multi-Provider Trick Nobody Talks About
Here's a slightly unconventional approach that's working brilliantly for us.
We now use three providers simultaneously:
- Provider A subscription (cheap, baseline)
- Provider B pay-as-you-go (fast, overflow during peaks)
- Provider C (disaster recovery, barely used)
The routing middleware is about 200 lines of Python, built on top of LiteLLM. LiteLLM added proper fallback support in October 2024, but its default round-robin strategy is a bit naive, so I added two routing rules on top, plus a last-resort fallback:
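```python
# Simplified version of our routing logic
from dataclasses import dataclass


@dataclass
class Provider:
    name: str
    qpm_limit: int      # queries-per-minute cap for this provider
    qpm_remaining: int  # requests still allowed in the current minute


def route_request(model, estimated_tokens, provider_a, provider_b, provider_c):
    # model/estimated_tokens are unused in this simplified version
    # Rule 1: Prefer subscription provider while >30% of its QPM headroom remains
    if provider_a.qpm_remaining > 0.3 * provider_a.qpm_limit:
        return provider_a
    # Rule 2: Overflow to pay-as-you-go while >10% of its QPM headroom remains
    if provider_b.qpm_remaining > 0.1 * provider_b.qpm_limit:
        return provider_b
    # Rule 3: Fallback to disaster recovery
    return provider_c
```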
This setup keeps our monthly costs under £400 — about 40% less than when we used a single provider. I've put the full config on GitHub (link in the comments).
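The routing rules only work if each provider object knows its remaining QPM headroom. One way to track that on the client side is a 60-second sliding window of request timestamps. This is a minimal sketch: `QpmTracker` is a hypothetical helper, not part of LiteLLM, and it only counts your own requests (some providers also report rate-limit headroom in response headers, which this ignores).

```python
import time
from collections import deque
from typing import Deque, Optional


class QpmTracker:
    """Client-side sliding-window count of requests sent in the last minute."""

    def __init__(self, qpm_limit: int, window_seconds: float = 60.0):
        self.qpm_limit = qpm_limit
        self.window_seconds = window_seconds
        self._sent: Deque[float] = deque()  # monotonic timestamps, oldest first

    def _evict(self, now: float) -> None:
        # Forget requests that have slid out of the window
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()

    def record(self, now: Optional[float] = None) -> None:
        """Call once per request sent to this provider."""
        now = time.monotonic() if now is None else now
        self._evict(now)
        self._sent.append(now)

    def remaining_at(self, now: float) -> int:
        self._evict(now)
        return max(0, self.qpm_limit - len(self._sent))

    @property
    def qpm_remaining(self) -> int:
        """Headroom right now, read the same way the routing rules read it."""
        return self.remaining_at(time.monotonic())
```

Call `record()` each time you send a request to a provider, and expose the tracker's `qpm_limit` and `qpm_remaining` on that provider object so the routing rules can read them directly.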
The Uncomfortable Truth About AI API Pricing
Here's something I've noticed, and I'll probably get some angry emails for this: most AI providers are playing a game of information asymmetry.
Pay-as-you-go pricing is extortionate because they know you're not tracking costs in real time. It's the API equivalent of a minibar: you don't realise how much you've consumed until checkout.
Subscription plans look like a deal, but the SLAs are vague, and the throttling rules are buried on page 47 of the docs. It's all designed to make comparison shopping as painful as possible.
My advice? Don't trust their pricing pages. Run your own one-week stress test first.
Start with pay-as-you-go for a week. Collect real data — call frequency, token distribution, peak hours. Then take that data to their sales team and negotiate. Everything is negotiable, especially subscription pricing.
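For the one-week test, even a crude log is enough. Here's a sketch that condenses call records into the things worth bringing to a sales conversation: calls per day, token distribution, and peak hours. The `CallRecord` shape is an assumption; adapt it to whatever your logging actually captures.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from statistics import median, quantiles


@dataclass
class CallRecord:
    timestamp: datetime  # when the call was made, ideally in UTC
    total_tokens: int    # prompt + completion tokens


def summarise(records: list[CallRecord], top_hours: int = 3) -> dict:
    """Condense a week of call logs into the numbers worth negotiating with."""
    if not records:
        raise ValueError("no records to summarise")
    by_day = Counter(r.timestamp.date() for r in records)
    by_hour = Counter(r.timestamp.hour for r in records)
    tokens = [r.total_tokens for r in records]
    # quantiles() needs at least two points; n=20 gives cut points every 5%,
    # and "inclusive" keeps the estimate within the observed range
    p95 = quantiles(tokens, n=20, method="inclusive")[-1] if len(tokens) >= 2 else tokens[0]
    return {
        "calls_per_day": dict(sorted(by_day.items())),
        "median_tokens": median(tokens),
        "p95_tokens": p95,
        "peak_hours": [hour for hour, _ in by_hour.most_common(top_hours)],
    }
```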
We showed one provider our usage pattern (our peak was at 3 AM UTC, so we weren't competing with their enterprise customers for capacity), and they dropped the subscription price from £240 to £145. Just like that.
Oh, and Anthropic cut Claude 3.5 Sonnet's pay-as-you-go pricing by 25% in January 2025. I suspect other providers will follow. Competition is fierce, and frankly, it's about time.
Key Takeaways
- Stable traffic + latency-tolerant workloads → Subscription
- Spiky traffic + real-time requirements → Pay-as-you-go
- The sweet spot for most teams → Hybrid (subscription base + pay-as-you-go overflow)
- Always set cost alerts — Grafana, Datadog, whatever, just do it
- Multi-provider routing can save 30-40% if you're willing to write ~200 lines of middleware
What's Your Setup?
Are you on pay-as-you-go or subscription? Ever been throttled at the worst possible moment? Drop a comment — I genuinely want to know if other teams have had the same experience.
Edit: Didn't expect this many people asking about the routing middleware. I'll clean up the code and publish it next week. Hit follow if you want the update.
#ai #api #costoptimization #devops #gpt4 #startuplessons
Cael Lee
Full-stack developer with 8+ years of experience. Currently building AI-powered developer tools. I've tested 20+ AI API providers and coding assistants.