I Audited a Friend's AI API Bill and Found £1,200 Going Up in Smoke — Here's How to Avoid the Same T

By Cael Lee | 7 min read

Last week I did a quick cost audit for a mate's startup. Three developers, one project, and a monthly bill of £1,200 for a large language model API. Three people. Meanwhile, the team next door — same usage patterns, different pricing model — was paying under £400.

That moment stung. Because I've been that person. Hell, most developers I know have been that person — blindly picking a pricing tier and hoping for the best.

Here's the thing: choosing between pay-as-you-go and subscription pricing for AI APIs isn't just a billing preference. It's the difference between buying a round of drinks and accidentally funding the entire pub.

Let's break this down properly.

TL;DR for the Impatient

The Two Models, Explained Without the Marketing Fluff

Pay-as-you-go is simple: you use tokens, you pay for tokens. GPT-4o currently runs about $2.50 per million input tokens and $10 per million output tokens. Most providers — OpenAI, Anthropic, Google — follow this pattern. You'll see charges per 1,000 or 1,000,000 tokens depending on the vendor.
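To make the input/output split concrete, here is a minimal sketch that prices a single call at the GPT-4o rates quoted above. The function name and the 2,000-in / 500-out token counts are illustrative assumptions, and rates change, so check the provider's pricing page before trusting these defaults.

```python
def call_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_per_million: float = 2.50,   # USD per 1M input tokens (rate quoted above)
    output_per_million: float = 10.00,  # USD per 1M output tokens (rate quoted above)
) -> float:
    """Cost of one API call when input and output tokens are priced separately."""
    total = input_tokens * input_per_million + output_tokens * output_per_million
    return total / 1_000_000


# Illustrative call: 2,000 tokens of prompt and context, 500 tokens of reply.
cost = call_cost_usd(2_000, 500)
print(f"${cost:.4f} per call")  # $0.0100 per call
```

Notice that the 500 output tokens cost as much as the 2,000 input tokens here, which is why the two rates need separate columns in any estimate.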

Subscription means you pay a fixed monthly fee for a bucket of tokens. Something like: £20/month gets you 1 million tokens, and anything beyond that is billed at a discounted pay-as-you-go rate.

Looks straightforward, right?

Yeah. That's what I thought too.

Three Pricing Models, Three Different Horror Stories

1. The Debug Phase: Pay-As-You-Go Is a Money Incinerator

November 2023. I'm building a RAG pipeline — retrieval-augmented generation, for the uninitiated — and I need to iterate on prompts constantly. I hook up OpenAI's standard pay-as-you-go API because it's the path of least resistance.

Two weeks later: £150 gone.

Why? Because every debugging session runs the full context. My document chunks averaged 800 tokens. Add the system prompt, conversation history, and the user query, and each call hit 2,000+ tokens easily. I was tweaking prompt templates 40, 50 times a day.

I still remember the exact moment it clicked: 23 November, 11 PM. I'm trying to get a summarisation prompt to stop hallucinating dates. Forty-three iterations later, I go to bed frustrated. Next morning, the dashboard shows £29 evaporated overnight. For one evening of prompt engineering.

Here's the kicker: if I'd been on a subscription plan that bundles API credits at a discounted per-token rate, I reckon I'd have saved roughly 60%. One thing I had wrong at the time: ChatGPT Plus (£20/month) is not that plan. OpenAI bills ChatGPT subscriptions and API usage separately, so Plus does nothing for your API rate. The plans I mean are the monthly token bundles that some providers sell.

Wait, I should clarify that further. Not every model is covered by a plan's discounted rate. I was using GPT-4o, which was covered on the plans I compared, so the savings were real. The o1 series, from what I've seen, doesn't follow the same pricing structure. If you're on o1 or o1-mini, check the pricing page separately. Don't assume.

Lesson learned: During development, a subscription with API credits is your friend. Raw pay-as-you-go during debugging is like leaving the tap running while you brush your teeth.

2. Stable Production: When Pay-As-You-Go Actually Wins

January this year, I built a customer service chatbot for a mid-sized e-commerce company. Around 2,000 conversations per day. We started with a subscription plan from a provider: £120/month for 500,000 tokens.

Two months in, I actually looked at the numbers — properly looked, with LangSmith traces and everything.

We were using 300,000 tokens. Maybe 320,000 in a busy month. The average conversation was short: "Where's my order?" "What's your return policy?" "Can I change the shipping address?" Four hundred tokens, tops, per interaction. We'd massively overestimated our usage when picking the plan.

We switched to pay-as-you-go. Same volume: £72/month. That's a 40% drop.

The general rule — and I'm hedging here because every workload is different — is that predictable, stable production traffic often leans towards pay-as-you-go. If your monthly tokens consistently fall below the subscription threshold, you're literally paying for nothing. Peak fluctuations complicate this, sure, but the logic holds.
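A quick way to sanity-check this is to work out the break-even volume: the monthly token count at which pay-as-you-go spend equals the flat fee. The sketch below uses the chatbot figures from this section. The per-token rate is my own inference from the £72 bill (it assumes pay-as-you-go pricing is linear), and the function name is hypothetical.

```python
def break_even_tokens(monthly_fee: float, payg_price_per_token: float) -> float:
    """Monthly token volume at which pay-as-you-go spend equals the flat fee.

    Only meaningful while the result stays within the plan's allowance,
    because overage charges are not modelled here.
    """
    return monthly_fee / payg_price_per_token


monthly_fee = 120.00              # £ per month for the subscription
allowance = 500_000               # tokens included in the plan
actual_usage = 300_000            # tokens really used in a typical month
payg_rate = 72.00 / actual_usage  # £ per token, inferred from the £72 bill

threshold = break_even_tokens(monthly_fee, payg_rate)
print(f"Break-even volume: {threshold:,.0f} tokens/month")
print(f"Allowance actually used: {actual_usage / allowance:.0%}")
print(f"Subscription is cheaper: {actual_usage > threshold}")
```

With these numbers the break-even sits right at the allowance itself, so any month that leaves part of the bucket unused is a month where pay-as-you-go would have been cheaper.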

3. The Hybrid Approach: What My Team Actually Uses Now

These days, I run a "subscription floor, pay-as-you-go ceiling" setup:

Last month, our five-person team spent £184 total on LLM APIs. Before the hybrid approach, with everything on pay-as-you-go? Over £400. I ran the numbers twice because I didn't believe the difference myself.

Real money. Not hypothetical blog-post money.

How to Calculate This for Your Own Team

I keep a stupidly simple spreadsheet. Here's what goes in it:

  1. Estimated monthly tokens (from historical data or a small-scale test)
  2. Pay-as-you-go unit price (split input and output — they're different)
  3. Subscription price and included token allowance
  4. Overage price per token

Then the formula is basic arithmetic:


Pay-as-you-go cost = estimated_tokens × unit_price
Subscription cost = monthly_fee + max(0, estimated_tokens - allowance) × overage_price

Let's run a concrete example. Suppose you're burning 800,000 tokens per month. Provider A charges £1.60 per million tokens on pay-as-you-go. Their subscription is £20/month with 500,000 tokens included, and overage is £1.20 per million tokens.

Sorry, I fumbled the units. Let me fix that:

Try again with rounder, illustrative rates. Say it's £2 per 1,000 tokens for pay-as-you-go, and the subscription is £25/month with 50,000 tokens included, overage at £1.50 per 1,000 tokens. Monthly usage: 80,000 tokens.

Pay-as-you-go cost = 80 × £2 = £160
Subscription cost = £25 + 30 × £1.50 = £70

That's a £90 difference, or about 56% cheaper on subscription.
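If you'd rather not maintain the spreadsheet by hand, the two formulas translate directly into a few lines of Python. This is a minimal sketch using the second set of figures above; the function names are my own, and it assumes a single blended per-1,000-token rate rather than separate input and output prices.

```python
def payg_cost(tokens: int, price_per_1k: float) -> float:
    """Pay-as-you-go cost = estimated_tokens x unit_price."""
    return tokens / 1_000 * price_per_1k


def subscription_cost(
    tokens: int, monthly_fee: float, allowance: int, overage_per_1k: float
) -> float:
    """Subscription cost = monthly_fee + max(0, tokens - allowance) x overage_price."""
    overage_tokens = max(0, tokens - allowance)
    return monthly_fee + overage_tokens / 1_000 * overage_per_1k


usage = 80_000  # tokens per month
payg = payg_cost(usage, price_per_1k=2.00)
sub = subscription_cost(usage, monthly_fee=25.00, allowance=50_000, overage_per_1k=1.50)

print(f"Pay-as-you-go: £{payg:.2f}")   # £160.00
print(f"Subscription:  £{sub:.2f}")    # £70.00
print(f"Saving: £{payg - sub:.2f} ({(payg - sub) / payg:.0%})")  # £90.00 (56%)
```

Swap in your own four inputs and rerun it whenever usage or prices change; the answer flips more often than you'd expect.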

I genuinely think most teams never run this calculation. They pick based on vibes. I did, for years.

The Hidden Costs Nobody Mentions

Rate limiting will ambush you. Many subscription plans throttle requests per second. We hit a limit last spring — peak traffic, users waiting eight seconds for replies. The error message is burned into my memory: ratelimitexceeded: please retry after 5s. Meanwhile, customers see a spinning wheel. We switched to a higher-tier pay-as-you-go endpoint with better concurrency. More expensive per token, but the user experience stopped bleeding.
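Switching endpoints was our fix, but a common client-side mitigation is to retry throttled requests with exponential backoff. The sketch below is a generic illustration, not what we ran in production: `RateLimitError` is a hypothetical stand-in for whatever exception your client library raises when throttled, and the delay values are arbitrary assumptions.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the throttling error your API client raises."""


def call_with_backoff(request, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call request(), retrying on RateLimitError with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the error to the caller
            # Double the wait each attempt, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add up to 10% jitter so parallel clients don't retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))


# Demo with a fake request that is throttled twice before succeeding.
attempts = {"count": 0}

def fake_request():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimitError("please retry after 5s")
    return "reply"

print(call_with_backoff(fake_request, base_delay=0.01))  # reply
```

Backoff keeps requests from failing outright, but users still wait through the retries, so it softens a throttled plan rather than replacing the extra concurrency.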

Model lock-in is real. Subscriptions usually tie you to specific models. Need to switch between GPT-4o and Claude 3.5 Sonnet depending on the task? Pay-as-you-go gives you that flexibility. Most subscription plans I've seen restrict model choice unless you upgrade tiers.

Free credits are a trap. Providers love giving new users £5 or £10 in free credits. It's generous, sure. But once the credits run out, the rates you fall back to are often 30-50% higher than subscription pricing. I've watched teams coast on free credits for two months, then get a bill that doubles overnight. One team shared their invoice in our Slack group with a single emoji: 💀.

What I'd Recommend (Your Mileage May Vary)

The Honest Truth

Optimising AI API costs is, at its core, a data analysis problem. It's unglamorous spreadsheet work. But running the numbers can save you enough to buy a new laptop every few months. I'm not being metaphorical — last month's savings covered a new MacBook Air M3.

I'm curious: what pricing model is your team on? Have you been blindsided by a bill yet? Drop a comment — I genuinely want to know what traps other people are falling into.

#AI #APICosts #PayAsYouGo #Subscription #DeveloperExperience #CostOptimisation #LLM

Cael Lee

Full-stack developer with 8+ years of experience. Currently building AI-powered developer tools. I've tested 20+ AI API providers and coding assistants.
