Token-Based AI Pricing Is Broken — Here's What Comes Next
Last November, I did something monumentally stupid.
My team was haemorrhaging cash on GPT-4 API calls, so I slashed our context window from 32K tokens to 8K. Seemed clever at the time. The result? A critical business workflow's accuracy plummeted from 87% to 61%. At 3:07 AM, staring at my Grafana dashboard with the cold realisation that I'd just broken production to save a few quid, I asked myself: are we paying for intelligence, or are we just paying for characters?
Brutal.
If you've integrated any AI API in the past year, you know this pain intimately. Token-based billing is a black hole — you genuinely never know what next month's invoice will look like. The irony is almost too perfect: we stuff our prompts with "please be concise" and then immediately add "please provide detailed explanations," ping-ponging between cost and quality like a desperate tennis match.
I was having drinks last month with a mate who runs an AI customer support startup. He told me about a client who suddenly uploaded a 200-page PDF for analysis. Single API call went from $0.03 to $4.70. They'd quoted the client a fixed monthly fee.
Do the maths on that one.
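For anyone who wants the maths spelled out, here is a minimal sketch. The two per-call costs are the ones from the story; the fixed monthly fee is a made-up figure for illustration, since I'm not sharing what my mate actually quoted. Everything is in integer cents to keep floating-point noise out of it.

```python
# Per-call costs are from the anecdote; the monthly fee is a hypothetical value.
NORMAL_CALL_CENTS = 3        # $0.03, a typical call
PDF_CALL_CENTS = 470         # $4.70, the 200-page PDF call
ASSUMED_FEE_CENTS = 50_000   # $500 fixed monthly fee (assumption)


def cost_multiplier(before_cents: int, after_cents: int) -> float:
    """How many times more expensive the new call is than the old one."""
    return after_cents / before_cents


def calls_covered(fee_cents: int, cost_per_call_cents: int) -> int:
    """Whole number of calls a fixed fee can absorb at a given per-call cost."""
    return fee_cents // cost_per_call_cents


if __name__ == "__main__":
    print(round(cost_multiplier(NORMAL_CALL_CENTS, PDF_CALL_CENTS), 1))
    print(calls_covered(ASSUMED_FEE_CENTS, NORMAL_CALL_CENTS))
    print(calls_covered(ASSUMED_FEE_CENTS, PDF_CALL_CENTS))
```

Under those assumptions the PDF call costs roughly 157 times the usual one, and a fee that would have covered 16,666 ordinary calls covers only 106 PDF-sized ones before the startup is paying out of pocket.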
The Original Sin of Token Pricing
Let's rewind to first principles. Token pricing became the standard because it's brutally simple — every token the model processes consumes compute, and compute costs money. Technically, this logic is sound. But it creates a perverse incentive: the API provider maximises revenue when you maximise token consumption, not when you maximise value.
I ran an internal analysis during my time at Stripe last year (Q2, about 20 mid-sized SaaS products — small sample, take it with a grain of salt). What I found was properly shocking: roughly 43% of GPT-4 calls were essentially "waste inference." User says "hello," model responds with a full autobiography. Poorly designed prompts causing the model to regurgitate similar content repeatedly. Tokens burning, zero value created, and the API provider happily collecting the cheque.
Actually — let me correct myself. I reran those numbers in Q4 with a broader dataset (60+ products), and the figure dropped to around 37%. Still not rigorous enough for a paper, but the trend is unmistakable: waste inference is a massive, silent tax.
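To make "waste inference" a bit more concrete, here is one rough way you might flag it in your own logs. To be clear, this is not the method behind my numbers above. It's a simplified heuristic, and the `CallLog` shape, the thresholds, and the function names are all assumptions for the sake of the example: it only catches the "user says hello, model writes an essay" pattern, not repeated content.

```python
from dataclasses import dataclass


@dataclass
class CallLog:
    """Hypothetical log record: token counts for one API call."""
    prompt_tokens: int
    completion_tokens: int


def is_waste(call: CallLog, short_prompt: int = 20, max_ratio: float = 10.0) -> bool:
    """Flag short prompts that triggered disproportionately long replies."""
    if call.prompt_tokens == 0:
        return False
    ratio = call.completion_tokens / call.prompt_tokens
    return call.prompt_tokens <= short_prompt and ratio > max_ratio


def waste_share(calls: list[CallLog]) -> float:
    """Fraction of calls flagged as waste (0.0 for an empty log)."""
    if not calls:
        return 0.0
    return sum(is_waste(c) for c in calls) / len(calls)


if __name__ == "__main__":
    sample = [CallLog(5, 400), CallLog(1200, 300), CallLog(10, 50), CallLog(8, 200)]
    print(waste_share(sample))
```

The thresholds are arbitrary and you would need to tune them per product. A proper analysis would also look at whether the output was actually used, which token counts alone can't tell you.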
It reminds me of the early mobile internet days when carriers charged by the megabyte. You'd load a webpage and get billed for a dozen ad banners you never even looked at. Same energy.
From Selling Compute to Selling Outcomes
The core idea behind task-based pricing is almost embarrassingly simple: don't pay for the journey, pay for the destination. This isn't revolutionary — cloud computing's evolution from IaaS to SaaS followed the exact same arc. But AI has been missing a crucial prerequisite: we need to agree on what a "task" actually is.
Here's a proper failure story for you. Mid-2023, my team tried using GPT-4 for contract review. We defined a task: "Identify risky clauses in the contract." Crystal clear, right?
First test run, the model flagged 47 "risky clauses." Including "This agreement shall be governed by the laws of California" — which, in our business context, wasn't remotely a risk. We'd completely failed to realise that task definition isn't a technical problem. It's a business problem. It requires context, standards, and alignment with what the customer actually cares about.
This is precisely why task pricing hasn't gone mainstream. Tokens are crude but objective. Tasks are sensible but subjective. Until recently, that is.
Three Shifts Happening Right Now
The first shift comes from Anthropic. Their November 2024 enterprise API update introduced a "tool use" pricing dimension. When Claude calls an external tool — search, calculator, whatever — there's a fixed per-call charge instead of token metering. It's a clever compromise. Tool calls have clear boundaries and obvious value. Users happily pay for "the model decided to use the calculator" as an action, regardless of how many tokens that decision consumed. I've been testing it — roughly $0.02 per tool call, irrespective of downstream token generation.
The second shift is from a startup called CodeRabbit that does AI code review. They charge per pull request reviewed. Full stop. Doesn't matter how many tokens the underlying model burns through. I grabbed their CTO for a chat, and he said something that's stuck with me: "Customers are buying bug detection, not tokens. If we review 100 lines of code and only find one issue, that's our model being inefficient — the customer shouldn't foot that bill."
This pricing model fundamentally changes their incentives. They now aggressively optimise prompts, build caching layers, and even train specialised smaller models — all things that token pricing actively discourages. They're running a hybrid of GPT-4 and Claude 3.5 Sonnet under the hood, but their users neither know nor care.
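The incentive flip is easy to see with toy numbers. In this sketch the price per PR, the token cost, and the markup are all invented (I don't know CodeRabbit's actual figures); the point is only the direction each margin moves when the vendor halves token usage.

```python
def per_pr_margin(price_per_pr: float, tokens_used: int, usd_per_1k_tokens: float) -> float:
    """Vendor margin on one review when the customer pays a flat per-PR price."""
    return price_per_pr - tokens_used / 1000 * usd_per_1k_tokens


def token_markup_margin(tokens_used: int, usd_per_1k_tokens: float, markup: float) -> float:
    """Vendor margin on one review when the customer pays token cost plus a markup."""
    return tokens_used / 1000 * usd_per_1k_tokens * markup


if __name__ == "__main__":
    # Hypothetical: $1.00 per PR, $0.01 per 1K tokens, 50% markup.
    for tokens in (40_000, 20_000):  # before and after prompt optimisation
        flat = round(per_pr_margin(1.00, tokens, 0.01), 2)
        metered = round(token_markup_margin(tokens, 0.01, 0.5), 2)
        print(tokens, flat, metered)
```

With these made-up values, halving token usage lifts the flat-price margin from $0.60 to $0.80 per review, while the metered margin drops from $0.20 to $0.10. Same engineering work, opposite reward.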
The third shift is properly radical. I'm currently working with an AI recruitment team that's experimenting with "pay per successful hire" pricing. They only charge when an AI-screened candidate gets hired and passes probation. This isn't task pricing anymore — it's outcome pricing. While too extreme for most use cases, it reveals where we're heading: pricing models migrating from the supply side (compute cost) to the demand side (business value).
I was in a meeting with them last week, and their CTO cracked me up: "We're now more stressed about candidates passing probation than the hiring managers are." That's risk transfer from buyer to seller in action.
What the Endgame Might Look Like
If I were to map out AI API pricing for the next 3-5 years, I'd bet on a three-layer structure:
The bottom layer is resource pricing — tokens or compute-time billing, sticking around for model training, fine-tuning, and large-scale batch processing. Think of it like AWS EC2: granular control for those who need it.
The middle layer is task pricing, and this becomes the mainstream. API providers will pre-define standard tasks (text classification, entity extraction, summarisation, code generation), each with a fixed price and, potentially, an SLA behind it (e.g., accuracy ≥ 95%). Developers stop caring about which model is running or how many tokens it's consuming. They only care about task completion quality. From what I hear, OpenAI is already testing something along these lines internally, though release timelines are anyone's guess.
The top layer is outcome pricing, targeting specific verticals. E-commerce conversion rate improvement. Customer support resolution rate. Recruitment qualified-candidate ratio. Pricing ties directly to business metrics, and the API provider essentially becomes an "AI outsourcing service" that absorbs performance risk.
These layers won't replace each other — they'll coexist. Just like today you can use S3 for raw storage (resource layer), Firebase for abstracted backend services (task layer), or Stripe Atlas for complete business incorporation (outcome layer). Different abstractions for different needs and risk appetites.
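One way to picture the three layers is by asking what number drives the invoice in each. The sketch below is purely illustrative: the class names and every price are my own placeholders, not anything a provider has published.

```python
from dataclasses import dataclass


@dataclass
class ResourcePricing:
    """Bottom layer: the bill tracks compute consumed."""
    usd_per_1k_tokens: float

    def bill(self, tokens: int) -> float:
        return tokens / 1000 * self.usd_per_1k_tokens


@dataclass
class TaskPricing:
    """Middle layer: the bill tracks tasks completed, whatever they cost to run."""
    usd_per_task: float

    def bill(self, tasks_completed: int) -> float:
        return tasks_completed * self.usd_per_task


@dataclass
class OutcomePricing:
    """Top layer: the bill tracks business results; failed attempts cost nothing."""
    usd_per_outcome: float

    def bill(self, outcomes_achieved: int) -> float:
        return outcomes_achieved * self.usd_per_outcome


if __name__ == "__main__":
    print(round(ResourcePricing(0.03).bill(50_000), 2))   # driven by tokens
    print(round(TaskPricing(0.05).bill(100), 2))          # driven by task count
    print(OutcomePricing(2000.0).bill(3))                 # driven by hires, sales, etc.
```

The further up you go, the less the buyer's bill has to do with what the provider spent, which is exactly where the risk transfer comes from.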
What This Means for Developers
If this prediction holds, what changes for those of us actually writing the code?
First, prompt engineering shifts from cost-cutting to quality improvement. Today, teams obsess over prompt optimisation to reduce token consumption. When pricing switches to tasks, the optimisation target becomes task success rate. Much healthier incentive. My own team has already flipped our KPIs — we now measure "task success rate" instead of "token savings." The conversations have completely changed.
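Here is a rough sketch of why success rate is the better target, using my own 3 AM disaster as the example. The 87% and 61% accuracy figures are the real ones from the start of this post; the per-call costs and the cost of a failed task (say, someone fixing it by hand) are invented for illustration.

```python
def expected_cost_per_task(call_cost: float, success_rate: float, failure_cost: float) -> float:
    """API cost plus the expected cost of cleaning up after a failed task."""
    return call_cost + (1 - success_rate) * failure_cost


if __name__ == "__main__":
    ASSUMED_FAILURE_COST = 2.00  # hypothetical cost of handling one failure manually

    big_context = expected_cost_per_task(0.12, 0.87, ASSUMED_FAILURE_COST)
    small_context = expected_cost_per_task(0.03, 0.61, ASSUMED_FAILURE_COST)
    print(round(big_context, 2), round(small_context, 2))
```

With those assumed costs, the "expensive" 32K setup works out to about $0.38 per task and the "cheap" 8K setup to about $0.81. The token bill went down and the real bill went up, which is the trap a token-savings KPI walks you straight into.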
Second, model selection becomes genuinely flexible. Under token pricing, choosing a more expensive model means higher cost risk. Under task pricing, the API provider can swap models behind the scenes (big models for complex cases, small models for simple ones) as long as task quality stays consistent. Developers never see the swap and never need to. No more agonising over "should this endpoint use GPT-4 or Claude 3.5?"
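I have no inside knowledge of how any provider would actually do this, but the simplest version of behind-the-scenes routing is "try the cheap model, escalate if the answer fails a check". A minimal sketch, where the models and the quality check are plain callables you would supply yourself (all names here are hypothetical):

```python
from typing import Callable

Model = Callable[[str], str]


def route(task: str,
          small: Model,
          large: Model,
          passes_quality_check: Callable[[str], bool]) -> tuple[str, str]:
    """Try the cheap model first; escalate only if its answer fails the check.

    Returns (answer, tier_used).
    """
    answer = small(task)
    if passes_quality_check(answer):
        return answer, "small"
    return large(task), "large"


if __name__ == "__main__":
    # Stub models standing in for real API calls.
    small_model = lambda task: "" if "contract" in task else "short answer"
    large_model = lambda task: "detailed answer"
    non_empty = lambda answer: len(answer) > 0

    print(route("classify this tweet", small_model, large_model, non_empty))
    print(route("review this contract", small_model, large_model, non_empty))
```

The caller gets an answer either way and pays the same task price. Which tier handled it is the provider's problem, and so is the cost of the occasional double call.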
Third, application architecture evolves. Most AI apps today are "thin UI layer + one massive prompt" because every call burns money and multi-step reasoning feels financially reckless. When task pricing takes hold, we can design richer agent workflows — task decomposition, multi-step verification, self-correction — because cost no longer scales linearly with step count. I'm currently refactoring a project, splitting a 2,000-token monster prompt into five collaborating agents, each doing exactly one thing. Token consumption actually went up 30%, but under task pricing, the cost would be fixed.
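To put rough numbers on that refactor: the sketch below treats the old design as a single 2,000-token call and the new one as five steps totalling 30% more. The token rate and the per-task price are both invented, and treating the prompt size as the whole token bill is a simplification.

```python
def token_priced_cost(tokens_per_step: list[int], usd_per_1k_tokens: float) -> float:
    """Under token pricing, every extra step adds to the bill."""
    return sum(tokens_per_step) / 1000 * usd_per_1k_tokens


def task_priced_cost(tokens_per_step: list[int], usd_per_task: float) -> float:
    """Under task pricing, the bill ignores how many steps the task took."""
    return usd_per_task


if __name__ == "__main__":
    monolith = [2000]        # one monster prompt
    agents = [520] * 5       # five agents, 2,600 tokens in total (+30%)

    for steps in (monolith, agents):
        print(round(token_priced_cost(steps, 0.03), 3),
              task_priced_cost(steps, 0.10))
```

With these placeholder rates the token-priced cost rises from $0.060 to $0.078 per run, while the task-priced cost stays at $0.10 either way. The extra verification steps become the provider's optimisation problem rather than a line on my invoice.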
Don't Celebrate Just Yet
Task pricing has its own demons. The biggest challenge: who gets to define the task?
What counts as "one text classification"? If the API provider defines it too broadly, developers feel gouged. Too granular, and the simplicity vanishes. This is fundamentally a standardisation problem — it needs industry consensus, or a couple of dominant players to force it through with market share.
Then there's the gaming problem. If you charge per "resolved customer query," what stops the provider from marking easy questions as resolved and ignoring the hard ones? You'd need third-party evaluation, transparent reporting, maybe even blockchain-based audit trails — but that's a whole other rabbit hole.
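You don't need a blockchain for a first line of defence, mind. A simple spot-check on a random sample of billed tickets gets you most of the way. This is a minimal sketch, assuming you have some independent way of judging whether a ticket was really resolved (a human reviewer, a follow-up survey); the function and parameter names are hypothetical.

```python
import random
from typing import Callable


def audited_resolution_rate(billed_ticket_ids: list[str],
                            independently_resolved: Callable[[str], bool],
                            sample_size: int,
                            seed: int = 0) -> float:
    """Estimate the true resolution rate from a random sample of tickets
    that the provider billed as 'resolved'."""
    if not billed_ticket_ids:
        return 0.0
    rng = random.Random(seed)  # fixed seed so an audit can be reproduced
    sample = rng.sample(billed_ticket_ids, min(sample_size, len(billed_ticket_ids)))
    return sum(independently_resolved(t) for t in sample) / len(sample)


if __name__ == "__main__":
    billed = ["t1", "t2", "t3", "t4"]
    truly_resolved = {"t1", "t2", "t3"}
    rate = audited_resolution_rate(billed, lambda t: t in truly_resolved, sample_size=4)
    print(rate)
```

If the audited rate comes in well below 100%, you are being billed for resolutions that didn't happen. A contract could tie the invoice to the audited figure rather than the provider's own count.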
My personal bet: we'll see at least one major API provider launch a formal task pricing option in 2025. Not experimental — a default choice. Could be Anthropic, could be OpenAI. Google will probably lag — they're still aggressively pushing Gemini token volume and have less incentive to rock the boat.
When that day comes, the cost structure of AI application development fundamentally changes. Variable costs become semi-fixed. Unpredictable becomes budgetable.
It reminds me of cloud computing circa 2010. Everyone was arguing about whether hourly EC2 billing made financial sense. Nobody imagined something like Lambda — pay-per-invocation — was coming. History doesn't repeat, but it does rhyme.
TL;DR
- Token pricing creates perverse incentives: providers earn more when you waste tokens
- Roughly 37% of GPT-4 calls in my (non-rigorous) sample of 60+ products were "waste inference": tokens burned for zero value
- Task pricing charges per completed task (e.g., "extract entities"), not per token
- Three shifts underway: Anthropic's tool-use pricing, CodeRabbit's per-PR pricing, AI recruitment's per-hire pricing
- Future likely has three layers: resource (tokens), task (fixed-price actions), outcome (business metric-based)
- For developers: prompt engineering shifts from cost-cutting to quality improvement
- My bet: at least one major API provider launches task pricing as a default option in 2025
What's your experience? How much is your team burning on AI APIs each month? If task pricing arrived tomorrow with a 20% premium over your current token costs but gave you predictable billing, would you switch? Drop a comment — I genuinely read every single one. I'm so tired of opening monthly invoices like they're mystery boxes.
#AIpricing #TokenEconomics #APIdesign #TechTrends #CostOptimisation #DeveloperExperience
Cael Lee
Full-stack developer with 8+ years of experience. Currently building AI-powered developer tools. I've tested 20+ AI API providers and coding assistants.