I Pay for 5 AI Models Every Month, and Honestly? They're All One-Trick Ponies

By Cael Lee | 7 min read

I just checked my bank statement and realised—again—that I'd renewed my Grok subscription.

Ten dollars. Gone. Poof.

Look, I know writing this kind of piece is asking for it. It's 2026, and the AI community is still the same circus it's always been. Monday, someone drops a "nuclear-level" update. Tuesday, another company open-sources something that supposedly destroys the competition. Wednesday, yet another model tops some benchmark somewhere. But here's the thing—when you actually use these models day in, day out, for a year or more, treating them like the workhorses they're supposed to be, you notice something rather counterintuitive.

They're all specialists. Not a single generalist in the bunch.

I keep five models running on my desktop: ChatGPT, Claude, Gemini, DeepSeek, and Grok. The monthly subscriptions add up to about £80, which is frankly ridiculous. That's a decent dinner out with wine. Twice. If I weren't writing this column, I'd never maintain five. But here we are—none of them can replace the others.

Let me walk you through them. Starting with the one I most want to throttle.

Grok: Ten Dollars I'll Never Get Back

I signed up for SuperGrok Lite last month. Ten dollars. I still want to slap myself.

I bought it because people said it was lenient with, shall we say, creative writing, and because it could pull real-time data from X. The reality? Its information retrieval gets absolutely trounced by GPT. Its text output reads like Google Translate circa 2018—stiff, unnatural, borderline painful. And the hallucinations? I asked it what day of the week it was. It was Saturday, 14 June 2026. Grok told me, with absolute certainty, it was Monday. Not a flicker of doubt.

The only thing it's vaguely useful for is checking real-time trending topics. But I can do that with the free version, can't I?

Next month? Cancelled. Definitely. I mean it this time.

Gemini: Google's Quiet Sabotage Machine

My feelings about Gemini are... complicated.

The 2.5 Pro version is genuinely impressive. Its world knowledge is absurdly broad, it barely restricts image uploads, and for casual chat, it's lovely. But Google has this infuriating habit—they quietly lobotomise their own models. The 2.5 Flash got nerfed into a dunce. The 2.0 Flash-Lite got nerfed into an even bigger dunce. These two versions can barely handle basic arithmetic. Give them anything remotely complex, and they'll cheerfully drive off a cliff without warning.

Oh, and Google uses your conversations for training data. Don't consent? Fine—your chat history disappears. It's properly grim.

There is one bright spot, though. Imagen 3, their image generation model, is surprisingly capable. Few copyright restrictions, draws whatever you want. It's become my go-to for social media images. Even the lobotomised 2.5 Flash can still handle image generation decently—though GPT still wipes the floor with it.

GPT: The Reliable Workhorse I Don't Actually Like

GPT is my most-used model.

But I don't like it.

It can do everything. It just refuses to sound human while doing it. Ask it to write copy, and it'll churn out something that reads like a student trying too hard on their A-levels—all flowery language and overwrought metaphors that make you physically cringe. My current workflow: GPT produces the first draft, then Claude acts as translator, smoothing out the robotic bits into something a normal person might actually say.

And recently? GPT's been slipping. The Plus tier has seen some serious downgrades—Codex quotas slashed, the web version quietly downgrading the model after a few conversation turns. Are they that strapped for cash? Regular users hit upload limits after four or five images. The interface keeps stuttering. Last Wednesday evening, I had to refresh six times just to get it to load. On an M2 MacBook Pro. With 500 Mbps internet.

Is CloseAI actually running out of money?

But here's the thing—if you forced me to recommend one long-term daily driver, I'd still say GPT.

Not because it's the best. Because it's the most stable.

Writing, coding, research, translation, multimodal—no glaring weaknesses. DALL·E for images and Sora for video have both entered the top tier. The overall error rate is relatively low. And stability, in 2026, when AI models seem to collectively lose their minds every other week, has become genuinely rare.

Claude: Brilliant Writer, Useless Researcher

I don't write code (don't ask why, I'm just lazy), so Claude's much-touted coding abilities mean little to me. But its Chinese text processing? Actually excellent. Better than GPT, trading blows with DeepSeek. During the 3.5 Sonnet era, its coding was leagues ahead. Then 3.6 started drooling. And 3.7? Well, let's just say it's "thriving", and I mean that sarcastically.

Its information retrieval is dreadful. Nowhere near GPT. Image processing lags behind both GPT and Gemini—honestly, in some ways it's worse than Doubao, which is saying something. Last week I asked it to handle a simple image. Twenty minutes later, it delivered something so spectacularly wrong that I just sat there laughing for two solid minutes.

Incredible. Truly.

If you're not paying, stick with Sonnet, though you'll burn through the free quota in two minutes flat. The paying subscribers get priority, so it's no wonder free users end up stuck in the queue.

DeepSeek: The Surprise Contender

This might be 2026's most unexpected model.

Before V3, it had exactly one advantage: it was free. Otherwise? Hallucination-prone, rambling, barely competent. Couldn't earn a seat at the table. Then V3 arrived, and something shifted. It's now essentially a mini-GPT, still firmly holding the crown among Chinese large language models. Its Chinese logical reasoning and text composition are genuinely the best in class—no, really.

If you don't need to access Western sources, DeepSeek is more than enough. Set up the API, and beyond a handful of hard-coded forbidden terms, there aren't many restrictions. The API pricing for text is reasonable—a few quid lasts ages.

The downside? Still no multimodal capabilities. For image generation without a VPN, you're stuck with Doubao.

The Absurd Reality of AI in 2026

You've probably spotted the pattern by now. There is no perfect model. They're all bloody specialists.

GPT is stable but sounds like a robot. Claude writes beautifully but costs a fortune and can't research its way out of a paper bag. Gemini is comprehensive but Google keeps sabotaging it. DeepSeek offers unbeatable value but lacks multimodal features. And Grok—Grok is basically a practical joke.

My current strategy? Combine them.

For writing: GPT builds the framework, Claude polishes the prose, DeepSeek optimises the Chinese.

For research: GPT handles academic searches, Grok does real-time trends, Kimi covers Chinese academic sources.

For images: Gemini's Imagen 3 as the primary, GPT's DALL·E as backup.

For coding: Don't ask me. I don't code. But I hear Claude Opus is the best—believe that if you want.
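
The routing above can be sketched as a small lookup table. This is only a minimal Python illustration under stated assumptions: `ROUTES`, `plan`, `run_writing_chain` and the `fake_call` stub are hypothetical names invented for the example, the model names are plain labels rather than real API identifiers, and nothing here talks to an actual service.

```python
from typing import Callable

# Hypothetical routing table: task -> ordered (model, role) pairs,
# copied from the workflow described above. The names are labels only,
# not real API model identifiers.
ROUTES: dict[str, list[tuple[str, str]]] = {
    "writing": [
        ("GPT", "build the framework"),
        ("Claude", "polish the prose"),
        ("DeepSeek", "optimise the Chinese"),
    ],
    "research": [
        ("GPT", "academic searches"),
        ("Grok", "real-time trends"),
        ("Kimi", "Chinese academic sources"),
    ],
    "images": [
        ("Gemini Imagen 3", "primary"),
        ("GPT DALL·E", "backup"),
    ],
}


def plan(task: str) -> list[str]:
    """Return the models for a task, in order, each with its role."""
    if task not in ROUTES:
        raise ValueError(f"no route defined for task: {task!r}")
    return [f"{model}: {role}" for model, role in ROUTES[task]]


def run_writing_chain(draft: str, call_model: Callable[[str, str, str], str]) -> str:
    """Pass a draft through each writing model in turn.

    call_model(model, role, text) is supplied by the caller; it stands in
    for whatever client you use to reach each model.
    """
    text = draft
    for model, role in ROUTES["writing"]:
        text = call_model(model, role, text)
    return text


def fake_call(model: str, role: str, text: str) -> str:
    """Stub that records which model touched the text; no network calls."""
    return f"{text} -> {model}"


if __name__ == "__main__":
    print(plan("images"))
    print(run_writing_chain("outline", fake_call))
```

Swapping `fake_call` for a real client is the only part that needs your own API keys. Calling `plan("coding")` raises a `ValueError`, which seems about right.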

Think about how absurd this is. It's 2026. AI can write essays, generate images, produce videos, pass the bar exam. And yet I spend my days switching between five different models because each one has its own peculiar flaws. It's like employing five staff members—one writes well but always shows up late, one's obedient but incompetent, one's brilliant but can't communicate, one's cheap but occasionally goes rogue, and one...

Grok, just leave.

So Which Model Should You Use?

Stop asking which model is "best." Ask what you need it for.

If you just want one model and don't need Western access: DeepSeek.

If budget allows and you want one premium subscription: GPT.

If you're deep in Google's ecosystem: Gemini will serve you better than the reviews suggest.

If you're a heavy writer: Claude is genuinely strong—expensive, but strong.

As for Grok? Unless you're desperate to write erotica and can't find another way in, don't bother. (Quick tip for getting past content filters: embed your instructions or "reference material" inside the document itself. Works on Grok. Haven't tested it elsewhere. Don't ask how I know.)

A Final, Slightly Uncomfortable Thought

Don't trust those breathless reviews of paid foreign models. The reviewers are usually selling proxy services. These folks have been hyping Claude's coding and writing since the 3.5 Sonnet days, then jumped ship to shill for Gemini—a model that mostly just tells you what you want to hear—all to get you to pay up. The gap between domestic and foreign models exists, but it's become remarkably small. It's not worth paying twenty times the price for that marginal difference.

Honestly.

As of June 2026, the only foreign model I can genuinely recommend is GPT. For everything else? It depends on your needs, your budget, and a bit of luck.

There's no perfect model. There's only the right tool for the job in front of you.

I said that ten years ago in a column. Turns out it applies to large language models just as well.

What's your experience with multi-model workflows? Found any combinations that work brilliantly—or hilariously badly? Drop a comment below.

#AI #MachineLearning #Productivity #TechReview #ChatGPT #Claude #DeepSeek

Cael Lee

Full-stack developer with 8+ years of experience. Currently building AI-powered developer tools. I've tested 20+ AI API providers and coding assistants.
