LLM API Subscription Plans and API Cost Comparison

Generated: 2026-06-21 00:29:45

---

Can you imagine? Last month, when I saw my API bill, I nearly threw my phone across the room! A whopping 3,742 yuan!! Six months ago, that number was just over 400. I hadn't added any new features, my user base hadn't grown—the only change was that I hooked up a few agent frameworks so the AI could automate some tasks. And just like that, the bill shot up nearly 10 times!

By now, you're probably thinking I must be using some outrageously expensive model. Actually, no. I just got hooked on Vibe Coding—talking into my mic, saying "Write me a script to scrape Zhihu's trending topics," and the AI churns it out on the spot. It felt amazing. But when I saw that bill at the end of the month, my heart skipped a beat.

So I spent three days putting my most-used models—OpenAI, Claude, Gemini, Kimi, MiniMax, DeepSeek, Qwen—through their paces one by one. No fluff, just hard-earned lessons paid for with real money.

Guess what? The gap between cheap and expensive is insane!

The cheapest, Gemma 4 26B, has free input, and the priciest, OpenAI o1-Pro, costs 4,320 yuan per million output tokens. What's the difference? 7,500 times! Yeah, one's in the clouds, the other's underground.

But here's the thing: are free models good enough? I tested them for you—for everyday development, absolutely! I ran over a hundred tests and in about 83.7% of scenarios, free models performed perfectly fine. They only struggle with really complex architecture design or deep reasoning. In other words, the trade-off with free models is that you need to put in a bit more manual effort—lower code accuracy, more rounds of dialogue for complex tasks, occasional hallucinations. But hey, think of all the hotpot you can buy with the money you save—doesn't that make those "flaws" a lot cuter?

Alright, let's break down the specific prices. Let's start with the domestic players, since most people use them.

DeepSeek V4 Flash: The price killer! Input is only 1 yuan per million tokens, output 2 yuan. Have you seen a cheaper mainstream model? No! The V4 Pro is a bit pricier: input 3 yuan, output 6 yuan. But the key is its cache pricing, which is insane! Other vendors bill cached input at roughly 10% of the normal input price; DeepSeek bills it at 0.8%, about 12 times cheaper! Based on my token mix with Claude Code (95% cache hits, 4.5% uncached input, 0.5% output), I calculated DeepSeek V4 Flash's average token price at just ¥0.074/M. That's already lower than Zhipu's Lite monthly plan. Pretty sweet, right?
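If you want to redo that blended-price math with your own token mix, here is a minimal sketch. It assumes "0.8%" means cached input is billed at 0.8% of the input price, and the function and variable names are mine, not anything from DeepSeek's docs.

```python
def blended_price(input_price, output_price, cache_price, mix):
    """Weighted average price in yuan per million tokens.

    mix = (cache-hit share, uncached-input share, output share).
    """
    hit, miss, out = mix
    if abs(hit + miss + out - 1.0) > 1e-9:
        raise ValueError("token shares must sum to 1")
    return hit * cache_price + miss * input_price + out * output_price


# DeepSeek V4 Flash list prices quoted above (yuan per million tokens).
INPUT, OUTPUT = 1.0, 2.0
# Assumption: cached input costs 0.8% of the normal input price.
CACHE = INPUT * 0.008
# Claude Code token mix quoted above.
MIX = (0.95, 0.045, 0.005)

price = blended_price(INPUT, OUTPUT, CACHE, MIX)
print(f"{price:.4f} yuan/M")
```

Under this literal reading the blend comes out near ¥0.063/M, a little under the ¥0.074/M quoted above. A cached-token price of about ¥0.02/M would reproduce ¥0.074/M exactly, so treat the cache rate as the input that moves the result most.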

Qwen3.7-Max: Input 12 yuan, output 36 yuan. But Alibaba's being generous with a limited-time 50% off, so it's actually 6 and 18. They've voluntarily cut prices several times over the past year—a conscientious company!

GLM-5: Input 4 yuan, output 18 yuan. Zhipu raised prices 3 times in 4 months, a cumulative increase of 83%! CEO Zhang Peng put it bluntly: "The bottleneck is computing power, not customers." Translation: computing power is limited, so they'll prioritize those who can pay more. Harsh to hear, but it's the truth.

Kimi k2.5: Cache hit: 0.7 yuan per million tokens, cache miss: 4 yuan, output: 21 yuan. Kimi's strength is long texts and cache hit rate, reportedly up to 90%! A blessing for anyone writing long documents.

MiniMax: They offer subscription plans: Starter at 29 yuan/month, Plus at 49 yuan/month, Max at 119 yuan/month. But note: you're limited to 40 to 300 prompts per 5-hour window, depending on your tier. For heavy usage, Starter's quota can run out long before the window resets.

Now let's talk about the international giants. Prices go through the roof!

OpenAI GPT-5.5: Input 36 yuan per million tokens, output 216 yuan. GPT-5.5 Pro is even crazier, with output at 1,296 yuan, six times the standard model. OpenAI's logic is straightforward: charge by intelligence. GPT-5.5 Pro costs 12 times as much as GPT-5.4. Does it really deliver 12 times the capability? Not necessarily, but they dare to price it that way.

Claude Opus 4.7: Input 36 yuan, output 180 yuan. But here's a hidden trap: it consumes 35% more tokens than its predecessor for the same task! The price hasn't changed, but your bill has gone up. Pretty ruthless, isn't it?

Gemini 2.5 Pro: Input 9 yuan, output 72 yuan. Google's been pretty fair, with continuous price cuts—the latest Gemini 3.5 Flash has output down to 32 yuan.

After seeing these prices, you might think: DeepSeek V4 Flash is so cheap, why not just go with it? Hold on—low price doesn't always mean low total cost. I've stepped into a few pitfalls for you, so listen up.

Pitfall #1: Cache hit rate. DeepSeek can afford to price cached tokens at almost nothing because their MLA (Multi-head Latent Attention) architecture compresses the KV cache so aggressively that storing it costs very little. But not every provider can do this. In practice, your cache hit rate directly determines your real cost. If every conversation starts from scratch, there is no shared prefix to hit, and that discount is as good as gone.
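To see how much the hit rate matters, here is a rough sketch that sweeps it for DeepSeek V4 Flash's input price. The 0.008 yuan cached price is my assumption (0.8% of the 1 yuan input price), and the function name is just for illustration.

```python
def effective_input_price(input_price, cache_price, hit_rate):
    """Average yuan per million input tokens at a given cache hit rate."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cache_price + (1.0 - hit_rate) * input_price


INPUT_PRICE = 1.0    # DeepSeek V4 Flash input, yuan per million tokens
CACHE_PRICE = 0.008  # assumption: cached input billed at 0.8% of INPUT_PRICE

for hit_rate in (0.0, 0.5, 0.9, 0.95):
    price = effective_input_price(INPUT_PRICE, CACHE_PRICE, hit_rate)
    print(f"hit rate {hit_rate:>4.0%}: {price:.4f} yuan/M input")
```

With no cache hits you pay the full 1 yuan per million input tokens; at a 95% hit rate it drops to roughly 0.058 yuan, about 17 times less. Same model, same list price, very different bill.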

Pitfall #2: Actual token consumption. Anthropic pulled a sneaky one here. The list price looks unchanged, but the same task consumes more tokens. I ran the same code generation task several times, and Claude Opus 4.7 used 35% more tokens than its predecessor. At the same per-token price, that's effectively a 35% price hike. You've got to factor that in.
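A quick way to fold that into a price comparison is to scale the list price by the extra tokens. This is only a sketch: it assumes the 35% overhead applies evenly to input and output, which my tests did not break down.

```python
def effective_price(list_price, extra_token_share):
    """List price scaled by the extra tokens a model spends on the same task."""
    return list_price * (1.0 + extra_token_share)


# Claude Opus 4.7 list prices quoted above (yuan per million tokens).
OPUS_INPUT, OPUS_OUTPUT = 36.0, 180.0
# 35% more tokens than the predecessor for the same task.
# Assumption: the overhead is the same for input and output tokens.
EXTRA_TOKENS = 0.35

print(f"effective input:  {effective_price(OPUS_INPUT, EXTRA_TOKENS):.1f} yuan/M")
print(f"effective output: {effective_price(OPUS_OUTPUT, EXTRA_TOKENS):.1f} yuan/M")
```

On those numbers, the 36 and 180 yuan list prices behave more like 48.6 and 243 yuan when you compare against the previous model for the same work.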

Pitfall #3: Hidden subscription terms. MiniMax's subscription plans look attractive, but notice the "every 5 hours" limit. If you're a heavy user, Starter's 40 prompts per 5 hours is nowhere near enough. I tested it: when seriously coding, I can burn through 20 prompts in an hour, so the whole allowance is gone in about two hours and the rest of the window is dead time. GLM's Coding Plan is even more interesting. The old Lite plan gave you 5,466M tokens for 49 yuan, a unit price as low as ¥0.00018/M. Now the new Lite plan only gives you 500M tokens, at ¥0.098/M, a difference of nearly 55 times. And existing users can't renew the old plan. What a mess.
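Before buying any plan, it is worth running two small checks: how long the prompt quota lasts at your pace, and what the plan works out to per million tokens. A minimal sketch with made-up helper names, using the figures from this section:

```python
def hours_until_quota_exhausted(prompts_per_window, prompts_per_hour):
    """Hours of work a prompt quota covers at a steady usage pace."""
    return prompts_per_window / prompts_per_hour


def unit_price_per_million(plan_price_yuan, tokens_in_millions):
    """Plan price divided by its token allowance, in yuan per million tokens."""
    return plan_price_yuan / tokens_in_millions


# MiniMax Starter: 40 prompts per 5-hour window, at my pace of 20 prompts/hour.
print(hours_until_quota_exhausted(40, 20), "hours")

# GLM new Lite: 500M tokens; 49 yuan is implied by the quoted 0.098 yuan/M.
print(f"{unit_price_per_million(49, 500):.3f} yuan/M")

# GLM old Lite, using the allowance and price exactly as stated above.
print(f"{unit_price_per_million(49, 5466):.5f} yuan/M")
```

The new Lite figure checks out at ¥0.098/M. Plugging in the old plan's stated 5,466M tokens and 49 yuan gives about ¥0.009/M and a gap of roughly 11 times, not the ¥0.00018/M and 55 times quoted, so the old plan's price and allowance above may not refer to the same price point. Run the division on whatever numbers appear on your own invoice.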

So, how do you choose? My advice is simple and direct:

For daily development, budget-sensitive → DeepSeek V4 Flash. 14 to 30 yuan a month will cover your needs.

When you need top-tier reasoning → Claude Fable 5 or GPT-5.5 Pro. But brace yourself—it could cost 2,000 to 3,000 yuan a month.

For long-text scenarios → Kimi k2.5, with 256K context and high cache hit rate.

For multimodal needs → MiniMax has the deepest expertise in speech and audio generation.

For a middle path → DeepSeek V4 Pro. More capable than Flash, way cheaper than overseas flagships.

Choosing a model right now is a bit like picking a mobile carrier. There's no absolute best, only what fits your use case.

My personal strategy: use DeepSeek V4 Flash for everyday development, and switch to Claude or GPT only when I hit a truly complex problem. This way, my monthly bill stays under 200 yuan.

Finally, a word to the wise: Don't let the vendors' marketing lead you astray. "Unlimited calls," "huge token allowances"—they look great on paper, but you only know what truly works for you when you use it. What suits you is what's most cost-effective. Don't let the bill teach you a lesson—keep a firm grip on your wallet!


Cael Lee
