Why Has No Mainstream Game Connected Its NPCs to a Large Language Model (LLM) Yet?

By CaelLee | 4 min read

Generated: 2026-06-21 23:43:11

---

The Truth No One Dares to Speak: Why AAA Games Are Afraid to Give NPCs an AI Brain

Last Friday night, I stared at my Steam backend refund data until my eyes nearly bled.

47%.

I’d spent 60 days building an AI detective game entirely solo—five NPCs all powered in real time by a large language model. Players could say whatever they wanted. No preset dialogue options. Completely open input. Sounds impressive, right? I thought so too, at first.

Then came the 47% refund rate.

Know what that means? Nearly half the people who bought it quit within two hours and walked away without even saying goodbye.

So when I see people asking, “Why hasn’t any mainstream game integrated large language models yet?” I actually laugh. Not a bitter laugh. It’s the laugh of someone who’s been burned and knows the fear that comes after.

You think they don’t want to? They don’t dare.

If a AAA studio actually plugged this thing in, the finance department would be at the door with a knife before the players even got a chance to complain.

Come on, sit down. Let me tell you, one by one, about all the traps I fell into. And along the way, I’ll answer a few questions you’ve definitely wondered about.

---

The technical barrier? You think it’s just calling an API?

Sure, you can have a demo in half a day. And it’ll look damn impressive.

You say something, the NPC responds. Personality, tone, emotion—it all feels real. When I got my first version running, I almost felt like I was about to change gaming history.

Then the nightmare started on day two.

First trap: NPCs blurt things out.

I wrote “NEVER admit you’re the killer” into the system prompt eight hundred times. The NPC says, “I don’t know anything,” and then the model tacks on a stage direction: her eyes flicker with guilt.

Think about it. That’s basically selling the answer to the player through body language. There’s no way to defend against it.

Second trap: NPCs invent things that don’t exist in the game.

A player asks, “Did you see a knife?” There’s no knife in the game at all. But the LLM thinks, “Eh, answering won’t hurt,” and casually fabricates one. The player then takes that nonexistent knife and confronts another NPC. The entire deduction chain collapses. You think the player’s happy?

Third trap—and this is the scariest one: players can talk their way past every safety guardrail you set.

I met one genius who used 21 rounds of carefully escalating dialogue to get my 1936 socialite character to say something she absolutely should not have said. Every single round looked like normal interrogation, but cumulatively, the direction was completely off. I was using a major LLM at the time—the kind that’s locked down tight on the web version. But inside the game, with complex system prompts and player manipulation, it just bypassed the safety filters entirely.

That night, I was modifying code in a cold sweat, cursing with every keystroke.

How did I eventually solve it? Dual AI architecture.

One AI plays the character. Another AI acts as a referee—evaluating the player’s questions and reasoning direction. If the player asks nonsense, the referee gives a low score, and the character stays rock solid. If the player uses real evidence and points out contradictions, the referee gives a high score, and the character’s defenses loosen according to the designed pace.
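To make the referee idea concrete, here is a minimal sketch, not the game's actual code. The `referee` and `character` callables stand in for the two LLM calls, and the score scale, thresholds, and stage wording are all placeholder assumptions.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical scripted stages: (pressure threshold, instruction for the character AI).
STAGES = [
    (0, "Deny everything. Stay calm and composed."),
    (15, "Keep denying, but let small inconsistencies slip."),
    (30, "Admit you were at the scene. Still deny the killing."),
]


@dataclass
class Interrogation:
    pressure: int = 0  # accumulated from well-founded questions only

    def register(self, referee_score: int, min_score: int = 6) -> None:
        """Low-scoring (nonsense) questions leave the character's defenses untouched."""
        if referee_score >= min_score:
            self.pressure += referee_score

    def stage_instruction(self) -> str:
        """Return the instruction for the highest stage whose threshold is reached."""
        current = STAGES[0][1]
        for threshold, instruction in STAGES:
            if self.pressure >= threshold:
                current = instruction
        return current


def take_turn(
    state: Interrogation,
    player_line: str,
    referee: Callable[[str], int],         # assumed: LLM call returning a 0-10 score
    character: Callable[[str, str], str],  # assumed: LLM call (instruction, line) -> reply
) -> str:
    state.register(referee(player_line))
    return character(state.stage_instruction(), player_line)
```

The point of the split is that the character model never decides when to crack. The pacing lives in plain code, and the referee only feeds it a number.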

And that’s not enough. On top of that, there’s a program-level check: every single NPC response is scanned for forbidden keywords and character deviations. If something’s off, it gets intercepted before the player even sees it.
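A program-level check like this can be as plain as a regex pass over the reply before it is shown. The sketch below is an assumption about how such a filter might look; the patterns and the fallback line are invented for illustration.

```python
import re

# Hypothetical patterns: outright confessions, guilty "stage directions",
# and objects that do not exist in the game (the knife from the example above).
FORBIDDEN_PATTERNS = [
    r"\bI(?: am|['’]m) the killer\b",
    r"\bI killed\b",
    r"\bguilt\w*\b",
    r"\bknife\b",
]

FALLBACK_LINE = "I have nothing more to say about that."


def screen_reply(reply: str) -> str:
    """Return the reply unchanged, or a safe fallback if any pattern matches."""
    for pattern in FORBIDDEN_PATTERNS:
        if re.search(pattern, reply, flags=re.IGNORECASE):
            return FALLBACK_LINE
    return reply
```

A keyword scan only catches what you thought to list, which is why it works as a last net behind the referee and not as the whole defense.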

Finally, I added a layer of cognitive defense—not keyword filtering, but making it so the character’s very identity makes certain requests impossible. A proper lady from 1936 would never do certain things. She doesn’t need the system to tell her.
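One way to read "cognitive defense" is that the limits are written into the persona as self-image, not as a list of rules. The sketch below is an assumed illustration of that idea. The character name, field names, and prompt wording are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Persona:
    name: str
    era: str
    self_image: str
    unthinkable: list[str] = field(default_factory=list)


def build_system_prompt(p: Persona) -> str:
    """Frame limits as part of who the character is, not as external rules."""
    lines = [
        f"You are {p.name}, living in {p.era}.",
        f"You see yourself as {p.self_image}.",
        "These are not rules imposed on you. They are things a person like you "
        "would simply never do, and you react to such requests in character:",
    ]
    lines += [f"- {item}" for item in p.unthinkable]
    return "\n".join(lines)


# Hypothetical example persona; only the year comes from the article.
socialite = Persona(
    name="Vivian",
    era="1936",
    self_image="a well-bred lady with a reputation to protect",
    unthinkable=["use vulgar language", "discuss anything from after 1936"],
)
prompt = build_system_prompt(socialite)
```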

So there you go. For just five NPCs, I needed two layers of AI, one layer of hard-coded logic, and one layer of cognitive design.

A AAA game has hundreds of NPCs. Do that for every single one? Your project timeline hits a decade. Which big studio can stomach that?

---

Cost? Let me tell you, it’s absurd.

At this point, you’re probably thinking: aren’t LLMs getting cheaper? ChatGPT doesn’t cost much.

That’s because you’re thinking of a few conversations.

Let’s do the math.

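Assume an open-world RPG with 100,000 daily active users [Note 1]. Each player has about 20 conversations with NPCs per day [Note 2], and each conversation uses 500 tokens. Let’s calculate: 100,000 × 20 × 500 = 1 billion tokens per day.

At GPT-4o mini [Note 3] pricing, that is already hundreds of dollars a day in API fees, and that covers the bare dialogue alone. Every request also resends the system prompt, the character sheet, and the conversation history, so the real token count can be several times higher. If you use GPT-4o, multiply the bill by ten or more.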
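The arithmetic is easy to rerun with your own numbers. In this sketch the per-million-token price and the per-conversation overhead are placeholder assumptions, not quoted rates. Check the provider's current price list before trusting any output.

```python
def daily_tokens(dau: int, conversations_per_player: int,
                 tokens_per_conversation: int, overhead_tokens: int = 0) -> int:
    """Tokens per day; overhead_tokens models the prompt and history resent each time."""
    return dau * conversations_per_player * (tokens_per_conversation + overhead_tokens)


def daily_cost_usd(tokens: int, usd_per_million_tokens: float) -> float:
    """Cost at a single blended price per million tokens (a simplifying assumption)."""
    return tokens / 1_000_000 * usd_per_million_tokens


# The article's scenario: 100,000 DAU, 20 conversations each, 500 tokens per conversation.
tokens = daily_tokens(100_000, 20, 500)

# Placeholder price of $0.50 per million tokens, for illustration only.
cost = daily_cost_usd(tokens, 0.50)
```

Raising `overhead_tokens` shows how quickly a long system prompt that is resent on every request comes to dominate the bill.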

And that’s a conservative estimate. In reality, the players who love to “train” NPCs can burn through your monthly quota all by themselves.

Big studios could deploy their own model, of course. But the inference cost is still high. One A100 running a 7B model can only serve a few dozen players’ conversations simultaneously [Note 4]. For a game with millions of DAUs, how many cards do you need? Do the math.
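Here is the same back-of-the-envelope math for self-hosting. The peak concurrency share and the 30 conversations per GPU are assumptions picked to match "a few dozen". Real throughput depends on the model, batch size, and context length.

```python
import math


def gpus_needed(dau: int, peak_concurrency_percent: int, conversations_per_gpu: int) -> int:
    """GPUs required to serve peak concurrent conversations, rounded up."""
    concurrent_players = dau * peak_concurrency_percent // 100
    return math.ceil(concurrent_players / conversations_per_gpu)


# Assumed scenario: 1,000,000 DAU, 5% talking to an NPC at peak, 30 conversations per card.
cards = gpus_needed(1_000_000, 5, 30)
```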

My little game was okay. Low daily active users, a few thousand a month in API fees. But if Ubisoft or Rockstar integrated this system, a game selling millions of copies would see server inference costs eat the profits.

**And here’s the killer: the business

Cael Lee

Full-stack developer with 8+ years of experience. Currently building AI-powered developer tools. I've tested 20+ AI API providers and coding assistants.
