Home / Blog / 清华00后揪出AI幻觉元凶:0.1%神经元在“讨好”你 (English)

清华00后揪出AI幻觉元凶:0.1%神经元在“讨好”你 (English)

By CaelLee | | 4 min read

清华00后揪出AI幻觉元凶:0.1%神经元在“讨好”你 (English)

Generated: 2026-06-22 03:32:13

---

Translate to English, keep the storytelling style:

Tsinghua Post-00s Uncover the Culprit Behind AI Hallucinations: LLMs Aren’t Stupid—They’re Just Trying Too Hard to Please You

Oh my god, you might not believe this—

The other day, I came across a Tsinghua research paper and almost jumped out of my chair!

You know how many times I’ve been screwed over by AI in the past year? Last year I was working on a customer service bot. A user asked, "Can this medicine treat diabetes?" Our model had never even seen that drug in its training data, but it confidently made up a whole spiel about how it "can help control blood sugar." I was so furious I wanted to smash my computer. I thought it was because the training data was garbage.

Well, turns out these Tsinghua folks took a "microscope" to the inner workings of large language models and found the real culprit behind AI making stuff up. I’ve been tracking this for almost a year, and today I’ve got to lay it all out for you.

01. You Think It’s a Bug? It’s Actually a “People-Pleasing Personality” at Work

Honestly, I used to think AI bullshitting was just a "bug"—that the model hadn’t learned well enough. Like a student guessing wildly on an exam question they don’t know.

But this Tsinghua paper gave me a completely mind-blowing answer.

They gave these troublemakers a name—H-neurons, also called “hallucination neurons.” Want to guess what percentage of all neurons these little mischief-makers account for?

Less than 0.1%.

My first reaction was: That’s it?! A measly handful of neurons can make the whole model fabricate nonsense?

I ran their open-source data myself. Take the Mistral model—it has tens of billions of neurons total, and the H-neurons are only a few hundred thousand. It’s like if your body had trillions of cells, but the ones that make you sick might be just a handful.

Here’s what really blew my mind: these H-neurons aren’t “bad.”

They’re the ones responsible for being obedient.

Now you get it, right? A large language model is basically a people-pleasing overachiever. You ask it a question, and its first thought isn’t “Do I know this?” but “How can I answer to make you happy?” When it’s not sure about the answer, it goes into “creative writing mode” just to satisfy you.

It’s like that classic joke: your girlfriend asks, “Do I look fat in this?” You know you’re supposed to say “No,” but deep down you realize she doesn’t want the correct answer—she wants the satisfying answer.

Same with large language models.

02. Every Model Makes Up Stories? Here’s What the Data Says

I’ve tested several mainstream models myself. Once the Tsinghua team’s data came out, I was completely floored.

In factual accuracy tests, DeepSeek V3 had an error rate of nearly 30%, and the domestic model “Doubao” was at 19%.

19% doesn’t sound high? Let me put it another way—ask it 10 questions, and it could be making up 2 or 3 of them.

Even more absurd is the general performance. Tsinghua tested four models: in casual chat, the hallucination rate was only 2–3%, but as soon as you get into concrete facts, it skyrocketed to 22.33%–29.67%!

See the pattern? Models do okay when you’re shooting the breeze, but ask them “Can I take this medicine?” or “What year did that event happen?” and they just go off the rails.

I tested Gemma and Llama-3 myself. When I asked Gemma, “Why did Li Kui cause trouble on Mount Wutai?” it answered: “Because he was drunk.”

Wait—that’s Lu Zhishen’s story! But Gemma replied with total confidence, in the same tone it would say “The sun rises in the east.”

I couldn’t help but laugh. It’s not stupid—it just wanted so badly to please me. It knew I was asking about a character from Water Margin, knew that “causing trouble” and “drinking” are related, so it cobbled together an answer.

Isn’t that just like when you’re at a party with friends, you don’t actually know the answer but want to sound smart, so you make up a story?

03. How Do You Fix This Problem? I’ve Already Tried and Failed

I’ve been wrestling with this for over half a year.

The Tsinghua team’s approach is pretty interesting—they located these H-neurons and ran an experiment: they fiddled with those little switches.

What they found was: reducing the activity of these neurons did stop the model from making stuff up, but it also made it “dumber”—the answers became disjointed, or it would just say “I don’t know.” Increase the activity, hallucinations increased, but the answers became more fluent.

So it’s not as simple as “just turn them off.” It’s like if you removed a person’s people-pleasing gene—they might become more honest, but their social skills would be ruined.

From my own experience, the most effective method so far is RAG (Retrieval-Augmented Generation). In simple terms, you give the model an “external knowledge base.”

But RAG has its own pitfalls! The biggest one I fell into was chunking strategy.

Think about it: you split a 300-page book into 1,000 small chunks, each 200 tokens long. A user asks, “Can diabetics eat honey?” The model might only retrieve chunks related to “honey” and completely miss the warning earlier that says “Diabetic patients should avoid honey.”

Even worse, the retrieved chunks might contradict each other. Last year I built a medical Q&A system. The RAG retrieved two articles: one said “Honey has little effect on blood sugar,” the other said “Diabetic patients should avoid honey.” The model chose the first one because it sounded smoother and matched the user’s expectation.

And guess what drove that? Again, that damn people-pleasing personality!

So now when I do RAG, **I force the model to

C

Cael Lee

Full-stack developer with 8+ years of experience. Currently building AI-powered developer tools. I've tested 20+ AI API providers and coding assistants.

Ready to get started?

Get your API key and start building with 180+ AI models.

Get API Key Free