From a technical standpoint, why does ChatGPT's expressive ability fall short of Cl (English)

Generated: 2026-06-23 01:05:51

---

I spent three nights, and finally figured out why ChatGPT sounds more and more like a robot

You know what? I spent three nights staring at a screen, running tests over and over, comparing again and again — and finally, I figured something out.

Let me start with the conclusion: don't be fooled by version numbers! The gap in expressive ability between these two models has nothing to do with 4.0 or 4.6. It all comes down to their training philosophies — two completely different paths. I've used ChatGPT for three years, Claude for two, and dabbled in Gemini and DeepSeek along the way. By 2026, the gap hasn't narrowed — it's only gotten wider! Think about how that feels.

I have to admit, at first I wasn't convinced.

Back in late 2024, ChatGPT 4o was at its peak — I'm talking about the chatgpt-4o-latest version in the API, not the watered-down official releases that came later. Back then, GPT's responses were genuinely flexible. I used it for emails, weekly reports, even some light copywriting, and it felt no different from a real person. I even bragged to a friend: "Claude is better at Chinese? I don't buy it."

And then I got slapped in the face. The kind that really stings.

---

Where does Claude's "good writing sense" come from?

This actually goes back to early 2024.

While digging through old chat logs, I noticed something peculiar about Claude 3: when writing Chinese, it would mix classical and vernacular styles, and sometimes even invent new elegant-sounding words. At the time, I didn't think much of it and even used that quirk to build a few character cards for fun. Looking back now, that wasn't a flaw at all! To me, it was a strong sign that it had been trained on massive amounts of ancient texts.

The real smoking gun came later with the leaked "Panama Project." Between 2025 and early 2026, due to copyright lawsuits, some internal documents from Anthropic were made public. And wow — these people are ruthless. They launched a data collection plan codenamed "Panama Project," and its core goal was just one sentence: "Acquire every book in the world."

How did they do it? They hired a former executive from the Google Books scanning project to lead the team, spent tens of millions of dollars buying physical books in bulk from secondhand book dealers and wholesalers, then used hydraulic cutters to slice off the spines, scattered the pages into industrial-grade high-speed scanners, and after scanning, they shredded and destroyed the books.

This is what's called "destructive scanning."

When I first read that, my reaction was: these people are insane. My second reaction was: no wonder.

There's one line from their internal documents that stuck with me: "In training our model, having every book is more valuable than having only a small set of acclaimed literary works." They paid for "every book" — not a curated list, not bestseller rankings, but the books themselves: good and bad, all of them! The model doesn't need the ideological depth of the content; it needs to understand "how humans actually put words together." This philosophy directly determines the foundation of Claude's language sense. A model that has seen millions of books — how could it write the same way as a model fed on forum posts?

---

But here's the thing: isn't OpenAI buying books too?

To be honest, I assume they are, but the scale and direction are completely different.

What really makes ChatGPT sound "terrible" is another mechanism entirely.

In 2024, Nature Human Behaviour published a paper testing theory of mind in large models. The conclusion was interesting: on most tests, GPT-4 performed as well as or better than humans, but on one task, detecting a "social faux pas," it was clearly weaker. What's a social faux pas? A scenario where someone says something they shouldn't have, without realizing it. To me, GPT-4's poor performance on this kind of task suggests a weakness in judging "what to say in what situation."

So how did OpenAI compensate for this weakness?

The answer is RLHF.

Let me get a bit technical here, but I'll keep it plain. In ChatGPT's training pipeline, RLHF (Reinforcement Learning from Human Feedback) plays a huge role. Annotators compare or score the model's outputs, and naturally they tend to favor responses that are "safe," "polite," and "well-formatted." A reward model is trained to predict those judgments, and the chat model is then optimized to score highly against it. Since high-frequency, safe word patterns reliably earn high scores, the model gravitates toward the most probable "mediocre expressions."

That's where the "AI tone" comes from — words like "empower," "delve deeper," "it's worth noting" — sound familiar? The model didn't invent them; the annotators fed them!
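To make the RLHF argument concrete, here is a minimal toy sketch, not OpenAI's actual pipeline. It fits a linear Bradley-Terry reward model on a handful of made-up pairwise preferences and then picks the highest-scoring candidate. The three features, the candidate sentences, and the preference pairs are all assumptions invented for illustration.

```python
import math

# Hypothetical features per response: (polite/hedged, well-formatted, vivid wording).
BLAND = "It's worth noting that there are several key factors to consider."
MIXED = "Kindly note: this plan leaks like a sieve."
VIVID = "This plan leaks like a sieve; patch the budget first."

FEATURES = {
    BLAND: (1.0, 1.0, 0.0),
    MIXED: (1.0, 0.0, 1.0),
    VIVID: (0.0, 0.0, 1.0),
}

# Made-up annotator judgments as (preferred, rejected) pairs.
PREFERENCES = [(BLAND, VIVID), (BLAND, MIXED), (MIXED, VIVID)]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def fit_reward(prefs, lr: float = 0.5, epochs: int = 200) -> list:
    """Fit weights w so that sigmoid(w.f(win) - w.f(lose)) is high for each pair."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for win, lose in prefs:
            diff = [a - b for a, b in zip(FEATURES[win], FEATURES[lose])]
            margin = sum(wi * di for wi, di in zip(w, diff))
            grad_scale = 1.0 - sigmoid(margin)  # gradient of the log-likelihood
            w = [wi + lr * grad_scale * di for wi, di in zip(w, diff)]
    return w


def reward(w, text: str) -> float:
    return sum(wi * fi for wi, fi in zip(w, FEATURES[text]))


weights = fit_reward(PREFERENCES)
# A policy optimized against this reward drifts toward the top-scoring style.
best = max(FEATURES, key=lambda text: reward(weights, text))
print(best)
```

In this toy setup the learned weight on "vivid wording" ends up negative, so anything optimized against the reward is nudged away from distinctive phrasing, which is the drift toward "mediocre expressions" described above.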

In contrast, Claude uses Constitutional AI. The core of this mechanism is to have the model critique and revise its own outputs against a written set of principles. As I understand it, the reward design penalizes information redundancy and low-density clichés, encouraging precise, information-dense phrasing.
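The self-critique idea can be sketched as a simple loop: draft, critique against a principle, revise. The code below is only a rough illustration under heavy assumptions, not Anthropic's implementation. `toy_model` is a rule-based stand-in for a real language model, and the prompt format, the `FILLERS` list, and the single principle are all hypothetical.

```python
from typing import Callable, Sequence

# Hypothetical filler phrases the toy "model" knows how to spot and remove.
FILLERS = ("It's worth noting that ", "Let's delve deeper: ")

# Hypothetical principle; real constitutions are longer and differently worded.
PRINCIPLES = ["Cut filler phrases that add no information."]


def toy_model(prompt: str) -> str:
    """Rule-based stand-in for a language model (assumption for this sketch)."""
    if prompt.startswith("CRITIQUE"):
        text = prompt.split("RESPONSE:\n", 1)[1]
        found = [f.strip() for f in FILLERS if f in text]
        return ("Filler found: " + "; ".join(found)) if found else "No issues."
    if prompt.startswith("REVISE"):
        text = prompt.split("RESPONSE:\n", 1)[1]
        for filler in FILLERS:
            text = text.replace(filler, "")
        return text[:1].upper() + text[1:]
    # Any other prompt gets a filler-laden first draft.
    return "It's worth noting that the budget is short by 10%."


def critique_and_revise(
    prompt: str,
    model: Callable[[str], str],
    principles: Sequence[str],
) -> str:
    """Draft a response, then critique and revise it once per principle."""
    draft = model(prompt)
    for principle in principles:
        critique = model(f"CRITIQUE against principle: {principle}\nRESPONSE:\n{draft}")
        draft = model(f"REVISE using critique: {critique}\nRESPONSE:\n{draft}")
    return draft


final = critique_and_revise("Summarize the budget status.", toy_model, PRINCIPLES)
print(final)
```

In the published description of Constitutional AI, revised responses like `final` are used as fine-tuning data, followed by a reinforcement learning stage that uses AI-generated preference labels. The loop here covers only the critique-and-revise step.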

In plain terms: one model's goal is to "make AI not make mistakes," the other's is to "make AI talk like a human."

These two goals often conflict. Think about it — doesn't that make sense?

---

What really made me decide to "break up" with GPT was GPT-5

By 2025, after o3 and GPT-4.1 were released, I could clearly feel that ChatGPT's chat had completely changed character. These models were faster, no doubt, but their world knowledge density was poor, and their expressive ability was laughably weak. The failure of the massive GPT-4.5 pushed OpenAI to the other extreme — small models with long reasoning chains. Sam Altman himself came out and said, "An ideal model should be extremely small, with all knowledge externalized and an extremely long reasoning chain." This philosophy directly gave birth to GPT-5.

I used GPT-5 for a very mundane task — translating a passage of English. And it mixed up "million" and "billion," and did it with complete confidence! I was stunned. If this were Claude, while it wouldn't guarantee zero errors, at least it wouldn't make such a basic mistake on fundamental language units. Why? Because Claude has "books" in its belly — it has seen so much context that it knows the difference between million and billion isn't just the number, but the entirely different contexts in which they're used in everyday language. GPT-5's architectural philosophy means its model only retains core STEM knowledge internally, while a huge amount of worldly knowledge is externalized. This design performs well on reasoning tasks, but when it comes to expressive ability, it's like cutting off its own language-sense foundation.

Tell me that isn't infuriating.

---

There's also a hidden problem: the conflict in product positioning

ChatGPT's current dilemma can be summed up in one sentence: it's trying to do two contradictory things at once. On one hand, it's OpenAI's flagship product, chasing user numbers, DAU, market share — which means its outputs must be safe, reliable, and error-free. On the other hand, it's also a "white-collar workbench" that needs to handle creative writing and deep content production. These two demands are inherently at odds.

Go look at the discussions about "AI sycophancy" on Twitter — it's a heated debate. Many people feel that an AI that overly flatters humans is not sophisticated, even anti-human. But OpenAI's solution is even scarier — three years ago, they added an "anti-sycophancy mechanism" to Copilot: regardless of what the user says, the AI responds with a mechanical pattern of first agreeing, then criticizing. Now this mechanism has spread across the entire ChatGPT. You can chat with it about anything, and after agreeing with you, it will inevitably find a way to oppose you, telling you that you "haven't thought broadly enough," offering completely irrelevant additions. Half the time, your train of thought gets derailed to Siberia.

Claude's positioning in this regard is far simpler. Its design is to "help knowledge workers get work done." As long as it doesn't cross obvious safety and ethical boundaries, it has no obsession — it just goes with the flow, keeping things smooth.

To be honest, the best ChatGPT chat experience I ever had was in those few months of late 2024. They were probably A/B testing GPT-4.5, and the chatgpt-4o-latest in the API was outputting at extremely high quality, even better than the official GPT-4.5 I used


Cael Lee
