Claude Code Source Code: A Roundup of the Best Analyses (English)

Generated: 2026-06-22 02:05:33

---

I spent three days digging through Claude Code's source code, and guess what I found… I was wrong!

You see, my read on Claude Code fell flat on its face three separate times.

The first time was when it first took off. I glanced at it and thought: Isn't this just a chat wrapper around a large language model? What's the big deal?

The second time was when I saw other people using it to write code. I admitted it was useful, but still thought: Under the hood it's just a dialog box plus a few API calls. Nothing special.

The third time—that was last week.

I scoured every Claude Code source-code analysis I could find online: Zhihu, WeChat Official Accounts, tech blogs, slide decks… I read over a dozen articles in one sitting. After each one, I cursed myself:

So shallow. So shallow. So—shal—low.

At this point, I want you to take a deep breath, because the numbers I'm about to share might make you gasp the way they made me gasp.

---

First, let's look at just how "chunky" this project really is

300,000 lines of TypeScript code.

In the AI coding tools space, that's already a "monster-level" codebase.

But wait, there's more. Counting TypeScript and TSX files together, the project has over 1,800 files and 510,000 lines of code in total. And guess how big the main entry file is?

Almost 5,000 lines. 786 KB.

A CLI tool with 510,000 lines of code. Honestly, as someone who has worked in backend architecture, my first reaction was: this level of complexity probably exceeds a lot of industrial software.

Everyone thought it was a "lightweight chat tool," but it turned out to be a "heavy operating system."

I was thinking: how many moving parts can an AI coding assistant's core logic really have? Call the model, manage context, execute tools. How complicated could it be?

After reading every single analysis, I realized just how wrong I was.

What's truly complex isn't the "code writing" step at all.

---

Tech Stack: How dare you choose this?

Let's start with the tech stack they chose.

Runtime: Bun, not Node.js
Terminal UI: React + Ink
Validation: Zod v4
Streaming: async generators
Telemetry: OpenTelemetry + gRPC

Bun! I tried pushing Bun inside my company, and the team's response was: "Ecosystem isn't mature enough, too risky, don't cause trouble."

But Anthropic? They just went ahead and built a production project with hundreds of thousands of lines on it without a second thought.

And Ink. Anyone who's used it knows it's essentially running a React component tree inside a terminal. A CLI tool rendered with React? Honestly, the rendering overhead is much larger than you'd think. But that's exactly what Claude Code does.

Why? Because they've invested far more in terminal interaction experience than anyone else.

Then there's one design decision that made me slap the table on the spot—

Feature Flags are split into two layers.

One layer is compile-time, using feature() from bun:bundle for dead-code elimination. The other layer is runtime, using GrowthBook for feature gating.

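What does that mean? Features cut at compile time never even make it into the production bundle, so they add virtually nothing to bundle size or runtime cost. Features behind the runtime layer do ship in the bundle, but can be switched on or off at runtime without a new build.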
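Here is a rough sketch of the two-layer idea. It assumes a bundler that inlines compile-time constants and drops branches it can prove are unreachable; `BUILD_FLAGS`, `RuntimeGates`, and the flag names are hypothetical stand-ins, not the real `feature()` API from `bun:bundle` or GrowthBook's SDK.

```typescript
// Layer 1 (compile time): constants a bundler can inline. When a flag is the
// literal `false`, a minifier can prove the guarded branch is unreachable and
// drop it from the production bundle. Flag names are hypothetical.
const BUILD_FLAGS = {
  EXPERIMENTAL_TOOL: false,
  COORDINATOR: true,
} as const;

// Layer 2 (runtime): a remote-config lookup standing in for a GrowthBook-style
// client. Hypothetical interface, not GrowthBook's actual API.
interface RuntimeGates {
  isOn(key: string): boolean;
}

function enabledFeatures(gates: RuntimeGates): string[] {
  const features: string[] = [];
  if (BUILD_FLAGS.EXPERIMENTAL_TOOL) {
    // Dead code in this build: eligible for elimination at bundle time.
    features.push("experimental-tool");
  }
  if (BUILD_FLAGS.COORDINATOR && gates.isOn("coordinator-rollout")) {
    // Shipped in the bundle, but still switched on or off at runtime.
    features.push("coordinator");
  }
  return features;
}
```

The design choice: anything off at build time costs nothing at runtime, while the runtime layer keeps the ability to roll a shipped feature forward or back without rebuilding.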

This isn't "radical"—this is "user experience carved into your very bones."

---

The part that made my scalp tingle: The Agent Loop

Before this, my understanding of an Agent was basically an infinite loop of "call model → parse tool call → execute → feed back in → next round."

Claude Code takes a completely different approach.

There was one analysis I read three times before I truly understood their architecture. In simple terms, it's two layers:

Layer 1: The headless conversation engine

Think of it as a "front desk clerk." It handles user input, manages conversation history, exposes capability surfaces, and does capability discovery. Once the front desk has everything ready, it passes the processed message flow to the next layer.

Layer 2: The Query Loop itself

This is the real "back kitchen." This layer doesn't maintain a single loop; it maintains a whole cluster of state that carries across iterations:

Message set
Execution context
Compression state
Output recovery counter
Round budget
Task budget

…
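A minimal sketch of that split, assuming the query loop is an async generator (in line with the "async generators" entry in the stack) and the engine is a thin class that owns history. `ConversationEngine`, `queryLoop`, `LoopEvent`, and the fake model interface are all hypothetical names, and the loop leaves out compression and recovery entirely.

```typescript
type Message = { role: "user" | "assistant" | "tool"; content: string };
type LoopEvent =
  | { type: "assistant"; text: string }
  | { type: "tool_result"; text: string }
  | { type: "done"; turns: number };

// Hypothetical model interface: one step returns text and maybe a tool call.
type ModelStep = (history: Message[]) => { text: string; toolCall?: string };

// Layer 2: the query loop. It streams events as they happen instead of
// returning one final answer, and stops when the model makes no tool call
// or the turn budget runs out.
async function* queryLoop(
  messages: Message[],
  model: ModelStep,
  runTool: (call: string) => Promise<string>,
  maxTurns: number,
): AsyncGenerator<LoopEvent> {
  let turns = 0;
  while (turns < maxTurns) {
    turns++;
    const step = model(messages);
    messages.push({ role: "assistant", content: step.text });
    yield { type: "assistant", text: step.text };
    if (!step.toolCall) break;
    const result = await runTool(step.toolCall);
    messages.push({ role: "tool", content: result });
    yield { type: "tool_result", text: result };
  }
  yield { type: "done", turns };
}

// Layer 1: the headless engine. It owns history and input handling, then
// hands the prepared message list to the loop and relays its events.
class ConversationEngine {
  private history: Message[] = [];
  private model: ModelStep;
  private runTool: (call: string) => Promise<string>;

  constructor(model: ModelStep, runTool: (call: string) => Promise<string>) {
    this.model = model;
    this.runTool = runTool;
  }

  async *submit(userInput: string): AsyncGenerator<LoopEvent> {
    this.history.push({ role: "user", content: userInput.trim() });
    // The loop appends to the same array, so history persists across submits.
    yield* queryLoop(this.history, this.model, this.runTool, 8);
  }
}
```

Because the loop yields events instead of returning one answer, a caller such as a terminal UI can render progress while tools are still running.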

Get it? This isn't a "model call wrapper"—this is a genuine runtime state machine!

In Claude Code's design, a single Agent turn isn't a linear API call. It's a run that gets interrupted over and over—tool execution, context compression, error recovery, budget exceeded… at every interruption, the state machine has to make a decision.

In the core skeleton code, they explicitly maintain a whole set of state variables:

autoCompactTracking
maxOutputTokensRecoveryCount
pendingToolUseSummary
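To picture what "the state machine has to make a decision" might look like, here is a sketch of a pure transition function. The three tracking fields reuse the names quoted above, but their types, the interruption kinds, and the limits are assumptions for illustration, not Claude Code's actual logic.

```typescript
// Field names for the three trackers follow the list above; their types and
// the transition rules below are assumptions made for this sketch.
interface LoopState {
  turnCount: number;
  autoCompactTracking: { compacted: boolean; turnsSinceCompact: number };
  maxOutputTokensRecoveryCount: number;
  pendingToolUseSummary?: string;
}

// Ways a single run can be interrupted (illustrative, not exhaustive).
type Interruption =
  | { kind: "tool_use"; summary: string }
  | { kind: "context_near_limit" }
  | { kind: "max_output_tokens" }
  | { kind: "end_turn" };

type Decision = "continue" | "compact_then_continue" | "retry_with_continuation" | "stop";

const MAX_RECOVERY_ATTEMPTS = 3; // hypothetical limit
const MAX_TURNS = 50; // hypothetical round budget

// Old state + interruption -> decision + new state. No mutation.
function decide(state: LoopState, event: Interruption): [Decision, LoopState] {
  const next: LoopState = {
    ...state,
    turnCount: state.turnCount + 1,
    autoCompactTracking: {
      ...state.autoCompactTracking,
      turnsSinceCompact: state.autoCompactTracking.turnsSinceCompact + 1,
    },
  };
  if (next.turnCount > MAX_TURNS) return ["stop", next]; // round budget exhausted
  if (event.kind === "tool_use") {
    // Remember what the tool did so a later compaction can keep a summary of it.
    return ["continue", { ...next, pendingToolUseSummary: event.summary }];
  }
  if (event.kind === "context_near_limit") {
    return ["compact_then_continue", { ...next, autoCompactTracking: { compacted: true, turnsSinceCompact: 0 } }];
  }
  if (event.kind === "max_output_tokens") {
    // Bounded recovery: ask the model to continue a few times, then give up.
    if (state.maxOutputTokensRecoveryCount >= MAX_RECOVERY_ATTEMPTS) return ["stop", next];
    return ["retry_with_continuation", { ...next, maxOutputTokensRecoveryCount: state.maxOutputTokensRecoveryCount + 1 }];
  }
  return ["stop", next]; // end_turn: the model finished normally
}
```

Keeping the counters in the state (rather than in local variables of one call) is what lets the loop survive many interruptions without retrying forever.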

A normal orchestrator wouldn't bother keeping these things around long-term.

The fact that they live in the main loop means the system acknowledges one thing:

The model isn't a one-shot evaluator. It's just one ordinary node in a running pipeline.

---

Context window: The hell that burned me before

At this point, I need to be completely honest with you.

Everyone who's worked on Agent projects has been burned by this thing.

In a project I did last year, because I didn't manage context well, the window would just blow up mid-conversation. Users complained, "The bot starts losing its memory as we talk." After debugging we found out: the context window was full, and all the early critical information had been squeezed out.

It was painful.

What's Claude Code's solution? One WeChat article summed it up beautifully—the "5-layer compression pyramid."

I love this metaphor.

Discard early rounds
Fold consecutive tool calls
Summarize old conversations
Structural compression
Selective forgetting

Each layer has its own trigger conditions and cost evaluation. It doesn't just blindly compress everything—it dynamically chooses strategies based on window occupancy rate, task type, conversation stage.
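One way to picture "dynamically chooses strategies" is a table of layers with trigger thresholds. The layer names follow the list above; the thresholds, costs, and the `midTask` signal are invented for this sketch.

```typescript
// The five layers, in the order listed above. Thresholds and costs are
// made-up numbers for illustration.
interface CompressionLayer {
  name: string;
  minOccupancy: number; // fraction of the window that must be in use to fire
  relativeCost: number; // 0 = pure bookkeeping, >0 = needs an extra model call (assumed)
}

const PYRAMID: CompressionLayer[] = [
  { name: "discard-early-rounds", minOccupancy: 0.6, relativeCost: 0 },
  { name: "fold-tool-calls", minOccupancy: 0.7, relativeCost: 0 },
  { name: "summarize-old-turns", minOccupancy: 0.8, relativeCost: 1 },
  { name: "structural-compression", minOccupancy: 0.9, relativeCost: 1 },
  { name: "selective-forgetting", minOccupancy: 0.95, relativeCost: 2 },
];

// Pick the layers that fire at this occupancy. In the middle of a task
// (hypothetical signal), skip layers that need a model call unless the
// window is nearly full.
function chooseLayers(occupancy: number, midTask: boolean): string[] {
  return PYRAMID.filter((l) => occupancy >= l.minOccupancy)
    .filter((l) => !(midTask && l.relativeCost > 0 && occupancy < 0.9))
    .map((l) => l.name);
}
```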

And the most critical part?

Auto-Compact isn't "compress when the window is full"—it's "predict when it will be full and schedule in advance."

Trigger conditions include:

Current token usage exceeds a threshold
About to enter a new tool call
Detects a large amount of low-value content in context
After a long user idle period
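A sketch of a predictive trigger covering the four conditions above. The field names and every threshold are assumptions; what it illustrates is checking projected usage (current tokens plus the expected size of the next tool result) instead of waiting for the window to overflow.

```typescript
// Hypothetical session snapshot; field names are illustrative only.
interface ContextSnapshot {
  usedTokens: number;
  windowTokens: number;
  nextToolCallEstimate: number; // tokens the upcoming tool result may add (0 if none)
  lowValueTokens: number; // e.g. stale tool output that was already acted on
  idleMs: number; // time since the user's last input
}

const THRESHOLD = 0.8; // assumed fraction of the window
const IDLE_MS = 10 * 60 * 1000; // assumed "long idle" cutoff (10 minutes)
const LOW_VALUE_SHARE = 0.3; // assumed share of low-value content

// Returns why compaction should run now, or null to leave the context alone.
function shouldAutoCompact(s: ContextSnapshot): string | null {
  const limit = s.windowTokens * THRESHOLD;
  if (s.usedTokens >= limit) return "over-threshold";
  // Predictive case: compact before the tool result pushes usage over the line.
  if (s.nextToolCallEstimate > 0 && s.usedTokens + s.nextToolCallEstimate >= limit) {
    return "before-tool-call";
  }
  if (s.usedTokens > 0 && s.lowValueTokens / s.usedTokens >= LOW_VALUE_SHARE) {
    return "low-value-content";
  }
  // Idle time is a cheap moment to compact, if there is enough to be worth it.
  if (s.idleMs >= IDLE_MS && s.usedTokens >= limit / 2) return "idle";
  return null;
}
```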

The compression decision logic left me awestruck:

System instructions: always kept
Project instructions from CLAUDE.md: always kept
Last N conversation rounds: kept
Middle conversations: summarized and compressed
Details beyond K rounds: directly discarded (unless marked as important)
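The retention rules above map naturally onto a partition step. This sketch assumes each message carries a round number and an optional `important` flag; `planCompaction`, the message shape, and the choice to summarize (rather than keep verbatim) old-but-important messages are assumptions.

```typescript
// Message shape is an assumption for illustration.
interface Msg {
  kind: "system" | "project" | "turn"; // "project" = CLAUDE.md content
  round: number; // conversation round the message belongs to
  important?: boolean;
  text: string;
}

interface Plan { keep: Msg[]; summarize: Msg[]; discard: Msg[] }

// keepLast = N (recent rounds kept verbatim); discardBeyond = K (rounds at
// least K rounds old are dropped unless flagged important).
function planCompaction(msgs: Msg[], currentRound: number, keepLast: number, discardBeyond: number): Plan {
  const plan: Plan = { keep: [], summarize: [], discard: [] };
  for (const m of msgs) {
    const age = currentRound - m.round;
    // System and project instructions are always kept, whatever their age.
    if (m.kind !== "turn" || age < keepLast) plan.keep.push(m);
    else if (age >= discardBeyond && !m.important) plan.discard.push(m);
    else plan.summarize.push(m); // middle rounds, plus old important ones
  }
  return plan;
}
```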

Even more ruthless, it uses two types of summarization prompt:

One is "complete but brief"—recording key decisions and tool call results.

The other is "minimal but accurate"—only the background necessary for the current task.

And it even calculates a quality score after compression. If the information loss is too high, it switches to a different compression strategy.
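The "score, then fall back" behavior can be sketched as a loop over strategies. Here the strategies are plain string functions standing in for the two summarization prompts, and the quality score is a hypothetical stand-in (the share of must-keep facts that survive); the analyses don't describe the real scoring method.

```typescript
// In the real tool a strategy would be a model prompt ("complete but brief",
// "minimal but accurate"); plain functions keep this sketch runnable offline.
type Strategy = { name: string; compress: (text: string) => string };

// Hypothetical quality score: share of must-keep facts present in the summary.
function qualityScore(summary: string, mustKeep: string[]): number {
  if (mustKeep.length === 0) return 1;
  const kept = mustKeep.filter((fact) => summary.includes(fact)).length;
  return kept / mustKeep.length;
}

// Try strategies in order; move to the next one when too much is lost.
function compressWithFallback(
  text: string,
  mustKeep: string[],
  strategies: Strategy[],
  minScore = 0.9,
): { name: string; summary: string; score: number } {
  for (const s of strategies) {
    const summary = s.compress(text);
    const score = qualityScore(summary, mustKeep);
    if (score >= minScore) return { name: s.name, summary, score };
  }
  // Every strategy lost too much: keep the original rather than corrupt it.
  return { name: "none", summary: text, score: 1 };
}
```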

This level of workmanship is top-tier in the industry.

---

The multi-Agent mechanism that blew my mind

I read about the Fork mechanism over and over. Honestly, this design might have the greatest production value of all.

What's the difference between a Fork and a regular subagent?

A Fork doesn't create an independent full context. It shares most of its context with the parent Agent—system prompts, tool list, current codebase state, cache state. It only needs an extra differential instruction.
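A sketch of why a fork is cheaper to run: it reuses the parent's context verbatim and only appends an instruction, so the request prefix matches what is already cached. The `AgentContext` shape, the tool names, and the helper functions are hypothetical.

```typescript
// Illustrative shapes only; none of these names come from the source.
interface AgentContext {
  systemPrompt: string;
  tools: readonly string[];
  history: readonly string[];
}

// A regular subagent starts fresh: its own prompt and no parent history,
// so its request prefix does not match anything the parent has cached.
function spawnSubagent(task: string, tools: readonly string[]): AgentContext {
  return { systemPrompt: `You are a helper. Task: ${task}`, tools, history: [] };
}

// A fork copies the parent's prompt, tools, and history unchanged and appends
// only a differential instruction at the end.
function fork(parent: AgentContext, instruction: string): AgentContext {
  return { ...parent, history: [...parent.history, `[fork] ${instruction}`] };
}

// True when the child's request starts with exactly the parent's context,
// i.e. the part a prompt cache could serve.
function sharesPrefixWith(parent: AgentContext, child: AgentContext): boolean {
  return (
    child.systemPrompt === parent.systemPrompt &&
    child.tools.join("\n") === parent.tools.join("\n") &&
    parent.history.every((entry, i) => child.history[i] === entry)
  );
}
```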

How big is the cost advantage?

One analysis mentioned: in a cache-friendly scenario, a Fork Subagent can reduce costs to about 10% of the original.

10%! A subagent that used to cost you ten bucks now costs one. Tasks you wouldn't have dared to delegate because of cost, you can now hand off freely.
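Where could a figure like 10% come from? A back-of-the-envelope model, assuming cached input tokens cost about a tenth of uncached ones (roughly the ratio Anthropic lists for prompt-cache reads) and ignoring output tokens and cache-write costs:

```typescript
// Relative input cost of a request compared with sending it fully uncached.
// sharedFraction: share of input tokens served from the cache (0..1).
// cacheReadMultiplier: price of a cached token relative to an uncached one.
// Ignores output tokens and the one-time cost of writing the cache.
function relativeInputCost(sharedFraction: number, cacheReadMultiplier: number): number {
  if (sharedFraction < 0 || sharedFraction > 1) {
    throw new RangeError("sharedFraction must be in [0, 1]");
  }
  return sharedFraction * cacheReadMultiplier + (1 - sharedFraction);
}
```

With everything shared the ratio is 0.1; at 98% shared it is already about 0.12, and at 90% about 0.19. So the 10% figure only holds when the fork's request is almost entirely a cache hit, which is exactly what sharing the parent's context is meant to achieve.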

And that's exactly my point:

Cost optimization itself is a capability.

Going up another level, there's the Coordinator mode. Now this is true multi-Agent parallel collaboration. It's not enabled by default—it requires both a compile-time flag and a runtime environment variable.

When it is enabled?

The main Agent's behavior pattern changes fundamentally—from "all-round expert" to "pure coordinator." It's responsible for analyzing tasks, planning, distributing to Worker Agents, executing in parallel, and finally collecting results.

The main Agent stops doing the actual work. It becomes a task dispatcher and result integrator.
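A sketch of the gating and the fan-out. `HYPOTHETICAL_COORDINATOR_MODE` is a made-up variable name, the build flag is a stand-in constant, and assigning subtasks to workers round-robin is an assumption; the source only says the coordinator plans, distributes, runs workers in parallel, and collects results.

```typescript
// Both gates must be open: a build-time constant and a runtime environment
// variable. The variable name is hypothetical.
const COORDINATOR_BUILD_FLAG = true; // stands in for the compile-time flag

function coordinatorEnabled(env: Record<string, string | undefined>): boolean {
  return COORDINATOR_BUILD_FLAG && env["HYPOTHETICAL_COORDINATOR_MODE"] === "1";
}

type Worker = (subtask: string) => Promise<string>;

// The coordinator does no edits itself: it plans, fans subtasks out to
// workers in parallel (round-robin, assumed), then merges what comes back.
async function coordinate(
  task: string,
  plan: (task: string) => string[],
  workers: Worker[],
): Promise<string> {
  if (workers.length === 0) throw new Error("coordinator needs at least one worker");
  const subtasks = plan(task);
  const results = await Promise.all(
    subtasks.map((sub, i) => workers[i % workers.length](sub)),
  );
  return subtasks.map((sub, i) => `${sub}: ${results[i]}`).join("\n");
}
```

Passing the environment in as a parameter (for example `process.env` under Node or Bun) keeps the gate easy to test.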

The Fork mechanism and Coordinator mode are mutually exclusive, because their responsibilities overlap: Fork is a lightweight

Cael Lee
