Illustrated Claude Code: Dissecting the Source Code (English)
Generated: 2026-06-21 01:39:44
---
News of the Claude Code source leak has been spreading widely lately. A 59.8MB npm package shipped with source maps that weren’t fully cleaned out, so over 500,000 lines of TypeScript were just lying there. I went through it as soon as I could, curious to see how the tool was actually designed.
Before reading it, I thought it was just a fancier API wrapper. After going through it, though, I found the design runs much deeper than I expected. The details around context management, sub‑agent construction, and dynamic system prompt assembly stand out in particular, and every one of these ideas can be applied directly to your own agent projects.
Below I’ll break down a few core questions and explain my understanding based on the source code, along with some pitfalls I’ve run into in practice.
---
How does that “think while doing” loop actually work?
Claude Code’s foundation is simply a while(true) loop—nothing special.
The source is in src/query.ts (leaked version v2.1.88, about 1729 lines). The core function query() is an async generator, and each iteration does three things:
- Call the large model, send the current message list, and get a completion.
- Parse the tool_use blocks from the response.
- Execute the corresponding tool and push the result back into the message list.
Then the next round.
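Here is a minimal sketch of that loop as an async generator. The message and tool types, the callModel parameter, and the exit condition (stop when a reply contains no tool calls) are my own assumptions for illustration, not the actual contents of src/query.ts.

```typescript
// Hypothetical shapes: the real message and tool types are not shown in the article.
type ToolUse = { id: string; name: string; input: unknown };
type ModelReply = { text: string; toolUses: ToolUse[] };
type Message = { role: "user" | "assistant" | "tool"; content: string };

type CallModel = (messages: Message[]) => Promise<ModelReply>;
type Tool = (input: unknown) => Promise<string>;

// One turn per iteration: call the model, parse tool_use blocks, run the tools, repeat.
async function* query(
  messages: Message[],
  callModel: CallModel,
  tools: Record<string, Tool>,
): AsyncGenerator<Message> {
  while (true) {
    const reply = await callModel(messages);
    const assistant: Message = { role: "assistant", content: reply.text };
    messages.push(assistant);
    yield assistant;

    // Assumption: a reply without tool calls means the task is finished.
    if (reply.toolUses.length === 0) return;

    for (const use of reply.toolUses) {
      const tool = tools[use.name];
      const output = tool ? await tool(use.input) : `Unknown tool: ${use.name}`;
      const result: Message = { role: "tool", content: output };
      messages.push(result);
      yield result;
    }
  }
}
```

Because it is a generator, the caller can consume it with for await and render each assistant or tool message as it arrives, instead of waiting for the whole task to finish.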
It sounds simple, but the loop is packed with practical details.
Token budget
Large models have output limits. Claude Code uses createBudgetTracker() to track how many tokens each round has used. When a response is about to hit the limit before the model has finished, the limit is automatically extended by 500k tokens each time. The number of such recoveries is capped at 3 (MAX_OUTPUT_TOKENS_RECOVERY_LIMIT = 3). Past that, it throws an error outright, which prevents the model from dragging out a single tool call indefinitely. I ran into a similar issue in my own agent project: adding a budget tracker and a retry cap at least keeps the system from hanging.
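A rough sketch of what such a tracker could look like. The cap of 3 and the 500k extension are the figures quoted above; the interface, the method names, and the initial limit are hypothetical, since I am not reproducing the real createBudgetTracker() here.

```typescript
// Both constants mirror figures from the article; everything else is illustrative.
const MAX_OUTPUT_TOKENS_RECOVERY_LIMIT = 3;
const EXTENSION_TOKENS = 500_000;

interface BudgetTracker {
  limit(): number;
  used(): number;
  record(tokens: number): void;
  // Raises the limit, or throws once the recovery cap is exhausted.
  extend(): void;
}

function createBudgetTracker(initialLimit: number): BudgetTracker {
  let limit = initialLimit;
  let used = 0;
  let recoveries = 0;
  return {
    limit: () => limit,
    used: () => used,
    record(tokens: number): void {
      used += tokens;
    },
    extend(): void {
      if (recoveries >= MAX_OUTPUT_TOKENS_RECOVERY_LIMIT) {
        throw new Error(
          `Output token recovery limit (${MAX_OUTPUT_TOKENS_RECOVERY_LIMIT}) exceeded`,
        );
      }
      recoveries += 1;
      limit += EXTENSION_TOKENS;
    },
  };
}
```

The point of the hard cap is that the failure mode becomes a visible error rather than a loop that quietly keeps buying itself more budget.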
Ordering of thinking blocks
The thinking and redacted_thinking blocks returned by the model must preserve their original order in the trajectory; they cannot be rearranged. The code has specific validation for this. The reason is probably that once the order is scrambled, the model’s subsequent reasoning quality drops significantly.
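I don’t know exactly how the real validation is written, but the idea can be sketched as a simple order check. The Block type and its id field are hypothetical, used only to tell blocks apart in this example.

```typescript
// Hypothetical block shape: the id field exists only for this illustration.
type Block =
  | { type: "thinking"; id: string }
  | { type: "redacted_thinking"; id: string }
  | { type: "text"; id: string }
  | { type: "tool_use"; id: string };

const isThinking = (b: Block): boolean =>
  b.type === "thinking" || b.type === "redacted_thinking";

// True when the thinking blocks in `outgoing` appear in exactly the same
// sequence as in `original`. Other block types are ignored.
function thinkingOrderPreserved(original: Block[], outgoing: Block[]): boolean {
  const before = original.filter(isThinking).map((b) => b.id);
  const after = outgoing.filter(isThinking).map((b) => b.id);
  return before.length === after.length && before.every((id, i) => id === after[i]);
}
```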
Error recovery
The source has several layers of recovery mechanisms built in:
- HTTP 5xx network retry.
- Tool execution failure retry (script errors trigger a retry).
- Automatic compression after context explosion (more on that later).
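The first of those layers, retrying on HTTP 5xx, might look roughly like this. The HttpError class, the attempt count, and the exponential backoff schedule are assumptions of mine; the article’s source only tells us that 5xx responses are retried.

```typescript
// Hypothetical error type carrying the HTTP status code.
class HttpError extends Error {
  readonly status: number;
  constructor(status: number) {
    super(`HTTP ${status}`);
    this.status = status;
  }
}

type Sleep = (ms: number) => Promise<void>;
const realSleep: Sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retries only on 5xx responses; any other error is rethrown immediately.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  sleep: Sleep = realSleep,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const retryable =
        err instanceof HttpError && err.status >= 500 && err.status < 600;
      if (!retryable || attempt >= maxAttempts) throw err;
      // Assumed backoff: 100ms, 200ms, 400ms, ...
      await sleep(100 * 2 ** (attempt - 1));
    }
  }
}
```

Passing sleep in as a parameter keeps the function easy to test without real delays.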
At the outermost level, there’s also the QueryEngine class wrapper (in src/QueryEngine.ts), which wraps that async generator into a more usable interface and adds event hooks.
---
How does Plan Mode’s “plan before executing” work?
Plan Mode isn’t some special framework mode—it’s just two tools: EnterPlanMode and ExitPlanMode, implemented inside the same Tool‑Use Loop. When the model calls EnterPlanMode, it’s as natural as calling Read to read a file. The engine layer doesn’t need any special handling.
The flow is roughly three steps
- The model decides on its own, or the user triggers it manually. For complex tasks (like “rewrite the whole module”), the model calls EnterPlanMode; for simple tasks (fix a typo), it skips straight ahead. Users can also toggle it manually via Shift+Tab.
- Once in Plan Mode, the model’s permissions are immediately reduced to read‑only. It can only use tools like Read, Grep, Glob to explore the codebase: no writing files, modifying code, or running commands. After exploring, it writes the plan into the .claude/plans/ directory.
There’s an interesting detail: every 5 rounds of conversation, the system slips the model a little note, reminding it “you’re still in Plan Mode, don’t get itchy and modify code.” The reason is that after many rounds of dialogue, the model tends to forget what mode it’s in and suddenly call a file‑writing tool. If you’re building agents, this “mode reminder” idea is worth borrowing.
- The model calls ExitPlanMode, the user confirms, permissions are restored, and the model executes the plan.
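To make the three steps concrete, here is a small sketch of plan mode as nothing more than a flag flipped by two tools, plus the every-5-rounds reminder. The Session class and its methods are hypothetical. The sketch also leaves out two things the real flow has: writing the plan file and waiting for user confirmation on exit.

```typescript
// Tool names come from the article; the Session class itself is hypothetical.
const READ_ONLY_TOOLS = new Set(["Read", "Grep", "Glob", "ExitPlanMode"]);
const REMINDER_INTERVAL = 5; // rounds between reminders, per the article

class Session {
  planMode = false;
  private roundsInPlanMode = 0;

  // Checked by the tool loop before running any tool the model requested.
  isAllowed(toolName: string): boolean {
    return !this.planMode || READ_ONLY_TOOLS.has(toolName);
  }

  // Entering and leaving plan mode are ordinary tool calls that flip a flag.
  runTool(toolName: string): string {
    if (!this.isAllowed(toolName)) {
      return `Denied: ${toolName} is not available in Plan Mode`;
    }
    if (toolName === "EnterPlanMode") {
      this.planMode = true;
      this.roundsInPlanMode = 0;
    }
    if (toolName === "ExitPlanMode") this.planMode = false;
    return `ok: ${toolName}`;
  }

  // Called once per round; returns a note to append on every fifth round in plan mode.
  endRound(): string | null {
    if (!this.planMode) return null;
    this.roundsInPlanMode += 1;
    return this.roundsInPlanMode % REMINDER_INTERVAL === 0
      ? "Reminder: you are still in Plan Mode. Do not modify files."
      : null;
  }
}
```

Notice that nothing here resembles a scheduler or a state machine in the engine: the loop only asks isAllowed() before each tool call.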
What’s good about this design?
“Tools as capabilities”—for the model, Plan Mode isn’t a special mode switch; it’s just calling a tool named EnterPlanMode. The system doesn’t need to write separate scheduling logic for it; all the tool‑calling flow is completely reused. When I first built my own “plan + execute” two‑phase agent, I added a state machine at the engine layer (normal mode, planning mode, execution mode), and the state transitions got incredibly complicated. After seeing Claude Code’s approach, I realized: just make “enter planning mode” and “exit planning mode” into tools, let the model manage the state itself, and the engine layer stays oblivious.
Pitfall I ran into
The model wrote plans in Plan Mode, but their quality was poor. The reason was that my System Prompt didn’t define Plan Mode’s behavior in enough detail. Claude Code’s System Prompt has a dedicated section for Plan Mode rules: the model must write a plan file, must explore enough information, must get user approval, and so on. Without these constraints, the model just coasts through Plan Mode.
---
How is the System Prompt written? What’s in those 8700 tokens?
It’s not static—it’s dynamically assembled from a dozen or so sections, and there’s even caching optimization.
Dynamic assembly
The source code has a bunch of prompt fragments, each corresponding to one section, roughly including:
- Identity definition.
- Behavioral rules.
- List of available tools (names, parameters, descriptions, usage rules).
- Security constraints.
- Plan Mode rules.
- Sub‑agent rules.
- File system rules.
- Shell command rules.
All the sections combined come to about 8700 tokens in the leaked version.
These sections aren’t fully assembled every time. Claude Code first checks which parts haven’t changed since the last request and directly reuses the cache. Things like tool definitions barely ever change; re‑assembling them from scratch each time would waste tokens and time.
Cache invalidation strategy
There’s an ant-only debug scenario that allows injecting custom prompt overrides via environment variables to change the default behavior. In normal scenarios, the System Prompt’s cache is controlled by content hashing. Each section generates a hash; if the hash hasn’t changed, the cached bytes are used directly; only when the hash changes is it re‑encoded. The cache granularity is very fine—each section is cached independently. So even if one section changes (e.g., the user modifies CLAUDE.md), the other parts don’t need to be re‑encoded.
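A minimal sketch of per-section caching keyed by a content hash. FNV-1a is a stand-in I chose to keep the example dependency-free (real code would more likely use something like SHA-256), and the SectionCache class and its encode step are hypothetical.

```typescript
// FNV-1a stands in for whatever hash the real code uses.
function fnv1a(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16);
}

type Section = { name: string; content: string };

class SectionCache {
  private entries = new Map<string, { hash: string; encoded: string }>();
  encodeCount = 0; // exposed so callers can see how often re-encoding happens

  // Hypothetical "encoding" step; here it just tags and trims the section.
  private encode(section: Section): string {
    this.encodeCount += 1;
    return `## ${section.name}\n${section.content.trim()}`;
  }

  // Re-encodes only the sections whose content hash changed since the last call.
  assemble(sections: Section[]): string {
    return sections
      .map((section) => {
        const hash = fnv1a(section.content);
        const cached = this.entries.get(section.name);
        if (cached && cached.hash === hash) return cached.encoded;
        const encoded = this.encode(section);
        this.entries.set(section.name, { hash, encoded });
        return encoded;
      })
      .join("\n\n");
  }
}
```

With this shape, editing CLAUDE.md invalidates exactly one entry, and every other section is served from the map.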
Two points that often come up in interviews
- “How does the System Prompt ensure the model doesn’t overstep its authority?”
Answer: Claude Code’s System Prompt constrains each tool’s calling conditions in detail. For instance, DeleteFile is only available in specific directories; Bash cannot execute commands outside the whitelist. And these constraints aren’t just written in the prompt—there are corresponding security checks at the tool execution layer (more on the 23 layers of security checks later).
- “If the user modifies CLAUDE.md, does the System Prompt get rebuilt?”
Answer: Yes. CLAUDE.md is an injected section of the System Prompt. When the file changes, its hash changes, and that section gets re‑encoded. But the other sections are unaffected, and the cache remains valid.
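For the first question, the part about checks at the tool execution layer can be sketched like this. The allowlist contents, the deletable directory, and both function names are made up for illustration, and a real Bash check would need proper shell parsing rather than this first-word test.

```typescript
// Hypothetical allowlist and directory; the article does not list the real ones.
const ALLOWED_COMMANDS = new Set(["ls", "cat", "git"]);
const DELETABLE_ROOT = "/workspace/tmp";

type Verdict = { ok: true } | { ok: false; reason: string };

function checkBash(commandLine: string): Verdict {
  // Reject chaining, pipes, redirects, and substitution outright.
  if (/[;&|<>$`\n]/.test(commandLine)) {
    return { ok: false, reason: "shell metacharacters are not allowed" };
  }
  const program = commandLine.trim().split(/\s+/)[0] ?? "";
  return ALLOWED_COMMANDS.has(program)
    ? { ok: true }
    : { ok: false, reason: `"${program}" is not on the allowlist` };
}

function checkDelete(path: string): Verdict {
  // Collapse "." and ".." segments so "/workspace/tmp/../src" cannot slip through.
  const parts: string[] = [];
  for (const seg of path.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") parts.pop();
    else parts.push(seg);
  }
  const normalized = "/" + parts.join("/");
  const inside =
    normalized === DELETABLE_ROOT || normalized.startsWith(DELETABLE_ROOT + "/");
  return inside
    ? { ok: true }
    : { ok: false, reason: `${normalized} is outside ${DELETABLE_ROOT}` };
}
```

The prompt tells the model what it should not do; checks like these are what actually stop it when it tries anyway.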
My practical feedback
I borrowed the idea of “section‑based caching,” but I stepped on a landmine: the cache key calculation missed the environment variables. For example, one section’s behavior depended on the ALLOW_DELETE environment variable, but I didn’t include it in the hash calculation. So when the user changed the environment variable, the System Prompt didn’t update, and the model kept using the old rules, which is a security risk.
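The fix I ended up with, in sketch form: build the hash input from the section text plus everything else the section depends on. The SectionInput type and cacheKeyMaterial function are names from my own project, not from Claude Code.

```typescript
type SectionInput = {
  content: string;
  // Everything the rendered section depends on besides its text, e.g. env vars.
  dependsOn: Record<string, string | undefined>;
};

// Builds the string that gets hashed. Keys are sorted so the result is stable
// regardless of the order in which dependencies were listed.
function cacheKeyMaterial(input: SectionInput): string {
  const deps = Object.keys(input.dependsOn)
    .sort()
    .map((k) => `${k}=${input.dependsOn[k] ?? ""}`)
    .join("&");
  return `${input.content}\u0000${deps}`;
}
```

In my case the dependsOn map carries the current value of ALLOW_DELETE, so flipping the variable changes the key and forces that one section to be rebuilt.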
Cael Lee
Full-stack developer with 8+ years of experience. Currently building AI-powered developer tools. I've tested 20+ AI API providers and coding assistants.