所有人都以为AI越用越笨,其实是你没给它“记忆系统” (English)
所有人都以为AI越用越笨,其实是你没给它“记忆系统” (English)
Generated: 2026-06-22 07:52:07
---
I Was So Furious with Claude Code I Almost Smashed My Keyboard—Then I Realized AI Doesn’t Need to Be “Smarter”
Three months ago, I nearly smashed my keyboard over Claude Code.
At the end of 2025, I asked it to help me refactor a Python backend service. The first conversation blew my mind—it understood my architectural intent, proposed a sensible module breakdown, and even proactively flagged potential performance issues in the code.
I thought: We’re golden! This is going to take off!
Then the next day, it had completely forgotten the design decisions from the day before. The naming conventions changed, it pulled in a third-party library I had explicitly banned, and within the same function it mixed synchronous and asynchronous code—like a college freshman who just learned to code and writes “as long as it runs” garbage.
I checked the conversation context—only 20k tokens. It wasn’t the model’s fault. It was mine. I hadn’t given it any memory system.
Over the next three months, I tested, stumbled through countless pitfalls, and finally discovered a truth that left me both frustrated and exhilarated:
Harness Engineering isn’t something new—it’s something we should have done long ago but never got right.
---
You Think AI Got Dumber? Wrong! You Asked the Wrong Question
First, accept a brutal reality:
LLMs are goldfish by nature—seven seconds of memory is generous.
Expecting it to remember everything in a conversation is like expecting an intern to keep the entire project’s architecture and standards in their head. Possible? No.
Yet people still complain: “Why is this AI getting dumber?” “Yesterday it wrote perfectly, today it’s a mess.”
Where’s the problem?
Harness Engineering isn’t about “making AI smarter.” It’s about “making AI controllable and consistently productive.”
Smartness is the model company’s job. Controllability is ours. Models can iterate, upgrade, and get stronger, but if you don’t have a set of reins to make them follow your rules, no matter how smart they are, they’re wild horses—the faster they run, the harder you crash.
---
Three Practices That Made Me Say “Wow”—Each More Counterintuitive Than the Last
1. The Simplest Method Works Best: A Text File Cured AI’s Amnesia
Anthropic engineers mentioned a practice: maintain a claude-progress.txt file. At first, I thought it was too low-tech—using a text file for state management in 2026? Isn’t that going backward?
But after I tried it, I was hooked.
Here’s how: Place a progress.json file in the project root. Every time the Agent starts, it reads this file first. The format looks something like this:
{
"current_task": "Refactor user authentication module",
"completed_tasks": ["Database connection pool optimization", "Log system migration"],
"decisions": [
{"id": "DEC-001", "content": "Use JWT instead of Session", "date": "2026-02-10"}
],
"blockers": ["Waiting for third-party API documentation update"]
}
I tested it for two weeks. The effect was immediate. The Agent’s context fragmentation dropped from 3–4 times a day to nearly zero. All it cost was a few dozen lines of JSON and automatically updating this file after each task.
Someone might say: “Isn’t that just manually maintaining state? Too primitive.”
But think about it: When you write code, don’t you jot down progress and to-dos on sticky notes? For the Agent, this text file is its sticky note. Sometimes the best solution is the simplest one.
2. Don’t Let AI “Promise”—Make It “Prove”: Mechanical Verification Beats Verbal Requests
The deepest pitfall I fell into was writing prompts like “Please ensure the code passes tests.”
Guess what? The AI would reply, “I have ensured the code passes tests,” and then hand me code that wouldn’t even compile.
AI doesn’t lie, but it overconfidently assumes. It will always tell you “no problem,” even if it never actually checked.
The solution? So simple it makes you want to slap yourself: Add a rule in the system prompt—“Before marking a task complete, you must run the test suite and save the test results screenshot to the test_results/ directory.”
I call this “mechanical verification over verbal requests.” Turn the “request” to the model into a system-level “enforcement.”
Concrete implementation:
- Add a
run_teststool to the Agent’s tool list - This tool first executes the tests, then parses the output
- Only when all tests pass is the Agent allowed to mark the task complete
- If any test fails, it must fix and rerun
I tested this mechanism on a React project. Previously, the Agent’s generated code required an average of three rounds of manual fixes to pass tests. After adding mechanical verification, the first-pass pass rate jumped from 20% to 78%.
The Agent didn’t get smarter—I turned “passing tests” from a suggestion into a rule.
3. Write Documentation for the Agent, Not for Humans—This Discovery Made Me Cringe
I spent two weeks writing a detailed developer documentation set for the project—architecture overview, coding standards, API docs—the works. Then I found out the Agent didn’t read it.
It wasn’t lazy; it couldn’t understand.
I wrote documentation for humans—with context, implicit assumptions, and “you know what I mean” parts. But the Agent needed machine-readable, precise, unambiguous specifications.
So I changed the format to this:
## Architecture Rules (Agent must follow)
- All data access must go through the Repository layer; direct database calls are forbidden
- Exception handling must use custom exception classes; native exceptions are forbidden
- Logging must use structured format (JSON); string concatenation is forbidden
Every rule is binary and executable. The Agent can precisely determine whether it has violated a rule.
The result? Code consistency skyrocketed. Previously, about 30% of the Agent’s generated code needed manual architectural fixes. After writing machine-readable docs, that number dropped to under 5%.
The truth is: The Agent wasn’t disobedient—you just didn’t state the rules clearly.
---
Counterargument: Isn’t This Over-Engineering?
Someone might say: “Aren’t you making a simple problem complicated? Why not just write the code directly?”
I understand that sentiment. Honestly, I thought the same way. Until I saw teams using similar methods to let Agents build production systems with over 100,000 lines of code over several months—most of it AI-generated, with humans only reviewing and adjusting.
That result alone is astonishing, but what’s even more noteworthy is the core constraint the team set: “No manual code input by humans is allowed.”
This constraint wasn’t for show. Its purpose was to force the issue: When you absolutely cannot touch the code yourself, you must explicitly encode all implicit engineering knowledge into the Harness—which architectural patterns are allowed, which checks must pass, what the codebase structure standards are. Everything that used to rely on “experience, convention, and tacit understanding” must become system-enforceable rules.
That’s the essence of Harness Engineering: transforming human engineers’ tacit knowledge into explicit constraints that the Agent follows at runtime.
---
Start Now—With These Three Things
Don’t try to do it all at once. It took me three months to build a barely usable Harness, and I’m still iterating.
But you can start today:
- Today: Add a
progress.jsonto your project root. Let the Agent record progress and decisions. - This week: Add a “verify before completion” rule in the system prompt—run tests, start the app, take screenshots.
- This month: Rewrite your architecture and coding standards into machine-readable format so the Agent can precisely check compliance.
---
AI is already a thoroughbred. A thoroughbred without reins, no matter how fast, will never reach the destination. Harness Engineering is the most important rein of this era.
Let me close with one sentence:
Building software still requires discipline, but that discipline now resides more in the supporting structures—tools, abstractions, feedback loops—than in the code itself.
Your value no longer depends on how fast you write code, but on your ability to design systems—constraints, feedback loops, and control mechanisms—that are truly irreplaceable.
Stop being just a coder. Become the one who tames the AI.
Cael Lee
Full-stack developer with 8+ years of experience. Currently building AI-powered developer tools. I've tested 20+ AI API providers and coding assistants.