I Let Claude Code Loose on My Legacy Codebase. It Was a Beautiful Disaster.
I Let Claude Code Loose on My Legacy Codebase. It Was a Beautiful Disaster.
Last month, I tried to get Claude Code to refactor a legacy monolith into a service-oriented architecture.
Spoiler: I spent the next 4 hours unborking git.
"Hey AI, refactor the user module into a clean architecture."
Within 20 minutes, it created a folder called ultracleanarchitecture/, deleted three perfectly good repository files because they "looked redundant," and swapped out our entire database migration library for some hot new thing “because it was cleaner.”
Nope. We're rolling back.
Here's the thing—the problem isn't the AI. It's the job description we're giving it.
The "Single Prompt" is a Straight-Up Lie
Everyone on Twitter shows Claude writing an entire Next.js SaaS from a single "make me an app" prompt.
For a greenfield to-do list? Sure. Works great.
For a production codebase with 15 years of edge cases, weird dependencies, and spaghetti you specifically rely on to close the books at quarter end? Get out of here.
I tested this on my team’s 50k LOC backend.
The Monolithic Prompt (The Wrong Way):
"Refactor the billing module into a strategy pattern."
Result: 3 hallucinated files, 2 deleted utility functions, 1 broken import chain. Immediate rollback.
The Decomposed Prompt (The "Yeah, I Have Time for This" Way):
Prompt 1: You are a planner. Analyze `src/billing/`. Output a plan with file names and responsibilities. Do not write code.
Prompt 2: Implement the interface `BillingStrategy` in `src/billing/strategies/interface.ts`. Only write the interface.
Prompt 3: Implement the concrete `CreditCardStrategy` class. Do not touch anything else.
Result: 0 regressions, 100% clean merge. Took me slightly longer in prompt time, saved me hours of debugging.
Data point #1: Breaking a big refactor into 8 small prompts (under 1500 tokens each) reduced the "spontaneous rewrite" bug by about 70% compared to 1 massive prompt (8k+ tokens). The LLM wants to show off. Smaller scopes keep it humble.
The Manifest File Trick (Stop the Scope Creep)
The worst thing Claude Code does when editing multiple files is scope creep. It sees a function in services/user.ts and decides it should move it to helpers/user.ts because “it makes more sense architecturally.” Mid-task. Without asking.
This approach—wait, let me actually call it a strategy—involves a strict ARCHITECTURE.md / CLAUDE.md file that reads like a contractual agreement. Seriously.
## CLAUDE.md (excerpt)
- src/controllers/ handles HTTP request/response. NO business logic.
- src/services/ contains business logic. CAN call repositories.
- src/repositories/ contains database queries. CANNOT call controllers.
- migrations/ and config/ are EXEMPT from editing unless explicitly permitted.
I prepend every session with:
Read CLAUDE.md and ARCHITECTURE.md. If a change requires modifying a file
that violates these rules, explain the conflict and STOP. Do not auto-edit.
Data point #2: Using a hard-coded manifest file cut cross-file hallucination rates (AI inventing relationships between files that didn't exist) by roughly 60%. It stopped trying to be an architect and started being a good implementer.
The "Poison Pill" Guardrails
Funny enough, this one comes straight from trauma. I had a session where Claude decided to "optimize" our package.json scripts by removing a "redundant" deployment hook. We pushed to staging.
Staging exploded.
Now every non-trivial session starts with a strict "Don’t Touch" list.
## CRITICAL BOUNDARIES:
- Never modify db/migrations/, package.json, Makefile, or .github/.
- Never delete code comments that contain "DO NOT REMOVE" or "LEGACY".
- If you must change a file in a boundary, you may ONLY output the file
path and reasoning to the console. Do not write it to disk.
Data point #3: Adding explicit negative constraints (do not touch X, Y, Z) cut critical production bugs from AI-generated code by about 80% in our sprint. The AI loves to clean things. Sometimes "cleaning" is just nuking the technical debt you specifically rely on to pay your mortgage.
The Cadence Trick
I changed my workflow from "let it cook for 10 minutes" to "one file, one confirmation."
- Prompt -> AI writes file -> I review diff -> Approve -> Next prompt.
It sounds slow. It isn't. Because the AI isn't sitting there generating garbage for 10 minutes and hiding it in 6 files. It takes 30 seconds per iteration. You can do 10 iterations in 5 minutes.
The Mental Model: You're the Architect. It's the Intern.
I still think these tools are incredible. But the hype is doing everyone a disservice. The secret isn't prompt engineering. It's task engineering.
You are the architect. Let the AI be the really fast, slightly chaotic intern that needs explicit permission to touch the good china.
TL;DR (for the skimmers)
- Stop asking Claude Code to "refactor the whole module." It can't. Decompose the task into 5–10 explicit files and responsibilities.
- Write a strict
CLAUDE.mdorARCHITECTURE.mdthat defines file boundary rules. Enforce it in your prompt. - Use a "don't touch" list for sensitive directories (migrations, configs, CI/CD).
- Keep prompt tokens under 2k. Force single-file or highly specific multi-file edits.
- Confirm every step. The "one file, one confirmation" cadence is your safety net.
What's your experience? Anyone else get burned by a "spontaneous improvement" from an AI? Or found better strategies for keeping these things on the rails in a big codebase?
Drop a comment below. Or better yet—how many of you have a CLAUDE.md file sitting in your repo root right now?
claudecode #aiassistant #refactoring #softwareengineering #devtools
Cael Lee
Full-stack developer with 8+ years of experience. Currently building AI-powered developer tools. I've tested 20+ AI API providers and coding assistants.