I Deleted 500 Lines of Business Logic by Trusting AI's "Long Context Miracle"
Last Tuesday, I dragged a 3000-line order-service.ts file into Cursor 0.42.3 and asked it to "clean things up." Midway through my coffee, I noticed something horrifying—the refund state machine, all 500 lines of it, had vanished. Not commented out. Not refactored. Just... gone. The AI decided it was redundant and "optimized" it out of existence.
I stared at the screen for a solid minute.
Here's the thing—my social feeds are drowning in AI programming hype right now. "Million-token context windows!" "Claude 3.5 Sonnet with 200K!" "Gemini 1.5 Pro hits 1M!" It's everywhere. But after a year of getting burned, I've learned something counterintuitive: bigger windows don't produce better code. They produce bigger messes. Not metaphorical messes—actual, compiles-but-logically-broken, production-outage-at-3am messes.
TL;DR for the Skimmers
- We tested three approaches to feeding code to AI: dump everything (12K lines), manually select relevant files (5K lines), and structured chunking (1.5-2K lines)
- Structured chunking hit 68% first-pass success rate vs. 31% for the "dump everything" approach
- The "dump everything" group introduced 3 logic bugs from variable name confusion across modules
- Long context windows are like a giant desk—throwing everything on it makes finding things slower, not faster
- I'll show you my actual 3-layer chunking strategy that's been working for the past 6 months
The Experiment That Made Me a Believer
Last December, my team ran an internal test that was—honestly—painful to watch. Same feature spec, same model (Claude 3.5 Sonnet 20241022), three different context strategies:
- Group A: Dumped the entire project (12,000 lines) plus the spec. Pure chaos mode.
- Group B: Manually picked "relevant" files, kept it under 5,000 lines.
- Group C: Used structured chunking—module → file → function hierarchy, feeding only what was actually needed. Roughly 1,500-2,000 lines per prompt.
The results? Group A: 31% first-pass success rate. Group B: 47%. Group C: 68%.
But the success rate wasn't even the scariest part. Group A produced three bugs where the AI confused variable names across modules. TypeScript compiled fine. ESLint was happy. But order.refundAmount and payment.refundAmount got merged into one field in the AI's "brain." One of those bugs ran in production for three days before finance caught it. Debugging that felt like defusing a bomb blindfolded.
Long context windows aren't a silver bullet. They're a giant desk where you've spread out everything you own—and now you can't find your keys.
My Most Expensive AI Mistake (Literally)
November 2024. I was refactoring an e-commerce order module and feeling cocky about Claude's 200K context window. Dumped the entire order module (8,000 lines), user module interfaces, payment module interfaces—everything. Let the AI work its magic.
The generated code looked solid. TypeScript compiled. ESLint showed zero warnings.
Two days after deployment, the finance team called. Panic mode.
A batch of refunds was wrong. Customers who bought two items with a discount coupon were getting refunded at full price—the coupon allocation wasn't being deducted. The overpayment hit around ¥32,000 (roughly $4,400 USD, if my mental math holds).
After a full day of investigation, I found the problem. The AI had smashed together two versions of our refund logic: an old 2023 version (refund at original price) and the current March 2024 version (refund at actual paid amount). Both were in the context window—the old logic was in a commented-out block at the bottom of the file, the new logic was active. The AI couldn't distinguish which was authoritative.
Wait, let me correct myself. It wasn't that it "couldn't distinguish." More precisely, it used the new logic to calculate the refund base, then jumped to the old logic's fullRefund branch when handling coupon allocation. My guess is that the two code fragments sat close together in embedding space, so the model treated them as one coherent system.
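To make the failure concrete, here is a minimal sketch of the two refund rules side by side. It is not our production code: the proportional coupon split, the integer-cent amounts (standing in for decimal.js), and every name in it (LineItem, allocateCoupon, refundAtPaidAmount, refundAtOriginalPrice) are assumptions for illustration.

```typescript
// Hypothetical types and names; amounts are integer cents to avoid float errors.
interface LineItem {
  sku: string;
  priceCents: number;
}

// Split a coupon across items in proportion to price (an assumed policy).
// The last item absorbs the rounding remainder so the shares sum to the coupon.
function allocateCoupon(items: LineItem[], couponCents: number): number[] {
  const total = items.reduce((sum, item) => sum + item.priceCents, 0);
  if (total === 0) return items.map(() => 0);
  let allocated = 0;
  return items.map((item, i) => {
    if (i === items.length - 1) return couponCents - allocated;
    const share = Math.floor((couponCents * item.priceCents) / total);
    allocated += share;
    return share;
  });
}

// Current rule: refund what the customer actually paid for the item.
function refundAtPaidAmount(items: LineItem[], couponCents: number, index: number): number {
  return items[index].priceCents - allocateCoupon(items, couponCents)[index];
}

// Old rule: refund the list price and ignore the coupon share.
function refundAtOriginalPrice(items: LineItem[], index: number): number {
  return items[index].priceCents;
}

// Two items bought with one coupon, then the first item is refunded.
const cart: LineItem[] = [
  { sku: "A", priceCents: 6000 },
  { sku: "B", priceCents: 4000 },
];
const couponCents = 1000;

const correctRefund = refundAtPaidAmount(cart, couponCents, 0);
const buggyRefund = refundAtOriginalPrice(cart, 0);
const overpaidCents = buggyRefund - correctRefund;
```

The gap between the two results is exactly the coupon share of the refunded item, which is the amount that leaked out on every affected refund.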
Lesson: More context isn't better context. Noise corrupts judgment. And for the love of everything, delete commented-out code before feeding it to AI.
After that disaster, I changed my approach: only feed the current module's core files, with explicit interface contracts as boundaries. Haven't had a repeat since.
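A rough way to catch leftover commented-out code before it reaches the prompt is a line-based scan. The sketch below is a heuristic I would treat as a starting point only: it looks at runs of // lines, ignores /* ... */ blocks entirely, and the names (findCommentedOutCode, CODE_LIKE) and the three-line threshold are my own assumptions.

```typescript
// Heuristic sketch: flags runs of `//` lines that look like disabled code.
// It ignores /* ... */ blocks and will miss or misflag some cases.
interface CommentRun {
  startLine: number; // 1-based, inclusive
  endLine: number;   // 1-based, inclusive
}

// A comment body "looks like code" if it ends in ; { } or starts with a keyword.
const CODE_LIKE = /([;{}]\s*$)|(^\s*(if|for|while|return|const|let|var|function)\b)/;

function findCommentedOutCode(source: string, minRun = 3): CommentRun[] {
  const lines = source.split("\n");
  const runs: CommentRun[] = [];
  let start = -1;   // 0-based index of the first line in the current run
  let codeLike = 0; // how many lines in the current run look like code

  // endIndex is the 0-based index just past the last comment line of the run.
  const closeRun = (endIndex: number): void => {
    const length = endIndex - start;
    if (start >= 0 && length >= minRun && codeLike * 2 >= length) {
      runs.push({ startLine: start + 1, endLine: endIndex });
    }
    start = -1;
    codeLike = 0;
  };

  lines.forEach((line, i) => {
    const match = line.match(/^\s*\/\/(.*)$/);
    if (match) {
      if (start < 0) start = i;
      if (CODE_LIKE.test(match[1])) codeLike++;
    } else {
      closeRun(i);
    }
  });
  closeRun(lines.length);
  return runs;
}

const sampleFile = [
  "// This service handles refunds.",
  "function refund(order: { paidAmount: number }) {",
  "  return order.paidAmount;",
  "}",
  "// function refund(order) {",
  "//   return order.originalPrice;",
  "// }",
].join("\n");

const flaggedRuns = findCommentedOutCode(sampleFile);
```

In the sample, the one-line prose comment at the top is left alone and only the disabled old refund function at the bottom is flagged for review.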
My 3-Layer Structured Chunking System
Over the past six months, I've landed on a system that works. It's not perfect, but it's stopped the bleeding. Here's the breakdown—critique it, steal it, whatever helps.
Layer 1: Project-Level Chunking
Stop dumping your entire file tree into AI. It sees 200 files and has an existential crisis.
I maintain a project-context.md file (200-300 lines) in the project root. It contains:
- Tech stack and architecture overview (5-10 lines, nothing fancy)
- Directory structure with one-line descriptions per folder
- Core data flow (plain text descriptions—don't use Mermaid diagrams, AI misunderstands them half the time)
- Critical constraints: "Never query the database directly; always use the Repository layer" or "All monetary calculations must use decimal.js, floats are forbidden"
Think of this as the AI's "project onboarding doc." I include it in every conversation. My rough measurements (not rigorous, just personal tracking) show it improves context understanding accuracy by about 30%.
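If it helps to picture the file, here is a small sketch that renders those four sections from structured data. The ProjectContext shape, the renderProjectContext helper, the stack line, and the directory names are all hypothetical. Only the two constraints come from my real file.

```typescript
// Hypothetical shape for project-context.md; the field names are my own invention.
interface ProjectContext {
  stack: string[];
  directories: { path: string; description: string }[];
  dataFlow: string[];    // plain-text steps, no diagrams
  constraints: string[];
}

// Render each section as a title line, one bullet per item, then a blank line.
function renderProjectContext(ctx: ProjectContext): string {
  const section = (title: string, items: string[]): string[] =>
    [title, ...items.map((item) => `- ${item}`), ""];
  return [
    ...section("Tech stack and architecture", ctx.stack),
    ...section("Directory structure", ctx.directories.map((d) => `${d.path}: ${d.description}`)),
    ...section("Core data flow", ctx.dataFlow),
    ...section("Critical constraints", ctx.constraints),
  ].join("\n");
}

const exampleContext = renderProjectContext({
  stack: ["TypeScript services behind a REST API"], // assumed for illustration
  directories: [
    { path: "src/order", description: "order lifecycle and refunds" },
    { path: "src/payment", description: "payment provider adapters" },
  ],
  dataFlow: ["Order created, payment captured, refund requested, refund settled"],
  constraints: [
    "Never query the database directly; always use the Repository layer",
    "All monetary calculations must use decimal.js, floats are forbidden",
  ],
});

const exampleLineCount = exampleContext.split("\n").length;
```

Keeping the content as data makes it easy to check that the rendered file stays inside the 200-300 line range as the project grows.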
Layer 2: Module-Level Chunking
This is the most important layer. My rule is brutal: one module's context never exceeds 3,000 lines.
Why 3,000? It's not arbitrary. We tested different context sizes:
- Under 1,000 lines: 85%+ understanding accuracy, but often missing cross-module dependency info
- 1,000-3,000 lines: 75-85% accuracy, dependency information mostly complete
- 3,000-5,000 lines: Drops to 60-70%, hallucinations start appearing, variable names get confused
- Over 5,000 lines: Cliff dive. Code style becomes inconsistent; you'll see async/await suddenly followed by .then() chains
In practice, I maintain a module-context.md per module with core interface definitions, dependency maps, and data models. When I actually feed the AI, I include that file plus the 2-3 files I'm modifying.
Example: modifying the refund logic for the order module. I'd feed:
- module-context.md (order module interface definitions and dependencies, ~400 lines)
- refund-service.ts (the file I'm changing, ~800 lines)
- payment-adapter.ts (interface definitions only, no implementation, ~150 lines)
Total: 1,350 lines. The AI breezes through it without hallucinating Payment module internals.
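The arithmetic is easy to automate. This is a minimal sketch of a budget check against the 3,000-line rule, using the three files from the example; the ContextFile type and the checkContextBudget name are assumptions, not part of any tool.

```typescript
// Hypothetical helper: sums line counts and compares them to the module ceiling.
interface ContextFile {
  path: string;
  lineCount: number;
}

const MODULE_LINE_LIMIT = 3000; // the per-module ceiling from my rule

function checkContextBudget(files: ContextFile[], limit = MODULE_LINE_LIMIT) {
  const total = files.reduce((sum, file) => sum + file.lineCount, 0);
  return { total, withinBudget: total <= limit, remaining: limit - total };
}

const refundTask: ContextFile[] = [
  { path: "module-context.md", lineCount: 400 },
  { path: "refund-service.ts", lineCount: 800 },
  { path: "payment-adapter.ts", lineCount: 150 },
];

const budget = checkContextBudget(refundTask);
```

For the refund task this reports a total of 1,350 lines, which leaves room for one more mid-sized file before the ceiling.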
Layer 3: Function-Level Chunking
This idea came from a GitHub project called code-to-context—not many stars, but the approach is clever: use AST tools to parse code into function-level structured data.
For a 2,000-line service file, I use a tree-sitter script to break it into:
{
  "file": "order-service.ts",
  "summary": "Core order business logic",
  "functions": [
    {
      "name": "createOrder",
      "signature": "(params: CreateOrderDTO): Promise<Order>",
      "summary": "Creates order with inventory check, price calculation, coupon validation",
      "dependencies": ["inventoryService.checkStock", "couponService.validate"],
      "lines": "45-120"
    }
  ]
}
This index is the part that needs more explanation.
The structured index is about 1/5 the size of the original file but contains everything the AI needs to navigate: function names, signatures, dependencies, summaries. When I use it, the AI first reviews the index, decides which functions it actually needs the full code for, and I provide only those.
This steers the AI's "attention" toward genuinely relevant code instead of making it hunt for a needle in a 2,000-line haystack. Honestly, I think this might be the most impactful step in the entire strategy.
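The "provide only those" step can be sketched in a few lines. This assumes the index shape shown earlier, with "lines" read as a 1-based inclusive range. The FileIndex and FunctionEntry types and the extractFunctions helper are my own naming, and the tiny demo file is made up.

```typescript
// Mirrors the index entry shown earlier; "lines" is a 1-based inclusive range.
interface FunctionEntry {
  name: string;
  signature: string;
  summary: string;
  dependencies: string[];
  lines: string; // e.g. "45-120"
}

interface FileIndex {
  file: string;
  summary: string;
  functions: FunctionEntry[];
}

// Return the full source of only the functions the model asked for.
function extractFunctions(
  index: FileIndex,
  source: string,
  requested: string[],
): { [name: string]: string } {
  const sourceLines = source.split("\n");
  const result: { [name: string]: string } = {};
  for (const fn of index.functions) {
    if (requested.indexOf(fn.name) === -1) continue;
    const [start, end] = fn.lines.split("-").map(Number);
    result[fn.name] = sourceLines.slice(start - 1, end).join("\n");
  }
  return result;
}

// Made-up six-line file with two functions.
const demoSource = [
  "async function createOrder() {",
  "  // ...",
  "}",
  "async function cancelOrder() {",
  "  // ...",
  "}",
].join("\n");

const demoIndex: FileIndex = {
  file: "order-service.ts",
  summary: "Core order business logic",
  functions: [
    { name: "createOrder", signature: "(): Promise<void>", summary: "Creates an order", dependencies: [], lines: "1-3" },
    { name: "cancelOrder", signature: "(): Promise<void>", summary: "Cancels an order", dependencies: [], lines: "4-6" },
  ],
};

const picked = extractFunctions(demoIndex, demoSource, ["cancelOrder"]);
```

The model only ever sees the index plus the bodies in picked, so the rest of the file never competes for its attention.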
A Surprising Discovery
Here's something I didn't expect.
The finer the chunks, the more willing the AI is to say "I don't know."
When I used to dump entire modules, the AI almost never asked for more information. It confidently guessed—and was confidently wrong. But with structured chunking, it'll say: "This function depends on couponService.validate—can you share that module's interface definition?"
That's actually a good thing. The AI's false confidence seems inversely proportional to the information density of its context. The noisier the input, the more the model "smooths over" uncertainty. From what I understand about transformer attention mechanisms, this makes sense—high noise pushes the model toward averaging signals, which masks ambiguity.
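You can also anticipate that question before the AI asks it. A small sketch, assuming dependencies are written as "service.method" like in the index entry earlier; the missingDependencies helper and the providedServices list are hypothetical.

```typescript
// `providedServices` is a hypothetical list of services whose interfaces are already in the prompt.
function missingDependencies(dependencies: string[], providedServices: string[]): string[] {
  return dependencies.filter((dep) => {
    const service = dep.split(".")[0];
    return providedServices.indexOf(service) === -1;
  });
}

const createOrderDeps = ["inventoryService.checkStock", "couponService.validate"];
const stillNeeded = missingDependencies(createOrderDeps, ["inventoryService"]);
```

Here stillNeeded contains only couponService.validate, the same interface the AI requested in the exchange above.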
Tools I Actually Use
All free, all practical:
- repomix: Packages your entire project into an AI-friendly text file. But I don't recommend using the full output directly; use --overview-only mode to generate project overviews. Solid tool.
- tree-sitter: AST parsing for generating function-level structured indexes. Supports TypeScript, Python, Go, which covers most of what I need.
- A janky little script I wrote (maybe 200 lines) that auto-generates module-context.md files using tree-sitter. It runs on git commit via a pre-commit hook. I'll clean it up and dump it on GitHub next month, promise.
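Until that script is published, here is a sketch of one piece a hook like this needs: deciding which modules to regenerate from the staged file list (the kind of list `git diff --cached --name-only` prints). The src/modules/<module>/ layout, the MODULE_PATH pattern, and the modulesToRegenerate name are assumptions for illustration, not a copy of my script.

```typescript
// Assumes a hypothetical layout of src/modules/<module>/...; adjust the pattern to your repo.
// Only .ts files count, so a regenerated module-context.md never triggers itself.
const MODULE_PATH = /^src\/modules\/([^/]+)\/.+\.ts$/;

function modulesToRegenerate(stagedPaths: string[]): string[] {
  const seen: { [name: string]: true } = {};
  for (const path of stagedPaths) {
    const match = MODULE_PATH.exec(path);
    if (match) seen[match[1]] = true;
  }
  return Object.keys(seen).sort();
}

const stagedExample = [
  "src/modules/order/refund-service.ts",
  "src/modules/order/order-service.ts",
  "src/modules/payment/payment-adapter.ts",
  "src/modules/order/module-context.md",
  "README.md",
];

const staleModules = modulesToRegenerate(stagedExample);
```

For the example list this yields order and payment once each, so each context file is rebuilt at most once per commit.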
The Bottom Line
The biggest bottleneck in AI programming right now isn't model capability. It's how we organize and present information to the model.
Long context windows give us the illusion that we can skip the hard work of information curation. But here's the truth I keep relearning the hard way: information quality matters far more than quantity. I've said this a hundred times, and every production incident reinforces it.
When I onboard new team members now, the first thing I teach isn't how to use AI—it's how to prepare context for AI. The interns who nail this habit see productivity gains that dwarf any model upgrade. We had one intern last quarter who came in with zero experience but obsessive context organization skills. A month in, their output rivaled someone with two years of experience.
I'm curious—how are you all handling long contexts in real projects? Any war stories? Better chunking strategies? Drop a comment. I'm actively optimizing this workflow and would love to steal your ideas.
Edit: Wow, this resonated. A few commenters mentioned using RAG (Retrieval-Augmented Generation) for code search—vectorizing the codebase and retrieving semantically relevant snippets before feeding the AI. That's fascinating. I'm testing LanceDB for local vector indexing next week; I'll report back. Someone also recommended aider's /map command for automatic function-level context mapping—adding that to my list to explore.
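For anyone who hasn't seen the retrieval step spelled out, this is a toy sketch of it: rank snippets by cosine similarity to the query and keep the top k. The three-number vectors are hand-written stand-ins for real embeddings, the snippet ids are made up, and nothing here uses the LanceDB API.

```typescript
interface Snippet {
  id: string;
  vector: number[]; // stand-in for a real embedding
}

// Cosine similarity of two equal-length vectors; returns 0 if either is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the ids of the k snippets most similar to the query, best first.
function topK(query: number[], snippets: Snippet[], k: number): string[] {
  return snippets
    .map((s) => ({ id: s.id, score: cosineSimilarity(query, s.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((s) => s.id);
}

const toySnippets: Snippet[] = [
  { id: "logger.write", vector: [0, 0, 1] },
  { id: "refundService.calculate", vector: [0.9, 0.1, 0] },
  { id: "couponService.validate", vector: [0.5, 0.5, 0] },
];

const retrieved = topK([1, 0, 0], toySnippets, 2);
```

With these toy vectors the refund and coupon snippets are retrieved and the logger is left out, which is the whole point: only the relevant slices reach the context window.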
#AIProgramming #LongContext #SoftwareEngineering #DeveloperProductivity #CodeQuality
Cael Lee
Full-stack developer with 8+ years of experience. Currently building AI-powered developer tools. I've tested 20+ AI API providers and coding assistants.