I Almost Broke Production Last Week—Stop Making One Model Do All the Heavy Lifting

By Cael Lee | 6 min read

Let me start with the punchline: using a single model for every task is like hammering screws. It works, technically, but you'll hate every second of it.

11 PM last Friday. I'm staring at a payment module that refuses to cooperate.

Claude 3.7 had already generated 300-odd lines of code. Every time it swore "this is the fix," the bloody thing still threw errors. Four rounds of debugging. Four. My eyes were practically bleeding when I remembered what Cursor's lead designer Ryo Lu said—models are as unique as people.

Fine. New approach.

I pasted the exact same error message into Gemini 2.5. The thing sat there thinking for about 15 seconds.

Then it gave me an answer that made me slap my desk.

The bug wasn't in the payment logic at all. It was upstream, in user permission validation. Some async operation was racing ahead without waiting for the result. Claude 3.7 had been patching the payment module for hours, completely missing the real problem. Gemini 2.5 spotted it instantly.

Brilliant.
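For anyone who hasn't met this class of bug, here's a minimal sketch of what "racing ahead without waiting for the result" can look like. It is not my actual payment code: the function names, the `PERMISSIONS` table and the optimistic default are all made up for illustration, and I've used Python's asyncio to keep it short.

```python
import asyncio

# Hypothetical permission data, for illustration only.
PERMISSIONS = {"alice": True, "mallory": False}


async def fetch_permission(user_id: str) -> bool:
    """Simulate a slow permission lookup (database or network call)."""
    await asyncio.sleep(0.01)
    return PERMISSIONS.get(user_id, False)


async def charge_racy(user_id: str, amount: int) -> str:
    """Buggy version: reads the permission before validation has finished."""
    session = {"allowed": True}  # optimistic default, meant to be overwritten

    async def validate() -> None:
        session["allowed"] = await fetch_permission(user_id)

    task = asyncio.create_task(validate())  # scheduled, but not awaited yet
    # BUG: validate() has not run at this point, so the default is still in place.
    outcome = f"charged {amount}" if session["allowed"] else "denied"
    await task  # too late to matter; only here so the demo exits cleanly
    return outcome


async def charge_fixed(user_id: str, amount: int) -> str:
    """Fixed version: wait for the permission result before deciding."""
    allowed = await fetch_permission(user_id)
    return f"charged {amount}" if allowed else "denied"


if __name__ == "__main__":
    print(asyncio.run(charge_racy("mallory", 50)))   # goes through when it shouldn't
    print(asyncio.run(charge_fixed("mallory", 50)))  # correctly refused
```

Notice that the error surfaces in the charge step even though the charge logic is fine. That's why patching the payment module got me nowhere.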

Models Aren't Swiss Army Knives—You Need to Delegate

That night rewired how I think about model selection. Before, I'd throw everything at Claude 3.7 because, well, it's Claude 3.7. But Ryo Lu's analogy is spot-on: Gemini 2.5 is like a senior engineer who needs a nudge; Claude 3.7 is an overthinker obsessed with tools, needing to be reined in; Claude 3.5 is the all-rounder who's still rock-solid.

Translation: you need to match the model to the job.

Here's my current workflow. I use Claude 3.5 Sonnet for writing code. Yes, 3.5, not 3.7. The older model's actually more obedient at the execution level. 3.7 keeps trying to "optimise" things and ends up making a mess. When I hit a nasty bug, I throw it at OpenAI's o1 or o3-mini-high. By my own rough tally they find the real cause around 85% of the time, and they're roughly 3× faster than me manually trawling through logs. For scanning the entire codebase to update docs, I use Gemini 2.0 Flash: blistering speed, 2,000 lines in under 30 seconds.
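If it helps to see that workflow written down, here's a rough sketch as a lookup table. The task labels and the `pick_model` helper are hypothetical (Cursor has no such API as far as I know); the model names are the ones from this post.

```python
# Hypothetical routing table for the workflow described above.
# Keys are made-up task labels; values are the models named in the post.
MODEL_FOR_TASK = {
    "write_code": "Claude 3.5 Sonnet",
    "debug": "o3-mini-high",
    "update_docs": "Gemini 2.0 Flash",
    "second_opinion": "Gemini 2.5",
}

# Fall back to the all-rounder when the task doesn't match anything.
DEFAULT_MODEL = "Claude 3.5 Sonnet"


def pick_model(task: str) -> str:
    """Return the model I'd reach for first for a given kind of task."""
    return MODEL_FOR_TASK.get(task, DEFAULT_MODEL)


if __name__ == "__main__":
    for task in ("write_code", "debug", "update_docs", "something_else"):
        print(f"{task}: {pick_model(task)}")
```

The table is the point, not the code: one sensible default, plus named exceptions for the jobs where the default falls over.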

And Gemini 2.5? It's become my escape hatch when Claude can't crack a problem. Seriously underrated coding model. Don't sleep on it.

You're Using Context Tools Wrong

Speaking of documentation, Cursor recently published a guide on using @Docs, @Web, and MCP properly. I've been experimenting, and honestly—most people haven't a clue what separates these three.

@Docs connects directly to official documentation. Say you're on Next.js and the docs were last updated in May 2025, but Gemini's training data cuts off in January 2025. Without @Docs, Cursor might feed you suggestions based on the older API.

I've been burned by this. Generated code kept throwing runtime errors, and I wasted three hours before realising the API had changed.

Three. Hours.
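The check I now run in my head before trusting a suggestion is just a date comparison. A tiny sketch, assuming you know roughly when the model's training data ends and when the docs last changed (the exact days below are placeholders; the post only gives months):

```python
from datetime import date


def docs_newer_than_model(training_cutoff: date, docs_updated: date) -> bool:
    """True when the docs changed after the model stopped learning,
    which is when pulling them in via @Docs is most likely to matter."""
    return docs_updated > training_cutoff


if __name__ == "__main__":
    cutoff = date(2025, 1, 31)  # assumed day; only "January 2025" is known
    updated = date(2025, 5, 1)  # assumed day; only "May 2025" is known
    if docs_newer_than_model(cutoff, updated):
        print("Docs are newer than the model: attach them with @Docs.")
```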

@Web is live search, good for community tutorials and alternative approaches. MCP (Model Context Protocol) connects Cursor to external tools and data sources, which is how you pull in internal company docs: API specs, coding standards, that sort of thing. They each serve different purposes. Don't mix them up.

Here's a trick most people miss: when you're wrestling with an unfamiliar tech stack, paste the documentation link directly into Cursor and ask it to explain errors line by line. To get the latest doc links, use Context7—context7.com—or install the Context7 MCP and add "use context7" to every conversation. Proper game changer.

Large Project? Let It Index Overnight

Cursor's official advice for big codebases: let indexing run overnight.

I actually laughed when I first read that. Then I tried it.

Codebase Indexing genuinely takes hours on large projects. My M2 MacBook Pro chewed through a 5,000-file project in about four and a half hours. Started it at 6 PM, came back next morning, and the difference was staggering—code navigation and smart completion accuracy jumped from roughly 60% to over 90%.

Also, limit your context scope to stay nimble. I once saw someone index their entire node_modules folder.

Disaster.
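To get a feel for how much dependency folders inflate a project, here's a small sketch that counts files with and without them. It only measures; it doesn't configure Cursor in any way, and the excluded folder names are my own assumption about what counts as noise.

```python
import os

# Assumed set of directories that add bulk but no useful context.
EXCLUDED_DIRS = frozenset({"node_modules", ".git", "dist"})


def count_files_in_scope(root: str, excluded: frozenset[str] = EXCLUDED_DIRS) -> int:
    """Count files under root, skipping any directory named in excluded."""
    total = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into excluded directories.
        dirnames[:] = [d for d in dirnames if d not in excluded]
        total += len(filenames)
    return total


if __name__ == "__main__":
    everything = count_files_in_scope(".", frozenset())
    in_scope = count_files_in_scope(".")
    print(f"{in_scope} files in scope, {everything - in_scope} skipped")
```

Run it from the project root. If the skipped count dwarfs the in-scope count, you know where the indexing time would have gone.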

Cursor also emphasises something I've learned the hard way: plan before you code.

Before letting Agent loose, use Plan Mode to clarify requirements. I've witnessed too many train wrecks—no upfront breakdown, no mid-stream review, and you end up with spaghetti code that neither runs nor refactors. The right approach: split your project into small modules, have AI write them one at a time, test and review immediately. Keep each module under 200 lines. Anything beyond that tends to go pear-shaped.
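The 200-line rule is easy to police with a few lines of script. A minimal sketch, assuming you've already read your source files into a dict of name to text (the helper name and the dict shape are mine, not anything Cursor provides):

```python
MAX_LINES = 200  # the per-module ceiling suggested above


def oversized_modules(sources: dict[str, str], max_lines: int = MAX_LINES) -> list[str]:
    """Return the names of modules whose line count exceeds max_lines."""
    return sorted(
        name
        for name, text in sources.items()
        if len(text.splitlines()) > max_lines
    )


if __name__ == "__main__":
    # Fake file contents, for illustration only.
    sources = {
        "cart.py": "pass\n" * 120,
        "checkout.py": "pass\n" * 340,
    }
    for name in oversized_modules(sources):
        print(f"{name} is over {MAX_LINES} lines: split it before asking for more")
```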

Different Tools for Different Moments

Cursor's guidance on tool usage is actually practical: Tab for quick edits, Cmd+K for focused changes, Chat mode for large-scale modifications. When refactoring a big project, start in Chat mode to analyse the structure, use @folder to understand global context, then draft a refactoring plan.

This strategy—no, let's call it a system—has boosted my efficiency by at least 40%.

Setting ground rules early is crucial too. In your .cursor/rules/ directory, create a file with 5-10 clear rules telling AI your tech stack, version, and constraints. Stuff like "Always use const or let, never var" or "Use snake_case for variable names." For legacy projects, use /generate rules to let it learn on its own. Saves ages.

And be specific with your requests. Don't say "make me a button." Specify the framework, the styling, the interaction behaviour. A good prompt keeps AI on the rails. My current prompt template has at least four elements: tech stack, feature description, styling requirements, and edge cases.
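Here's roughly what that four-part template looks like as code. The `PromptSpec` class and the button example are illustrative, not my real template; the four fields are the ones listed above.

```python
from dataclasses import dataclass


@dataclass
class PromptSpec:
    """The four elements every request should carry."""
    tech_stack: str
    feature: str
    styling: str
    edge_cases: list[str]

    def render(self) -> str:
        """Lay the four elements out as a plain-text prompt."""
        edge_lines = "\n".join(f"- {case}" for case in self.edge_cases)
        return (
            f"Tech stack: {self.tech_stack}\n"
            f"Feature: {self.feature}\n"
            f"Styling: {self.styling}\n"
            f"Edge cases:\n{edge_lines}"
        )


if __name__ == "__main__":
    # Hypothetical example values.
    spec = PromptSpec(
        tech_stack="Next.js with TypeScript",
        feature="Submit button for the checkout form",
        styling="Matches the existing primary button style",
        edge_cases=["disabled while loading", "shows an error state on failure"],
    )
    print(spec.render())
```

Making every field required is deliberate: you can't build the prompt without thinking about edge cases first.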

What's New in 0.50

Cursor 0.50 dropped a few weeks back. The billing model's clearer now. Pro plan still gives you 500 fast requests plus unlimited slow ones. Requests without the brain icon cost 1 credit; those with it cost 2. Since Claude 3.7 defaults to thinking mode, it always eats 2 credits—bit of a pain, but there we are.
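The arithmetic is worth doing once. A quick sketch, assuming the 500 fast requests behave like a pool of 500 credits that normal requests draw down by 1 and thinking requests by 2 (that reading is mine; check your own usage page):

```python
FAST_CREDITS_PER_MONTH = 500  # Pro plan figure quoted above
NORMAL_COST = 1               # request without the brain icon
THINKING_COST = 2             # request with the brain icon


def credits_used(normal_requests: int, thinking_requests: int) -> int:
    """Total credits consumed by a mix of normal and thinking requests."""
    return normal_requests * NORMAL_COST + thinking_requests * THINKING_COST


def credits_left(normal_requests: int, thinking_requests: int) -> int:
    """Fast credits remaining this month, never below zero."""
    used = credits_used(normal_requests, thinking_requests)
    return max(0, FAST_CREDITS_PER_MONTH - used)


if __name__ == "__main__":
    print(credits_left(100, 50))
    print(FAST_CREDITS_PER_MONTH // THINKING_COST, "thinking requests at most")
```

Put differently: stick to thinking models for everything and your 500 behaves like 250.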

Max Mode kicks in when context overflows, and it's pay-per-token. Performance degrades beyond 100K tokens, roughly. The docs say 120K, but in my testing things got wobbly around 100K.
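I keep those two numbers in mind as a pair of thresholds. A small sketch, assuming you already have a token count from somewhere (how you count tokens is outside this snippet, and the status labels are my own):

```python
OBSERVED_LIMIT = 100_000    # where things got wobbly in my testing
DOCUMENTED_LIMIT = 120_000  # the figure from the docs


def context_status(tokens: int) -> str:
    """Classify a context size against the observed and documented limits."""
    if tokens <= OBSERVED_LIMIT:
        return "ok"
    if tokens <= DOCUMENTED_LIMIT:
        return "wobbly"
    return "over documented limit"


if __name__ == "__main__":
    for size in (40_000, 110_000, 150_000):
        print(size, context_status(size))
```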

There's also a new Tab model. Cross-file editing feels smoother, and completion suggestions now come with syntax highlighting. Change a function name in file A, jump to file B to update calls—it handles the grunt work automatically. It's fast enough to feel like cheating.

So, Is That $20 Actually Worth It?

A lot of people ask me whether Cursor's worth the $20 a month. Here's my honest answer: if you're using one model and one mode for everything, absolutely not. But once you learn to match the right model and tool to each scenario, that $20 might be the best monthly investment you make.

You might wonder about Auto-select mode.

Truthfully, I used to think it wasn't clever enough. But recent testing's changed my mind—it works well about 80% of the time. It picks the most suitable model for the current task and auto-switches when output quality dips. Not perfect, but not rubbish either. Occasionally throws a wobbly.

So here's the real question: are you still forcing one model to do everything?

If you are, stop. Learn to delegate. Your hairline will thank you.

What's your experience mixing models? Found any surprising combos that work brilliantly (or disastrously)? Drop a comment below.

#ai #cursor #programming #productivity #webdev

Cael Lee

Full-stack developer with 8+ years of experience. Currently building AI-powered developer tools. I've tested 20+ AI API providers and coding assistants.
