I Let an AI Agent Loose on My Terminal — Here's What Happened After 3 Weeks

Last Tuesday at 3pm, I typed claude "refactor this 2000-line Python script" into my terminal and wandered off to make a pour-over. Before the coffee finished dripping, it was done. Refactored. Unit tested. And it had fixed a boundary condition bug that'd been lurking for three months.

I stood there holding my mug, staring at the screen.

Something's shifted.

Look, I've been using Claude Code for about three weeks now. Mostly day-to-day grunt work, occasionally for side projects on weekends. This isn't some "AI is coming for our jobs" hot take. It's just one terminal-dweller's honest experience with agent-based coding.

What even is it?

It's an AI coding assistant that lives in your terminal. No UI. No sidebar. Just your familiar black window, a command, and it gets to work.

The approach is completely different from Copilot's "let me guess your next line" vibe. Claude Code runs in agent mode — you throw a task at it, and it reads files, writes code, runs commands, checks errors, fixes them, runs again, and loops until it's sorted. You just watch the terminal scroll past like there's a seasoned dev pair programming with you.

I was sceptical at first. I've tried a few AI coding tools that looked brilliant in demos but couldn't even get import paths right in practice. Then Claude Code handled its first task and I genuinely had to sit down.

Case 1: The seven-layer if-else nightmare

I've got this legacy data cleaning script. 1,800-odd lines. The core logic is nested seven layers deep in if-else statements. Every time requirements changed, I'd need a solid 20 minutes of mental preparation and a small prayer afterwards.

I ran this in the project directory:


claude "refactor src/data_cleaner.py, extract nested conditionals into strategy pattern, keep functionality identical"

It read the file first — I could see it running cat and grep in the terminal — then started hammering out code. About three minutes in, it paused and told me something interesting: one branch in the original logic could never be reached because an earlier condition already covered it. Did I want it removed?

I just stared. That dead code had been sitting there for two years. I'd always felt something was off but never bothered to verify. It spotted it. And it included the logical deduction.
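For a feel of what that kind of dead branch looks like, here's a made-up miniature. It is not the actual script: `classify` and its thresholds are invented purely for illustration.

```python
def classify(value: float) -> str:
    """Hypothetical example of a branch shadowed by an earlier condition."""
    if value < 0:
        return "negative"
    elif value < 100:
        return "small"
    elif value < 50:
        # Unreachable: anything below 50 was already caught by `value < 100`.
        return "tiny"
    return "large"
```

Nothing ever comes back as "tiny", which is exactly the sort of thing that survives for years because the code still "works".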

Final output: strategy pattern split into seven classes, none over 40 lines, with complete unit tests. I ran them. All green. Ran a diff against production data. Output identical.

Honestly? Doing this refactor by hand would've taken me a full day. It took five minutes.
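If you haven't met the pattern before, here's a minimal sketch of what "nested conditionals into strategy classes" means. It's an assumption-heavy toy, not the code Claude Code produced: `CleaningStrategy`, `MissingAmount`, `StringAmount` and the `amount` field are all hypothetical names.

```python
from abc import ABC, abstractmethod


class CleaningStrategy(ABC):
    """One rule per class, standing in for one arm of the old if-else tree."""

    @abstractmethod
    def applies(self, record: dict) -> bool: ...

    @abstractmethod
    def clean(self, record: dict) -> dict: ...


class MissingAmount(CleaningStrategy):
    """Hypothetical rule: fill in a missing amount with 0.0."""

    def applies(self, record: dict) -> bool:
        return record.get("amount") is None

    def clean(self, record: dict) -> dict:
        return {**record, "amount": 0.0}


class StringAmount(CleaningStrategy):
    """Hypothetical rule: convert a string amount to a float."""

    def applies(self, record: dict) -> bool:
        return isinstance(record.get("amount"), str)

    def clean(self, record: dict) -> dict:
        return {**record, "amount": float(record["amount"].strip())}


# Order matters: the first matching strategy wins, like an if/elif chain.
STRATEGIES = [MissingAmount(), StringAmount()]


def clean_record(record: dict) -> dict:
    for strategy in STRATEGIES:
        if strategy.applies(record):
            return strategy.clean(record)
    return record  # No rule matched, so the record passes through unchanged.
```

The payoff is that a new requirement becomes a new small class plus one list entry, instead of another level of indentation.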

Case 2: When it went rogue on my webpack config

It's not all smooth sailing, obviously. Second week, I asked it to add dark mode to a Next.js project.

Simple enough task — just a theme toggle. After it updated the components and added the context provider, I did my usual git diff check and... wait. It had modified my next.config.js. Specifically, it had "optimised" a custom webpack configuration I'd added six months ago to support some ancient dependency.

Actually, let me correct that — it didn't modify it. It deleted it. The comment literally said // DO NOT TOUCH - LEGACY DEPENDENCY in capital letters. Apparently Claude Code doesn't understand human desperation signals.

The legacy dependency exploded. White screen of death. I spent over an hour debugging before git bisect traced it back to that change.

Lesson learned: Always git commit before letting it loose. Its "initiative" cuts both ways — it'll find hidden bugs for you, but it'll also make changes it "thinks are right" without asking. I've made it a habit now: commit before any big task, review the diff carefully afterwards. Letting it work on uncommitted code is like handing your car keys to a stranger.
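One way to turn the habit into a guard rail is a tiny pre-flight check. This is a sketch under assumptions, not part of Claude Code: `is_clean`, `working_tree_status` and `preflight` are hypothetical helpers, and the only real command involved is `git status --porcelain`, which prints nothing when there is nothing to commit.

```python
import subprocess


def is_clean(porcelain_output: str) -> bool:
    """True when `git status --porcelain` printed nothing (no changes, no untracked files)."""
    return porcelain_output.strip() == ""


def working_tree_status() -> str:
    """Return the raw output of `git status --porcelain` for the current directory."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        capture_output=True,
        text=True,
        check=True,  # Raises if git fails, e.g. when run outside a repository.
    )
    return result.stdout


def preflight() -> None:
    """Refuse to continue while there is uncommitted work lying around."""
    if not is_clean(working_tree_status()):
        raise SystemExit("Uncommitted changes found: commit or stash before starting the agent.")
    print("Working tree clean. Safe to hand over the keys.")
```

Call `preflight()` from whatever wrapper script you use before kicking off a big task. Note that untracked files count as "not clean" here, which is deliberately strict.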

Case 3: Cross-file changes that actually made sense

The third scenario is what convinced me this agent approach has legs.

I asked it to "standardise all API route error handling to the new ErrorResponse format." Fourteen route files. Each one handled errors slightly differently.

The old-school approach: open each file, manually update it, curse past-me for not standardising earlier. Or write a script, but regex would struggle with all the variants — some used res.status(500).json(), others throw new Error(), one had a custom error wrapper. A proper mess.

Claude Code's approach: grep for all relevant files, read each one, identify the different error handling patterns, write replacement logic for each pattern, apply them one by one. Then it ran linting and type checking on everything.

One file mixed GraphQL and REST error handling. It spotted the distinction and only changed the REST bits, leaving GraphQL untouched. That level of contextual awareness surprised me. Though maybe my expectations were just rock-bottom after being burned by other AI tools.
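To get a sense of how messy a codebase like this is before handing it over, a rough inventory helps. The sketch below only counts the two patterns named above; it is not how Claude Code works, and it makes no attempt at the replacement itself. `PATTERNS` and `inventory` are hypothetical names, and a custom error wrapper would need its own entry.

```python
import re
from collections import Counter

# Regexes for two of the error handling styles mentioned above.
PATTERNS = {
    "res.status().json()": re.compile(r"res\.status\(\s*\d+\s*\)\.json\("),
    "throw new Error()": re.compile(r"throw\s+new\s+Error\("),
}


def inventory(source: str) -> Counter:
    """Count how often each known error handling style appears in one file's text."""
    counts = Counter()
    for label, pattern in PATTERNS.items():
        counts[label] = len(pattern.findall(source))
    return counts
```

Counting is the easy part. Rewriting each variant safely is where plain regex falls over and reading the surrounding context starts to matter.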

No UI: the good and the bad

The good: Fast. Lightweight. Completely integrated into my workflow. I don't switch to a browser, don't touch the mouse. I state what I need in the terminal, switch to another window, and carry on. It tells me when it's done. This experience is nothing like an IDE plugin — it doesn't interrupt you, and you don't have to watch it write code line by line.

Because it's agent-based, it runs commands and checks results itself. Writes code, runs tests, sees failures, reads error messages, fixes them, runs again. This self-verification loop makes the final output significantly better. From what I understand, it's running Claude Sonnet 4 under the hood (the May 2025 version), and the coding capabilities are genuinely stronger than previous iterations.
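The loop itself is simple enough to sketch. This is a conceptual illustration of "run, read the failure, fix, run again", not Claude Code's actual internals: `verify_loop`, `run_checks`, `attempt_fix` and `max_rounds` are all hypothetical.

```python
from typing import Callable, Tuple


def verify_loop(
    run_checks: Callable[[], Tuple[bool, str]],
    attempt_fix: Callable[[str], None],
    max_rounds: int = 5,
) -> bool:
    """Run checks, feed any failure output to a fixer, and repeat.

    Returns True as soon as the checks pass, or False if they still
    fail after `max_rounds` fix attempts.
    """
    for _ in range(max_rounds):
        ok, output = run_checks()
        if ok:
            return True
        attempt_fix(output)  # The failure output is the input to the next fix.
    # One last check so the final fix attempt is not left unverified.
    ok, _ = run_checks()
    return ok
```

The cap on rounds matters: without it, a fixer that keeps making things worse would loop forever.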

The bad: No diff preview, at least not in the current version. You can only git diff afterwards to see what changed. Sometimes it modifies too many files at once and reviewing becomes a chore. And as I mentioned, it occasionally gets overconfident and makes unnecessary changes. It's a bit like working with a technically brilliant junior dev who lacks social awareness — you need to keep an eye on things.

How it compares to Cursor and Copilot

I've used all three. Quick thoughts:

Copilot: Autocomplete is slick, noticeably speeds up writing code, but won't help with architectural decisions. Perfect for "I know what to write, just help me write it faster."
Cursor: Great interactive AI coding experience, ideal for writing while asking questions, friendly UI. I know quite a few newer devs who prefer this.
Claude Code: Built for "throw a task at it and walk away." Terminal-native, strong agent capabilities. Higher barrier to entry though — you need to be comfortable with the command line.

I use them together now: Copilot for day-to-day autocomplete, Claude Code for big refactors or exploratory work. They don't conflict. Think of it as the difference between cruise control and a co-driver.

Who should actually use this?

You'll probably enjoy it if:

You spend most of your day in the terminal
You frequently need to refactor or make cross-file changes
You want to eliminate the "I know what to do but can't be bothered" busywork
You don't mind reviewing diffs afterwards (or even enjoy the code review process)

If you're just starting out, Cursor's UI-driven approach might feel friendlier. Claude Code is still a command-line tool with a learning curve. And honestly, if you're not comfortable reading diffs or understanding what it changed, you might stumble into some traps.

Final thoughts

Three weeks in, Claude Code has earned a spot in my top three tools. It's not magic — sometimes it's daft, sometimes it "over-optimises" — but its ability to understand task intent and execute autonomously makes me willing to put up with the quirks.

AI coding tools have evolved ridiculously fast this past year. From basic code completion in early 2024, to Cursor's conversational assistance mid-year, to this agent-based execution now — each iteration redefines what a developer's workflow looks like. The 2025 competition between Anthropic and OpenAI has gone white-hot, which is honestly great for us.

Here's where I've landed: repetitive, mechanical coding that doesn't require much judgement will increasingly rely on these tools. But understanding the business, making architectural decisions, reviewing AI-generated code — those things have become more important, not less.

If I had one complaint, it's that it still can't write my weekly status reports.

Have you tried Claude Code or similar agent-based coding tools? What was your experience like? Any entertaining disasters to share? Drop a comment — I've got coffee ready. ☕

#claudecode #ai #developer-tools #productivity #programming

Cael Lee
