
Stop Building AI Agents. Seriously. (Unless You Actually Need Them)

By Cael Lee | 8 min read


Someone asked me yesterday whether their team should build an AI Agent. My answer? 90% of projects don't need one. At all.

Look, I've been writing this AI column for nearly a decade—from the early chatbot days in 2015 to the multi-agent orchestration nightmares of today. I've stepped on more landmines than I've had hot dinners. And this past year, I've watched team after team dive headfirst into building Agents, only to realise halfway through that a simple Workflow would've done the job. Thousands of tokens burned. Thousands. That's a lot of coffee.

Think about it. Take a task like "scrape a webpage and translate it." Do you really need an Agent autonomously deciding when to scrape, how to scrape, how many times to scrape? Just hardcode the bloody flow. It's faster, more stable, and cheaper. Honestly, some teams get hypnotised by the word "Agent" and forget what they're actually trying to solve.

But here's the real question: when do you need an Agent? And when does a Workflow suffice?

I've been mulling this over for the better part of a year. Or rather, learning it the hard way.

First, Figure Out What You're Actually Doing

Last year I took on an e-commerce customer service system. The client came in hot: "We want an AI Agent." I asked for specifics. Four scenarios: order tracking, refund processing, technical support, complaint handling.

I nearly laughed.

These are almost entirely deterministic paths—tracking an order means querying a database and returning results. A refund means verifying conditions and executing. Where's the "reasoning" needed? What's the Agent supposed to ponder? The meaning of commerce? So, no. No Agent.

I built them four separate Workflows instead. Each one uses an LLM internally for intent understanding and parameter extraction, but the execution path? Fixed. Three months in production, 97% accuracy, and the whole thing came in at 60% of the projected budget.
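A rough sketch of that shape, assuming Go: the LLM only picks a fixed path and fills its parameters, and plain code does the rest. `extractIntent` is a hypothetical stand-in for a structured-output LLM call (stubbed with keyword matching so it runs offline), and only two of the four workflows are shown; technical support and complaints would slot into the same `handlers` map.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Intent is all the LLM gets to decide: which fixed path, with which parameters.
type Intent struct {
	Name   string            // e.g. "order_tracking", "refund"
	Params map[string]string // extracted parameters such as order_id
}

// extractIntent is a hypothetical stand-in for a structured-output LLM call.
// Keyword matching keeps the sketch runnable offline.
func extractIntent(msg string) Intent {
	lower := strings.ToLower(msg)
	switch {
	case strings.Contains(lower, "refund"):
		return Intent{Name: "refund", Params: map[string]string{"order_id": lastWord(msg)}}
	case strings.Contains(lower, "where is"):
		return Intent{Name: "order_tracking", Params: map[string]string{"order_id": lastWord(msg)}}
	default:
		return Intent{Name: "unknown"}
	}
}

// lastWord is a naive parameter extractor for the stub above.
func lastWord(s string) string {
	f := strings.Fields(s)
	if len(f) == 0 {
		return ""
	}
	return strings.Trim(f[len(f)-1], "?.!")
}

// handlers are the fixed execution paths: plain code, no model in the loop.
var handlers = map[string]func(map[string]string) (string, error){
	"order_tracking": func(p map[string]string) (string, error) {
		return "order " + p["order_id"] + " is in transit", nil // real code: query the DB
	},
	"refund": func(p map[string]string) (string, error) {
		return "refund opened for order " + p["order_id"], nil // real code: verify conditions, then execute
	},
}

// ErrUnhandled signals that no workflow matches; hand off to a human.
var ErrUnhandled = errors.New("no workflow for this intent")

// Handle lets the LLM pick the path, then walks it in code.
func Handle(msg string) (string, error) {
	in := extractIntent(msg)
	h, ok := handlers[in.Name]
	if !ok {
		return "", ErrUnhandled
	}
	return h(in.Params)
}

func main() {
	out, err := Handle("Where is order A123?")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out) // order A123 is in transit
}
```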

They were shocked.

Not because the project failed—because their assumptions did. Agent ≠ magic. Actually, it's not even what most people think it is. A bit embarrassing, sure, but the clarity afterwards? Priceless.

The Three Eras of Agent Development

Looking back, I see three distinct phases of Agent development. Let's call them Levels, actually—that feels more accurate.

Level 1: The LLM Agent (2023)

When large models first exploded, Agents were a novelty. People built them for social entertainment—role-playing chatbots, digital companions. The core idea was simple: wrap an LLM in a loop and let it have multi-turn conversations.

If I'm being honest, those early Agents were toys compared to today. But they were fun. Really fun.

Level 2: The Tool-Using Agent (2024)

By 2024, things got serious. Agents actually had to do stuff. Function Calling, Tool Use—these concepts went mainstream. Suddenly Agents could call APIs, query databases, manipulate files.

I shipped a lot of projects in this phase. And stepped in the biggest puddle of all.

Awful tool descriptions.

Seriously.

You spend three days building a robust Agent framework, and the model straight-up refuses to invoke your tools. Why? Because your tool description is one sentence: "Query order information." How's the model supposed to know when to use it? What parameters to fill? Can you blame it? Go on, think about it.

I learned my lesson. Now every tool description gets at least three paragraphs: when to use it, how to fill parameters, what the return format looks like. Yes, it costs more tokens. But my invocation accuracy jumped from 60% to 90%. That's a trade-off I'll take any day. Worth every penny.

Level 3: Multi-Agent Collaboration (2025–Present)

This is where things get properly complex. Not multiple Agents having a meeting—that visual is a bit silly, isn't it?—but different Agents with different responsibilities, toolsets, and permissions.

I'm currently building a code platform with five Agents: Code Reader, Tester, Fixer, Security Auditor, and Documentation Generator. Each has its own toolkit, walled off from the others. The tricky bit? The Router. Deciding which task goes to which Agent—that logic is more important than the Agents themselves. Much more. Last Wednesday I spent four hours tuning Router logic for one edge case. Four hours.
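To show why routing deserves that much attention, here's a deliberately naive sketch. The five agent names come from the platform described above; the tool names, the keyword rules and `Route` itself are assumptions for illustration, not the real routing logic. Note how "Fix the failing test" matches both "fix" and "test", so rule order alone decides where it lands.

```go
package main

import (
	"fmt"
	"strings"
)

// Agent has its own walled-off toolkit; it may only call tools in Allowed.
type Agent struct {
	Name    string
	Allowed map[string]bool
}

// CallTool rejects anything outside the agent's toolkit.
func (a Agent) CallTool(tool string) error {
	if !a.Allowed[tool] {
		return fmt.Errorf("%s may not call %q", a.Name, tool)
	}
	return nil // the real tool invocation would go here
}

func toolset(names ...string) map[string]bool {
	m := make(map[string]bool, len(names))
	for _, n := range names {
		m[n] = true
	}
	return m
}

// Tool names below are hypothetical.
var agents = map[string]Agent{
	"reader":   {Name: "Code Reader", Allowed: toolset("read_file", "search_code")},
	"tester":   {Name: "Tester", Allowed: toolset("read_file", "run_tests")},
	"fixer":    {Name: "Fixer", Allowed: toolset("read_file", "write_file")},
	"security": {Name: "Security Auditor", Allowed: toolset("read_file", "scan_deps")},
	"docs":     {Name: "Documentation Generator", Allowed: toolset("read_file", "write_docs")},
}

// Route is a simple keyword router. The first matching rule wins,
// which is exactly where edge cases hide.
func Route(task string) Agent {
	t := strings.ToLower(task)
	rules := []struct{ keyword, agent string }{
		{"vulnerab", "security"},
		{"fix", "fixer"},
		{"test", "tester"},
		{"document", "docs"},
	}
	for _, r := range rules {
		if strings.Contains(t, r.keyword) {
			return agents[r.agent]
		}
	}
	return agents["reader"] // default: just read and explain
}

func main() {
	a := Route("Fix the failing test in parser.go")
	fmt.Println(a.Name)                  // Fixer
	fmt.Println(a.CallTool("run_tests")) // Fixer may not call "run_tests"
}
```

Swap the "fix" and "test" rules and the same task goes to the Tester instead. That's the kind of four-hour edge case I mean.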

The Core Architecture: Four Non-Negotiable Modules

After all these years, I've boiled it down to a formula:

AI Agent = LLM (Brain) + Memory + Planning + Tool Use

Miss one, and you'll feel the gap. I've tried. Oh, I've tried.

The LLM is the brain, obviously. But choosing the right model matters—and bigger isn't always better. For that customer service system, I used GPT-4o-mini. Cheap, fast, sufficient. Don't reach for Claude Opus unless you genuinely need complex reasoning. Most use cases don't. Really.

Memory has three layers in my setup:

Context window management, by the way, is a classic interview question. A reader of mine interviewed at a Big Tech company—got grilled three rounds on context window strategies. Claude Code's approach is worth studying: a five-layer compression pyramid, from full history to extreme summary, dynamically switching based on the task. I've written a lengthy analysis elsewhere—won't expand here. This article's already running long.

Planning is the soul. Three main paradigms dominate: ReAct, Plan-and-Execute, and Reflection.
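One widely used planning paradigm is ReAct: the model alternates a Thought, an Action (a tool call) and an Observation (the tool's result) until it can answer. Here's a minimal Go sketch, assuming a hypothetical `Model` function that returns the next structured step; real code would parse that from the model's text output. The `maxSteps` cap is there because an unbounded ReAct loop can wander and burn tokens.

```go
package main

import (
	"errors"
	"fmt"
)

// Step is one ReAct turn: a Thought, then either an Action or a final Answer.
type Step struct {
	Thought string
	Action  string // tool name; empty when Answer is set
	Input   string
	Answer  string
}

// Model is a hypothetical LLM call that sees the transcript so far and
// returns the next step.
type Model func(transcript []string) Step

var ErrTooManySteps = errors.New("react: step limit reached")

// ReAct alternates Thought -> Action -> Observation until an Answer appears.
func ReAct(model Model, tools map[string]func(string) string, question string, maxSteps int) (string, error) {
	transcript := []string{"Question: " + question}
	for i := 0; i < maxSteps; i++ {
		s := model(transcript)
		transcript = append(transcript, "Thought: "+s.Thought)
		if s.Answer != "" {
			return s.Answer, nil
		}
		obs := "unknown tool " + s.Action
		if tool, ok := tools[s.Action]; ok {
			obs = tool(s.Input)
		}
		transcript = append(transcript, "Action: "+s.Action+"("+s.Input+")", "Observation: "+obs)
	}
	return "", ErrTooManySteps
}

func main() {
	tools := map[string]func(string) string{
		"weather": func(city string) string { return "rain in " + city },
	}
	// Scripted stand-in for an LLM: look up the weather, then answer.
	model := func(t []string) Step {
		if len(t) == 1 {
			return Step{Thought: "I need the forecast", Action: "weather", Input: "Paris"}
		}
		return Step{Thought: "I have enough", Answer: "Take an umbrella: " + t[len(t)-1]}
	}
	ans, err := ReAct(model, tools, "Do I need an umbrella in Paris?", 5)
	fmt.Println(ans, err)
}
```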

Tool Use—and here, MCP (Model Context Protocol) has been everywhere lately. It does standardise tool integration nicely—formatting, access patterns, resource handling. But don't mythologise it.

MCP solves standardisation problems. For the real challenges—tool selection, task planning, multi-agent coordination—it doesn't help much. At least not yet. Not useless, just... bounded.

My current approach: use MCP as the tool integration layer, and my own scheduler for selection and planning. Together, they sing. In my setup, anyway. It's ridiculously fast.
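A sketch of that split, under loose assumptions. MCP servers advertise their tools (name, description, input schema) through the protocol's `tools/list` method; `ToolSource` here is a hypothetical interface you'd back with whichever MCP client you use, and no real SDK API is implied. `Select` is the scheduler half, with a placeholder word-overlap score where your actual selection logic would go.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// ToolInfo is roughly what an MCP server reports for each tool.
type ToolInfo struct {
	Name        string
	Description string
}

// ToolSource is a hypothetical interface over your MCP client's tool listing.
// The scheduler below doesn't care how tools are transported.
type ToolSource interface {
	ListTools() ([]ToolInfo, error)
}

// staticSource is an in-memory stand-in for an MCP server.
type staticSource []ToolInfo

func (s staticSource) ListTools() ([]ToolInfo, error) { return s, nil }

// Select ranks tools for a task with a placeholder word-overlap score
// and returns up to k names that scored above zero.
func Select(src ToolSource, task string, k int) ([]string, error) {
	tools, err := src.ListTools()
	if err != nil {
		return nil, err
	}
	tools = append([]ToolInfo(nil), tools...) // copy so sorting never reorders the source
	words := strings.Fields(strings.ToLower(task))
	score := func(t ToolInfo) int {
		d := strings.ToLower(t.Description)
		n := 0
		for _, w := range words {
			if strings.Contains(d, w) {
				n++
			}
		}
		return n
	}
	sort.SliceStable(tools, func(i, j int) bool { return score(tools[i]) > score(tools[j]) })
	var names []string
	for i := 0; i < len(tools) && i < k; i++ {
		if score(tools[i]) > 0 {
			names = append(names, tools[i].Name)
		}
	}
	return names, nil
}

func main() {
	src := staticSource{
		{Name: "upscale_image", Description: "Upscale an image to higher resolution"},
		{Name: "expand_background", Description: "Expand the background of an image"},
		{Name: "translate_text", Description: "Translate text between languages"},
	}
	names, err := Select(src, "upscale this image", 2)
	fmt.Println(names, err) // [upscale_image expand_background] <nil>
}
```

MCP standardises the left-hand side (what tools exist and how to call them); everything inside `Select` is still your problem.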

How to Choose: Don't Start with the Most Complex Option

Faced with ReAct, Plan-and-Execute, Reflection, Multi-Agent—how do you pick?

Here's my simple heuristic:

Write out the execution path first. If you can write it, use a Workflow. If you can't, consider an Agent.

More specifically:

I built a travel planner last year using pure ReAct at first. The model kept veering off course, burning tokens furiously. Switched to Plan-and-Execute—model proposes a plan, user confirms, then execute. Stability shot up. The core strategy: P&E as skeleton, ReAct for local flexibility. Not revolutionary, but it works beautifully.

Engineering in Practice: Golang + Observability

On the dev side, I've been building Agents in Golang for the past two years. Why?

Python's ecosystem is rich, but for production-grade systems, Golang's performance and concurrency advantages are stark. Especially in multi-agent setups—goroutines are practically purpose-built for this. On my M2 MacBook Pro, running five concurrent Agents in Python turned it into a slideshow. Golang? Silky smooth. Night and day.

My current stack: Golang + Genkit framework + MCP + A2A protocol.

A2A (Agent-to-Agent) is Google's multi-agent communication protocol, complementary to MCP. MCP handles tool calling; A2A handles inter-agent comms. Tencent's AI assistant "XiaoQ" already uses this architecture, integrating image upscaling, background expansion, and other capabilities. Decent stuff.

Observability is the other massive pitfall.

Debugging an Agent system is an order of magnitude harder than traditional systems. Execution paths are dynamic—you often have no idea why the model made a particular decision. I once spent two hours debugging, only to find a missing comma in the prompt. A single comma. Two hours.

My approach now: full distributed tracing + critical node logging + automatic failure case collection. When something goes wrong, I can trace back to the specific Thought and Action. Most of the time. The first time I configured this, I left a trailing slash in a URL and debugged for two hours. I'll remember that forever.

Final Thoughts

A decade of writing this column, and my biggest lesson: technology serves people, not the other way round.

Agents are powerful. But not every problem needs one. If a Workflow solves it, skip the Agent. If a single Agent suffices, skip Multi-Agent. If a small model does the job, skip the beast. In other words: don't join the rat race.

Get the fundamentals right first—LLM, Planning, Memory, Tools. These four are non-negotiable. Then start with the simplest possible approach. Run it, find the failure modes, and then upgrade. This approach? Honestly, far more reliable than chasing the latest framework.

Don't chase frameworks. Chase problems.

Frameworks will rot. Problem-solving instincts won't. Probably. Well... they might, actually, but they'll outlast any framework.

This might be the longest article I've ever written. I hope it saves you some scars. If you're building Agents too, drop a comment below—I'll likely reply. Unless I'm actively wrestling an Agent bug. In which case, I won't want to talk to anyone. Really. No one.

What do you think? Is the Agent hype real, or have we all been sold a dream?

#ai #agents #softwareEngineering #llm #golang


Cael Lee

Full-stack developer with 8+ years of experience. Currently building AI-powered developer tools. I've tested 20+ AI API providers and coding assistants.
