How I Tamed AI Function Calling with a 50-Year-Old Computer Science Trick
How I Tamed AI Function Calling with a 50-Year-Old Computer Science Trick
Last summer, I nearly quit a fintech project. Not because the tech was hard—because our chatbot started moving money around without permission.
Here's what happened: a user said "hang on, let me change that number," and our AI assistant executed a transfer. Just like that. Thousands gone in milliseconds. Our team sat in Slack, watching the logs roll in, utterly horrified.
That's when I realised Function Calling isn't the simple API feature the docs make it out to be. It's a dialogue negotiation where the stakes are measured in actual currency—and sometimes, that currency isn't yours.
Today I want to share how I eventually tamed these wild language models using a piece of technology that's older than I am: the finite state machine.
TL;DR for the Skimmers
- 67% of Function Calling failures aren't about tool selection—they're about the model calling functions at the wrong time
- LLMs are probability engines, not state managers. They don't naturally understand conversation phases
- Finite State Machines (FSMs) constrain when models can call which functions, not how they call them
- Each conversation state exposes only relevant tools, making dangerous mis-calls physically impossible
- We saw a 74% drop in conversation-flow-related complaints after implementing this
- You probably don't need this if your app is single-turn. If it's multi-step with backtracking? You absolutely do
Why Your Function Calling Keeps Crashing (And It's Not the Model's Fault)
Let me throw some data at you. The LangChain community ran a developer survey back in September 2024. They found that 67% of Function Calling failures had nothing to do with the model picking the wrong tool. Nope. The problem was when it called them.
Think about that for a second. The model's timing was off—it jumped the gun when the user was still clarifying, or froze up when the user tried to correct something mid-flow.
Here's the uncomfortable truth: large language models are glorified probability machines. You define a bunch of functions, and they calculate the most likely tool to call based on context. But real conversations don't work like that. People interrupt. They backtrack. They say "no, wait, actually..." seven times before committing.
A Function Calling system without state constraints is like a junction with no traffic lights. Cars can technically drive through it. Eventually, two of them will meet at the wrong angle.
My Most Expensive Debugging Session
I hit this wall hard on an order tracking feature. A user said "check my orders from last week." The model called query_orders, which returned five results. Then the user said "no, the week before that."
The model called query_orders again—but it only updated the date parameter. The order type filter the user had confirmed earlier? Gone. Vanished. The model assumed this was a brand new query, not a correction to the existing one.
Why? Because the model had no concept of being in a "modifying search parameters" state. It saw a fresh request and treated it accordingly. The user, naturally, saw this as the AI having the memory of a goldfish.
This is the core problem: LLMs are brilliant at understanding intent, but terrible at managing conversation stages and boundaries. And honestly, that's not their fault. It's not what they were trained to do.
Finite State Machines: The Old-School Solution Nobody's Talking About
Finite State Machines aren't exactly the shiny new thing. They've been around forever—compiler design, game AI, network protocols. You've probably used them without realising it.
But applying FSMs to Function Calling? That's what I've been experimenting with for the past six months, and I'm convinced it's the most reliable approach out there right now.
Here's the mental model:
- States = where you are in the conversation (e.g., "collecting parameters," "waiting for confirmation," "executing")
- Transitions = what moves you between states (user input, system events)
- Operations = which functions are allowed in each state
For that order tracking nightmare, I drew up something like this:
[Idle] → user says "check orders" → [Collecting Parameters]
[Collecting Parameters] → parameters complete → [Querying]
[Collecting Parameters] → user says "hang on, change that" → [Collecting Parameters] (loop back, update params)
[Querying] → results returned → [Waiting for Feedback]
[Waiting for Feedback] → user says "wrong, different filters" → [Collecting Parameters]
[Waiting for Feedback] → user satisfied → [Idle]
The killer feature here? Each state only exposes a subset of available functions. When the system is in Collecting Parameters, the model cannot even see execute_payment. It's not that the model chooses not to call it—it literally doesn't exist in the current tool list.
Wait—I should clarify something. When I say the model "can't see" a function, I don't mean I'm doing anything clever with the model architecture. I'm simply filtering the tools array in the API request based on the current state. OpenAI's endpoint supports this natively—you pass different tool definitions with each request. I initially overcomplicated this, thinking I needed some middleware permission layer. Turns out, it's dead simple.
Three Things That Will Bite You When You Actually Build This
Pitfall #1: How Granular Should Your States Be?
I went way too fine-grained at first. Separate states for collecting date, collecting amount, collecting recipient... the state diagram looked like spaghetti, and I couldn't debug it even if I wanted to.
Worse, users who said "send £500 to Sarah for tomorrow" in a single breath broke the whole thing. The system had no idea how to handle multiple parameters arriving at once because it was designed for one-at-a-time collection.
I've since adopted a rule: states should correspond to conversation intent phases, not individual parameters. A single Collecting Parameters state lets the model figure out which parameters it has and which are missing. The FSM manages the big-picture flow. The model handles the small-scale understanding. They play to their strengths.
This covers maybe 80% of use cases. That remaining 20%—complex multi-step workflows with nested logic—probably needs hierarchical state machines. But that's a conversation for another article.
Pitfall #2: How Do You Trigger State Transitions?
Hard-coding rules like "if the user says 'okay' then move to the next state" is a fool's errand. Users express confirmation in thousands of ways: "fine," "go for it," "sounds good," "do the thing," "🚀." I once saw someone respond with just "👌" in our production logs.
My approach: let the model decide the transition, but restrict its options to a tiny set. Every state gets a dedicated routing prompt that only lets the model choose from 2-3 possible transitions. In the Waiting for Confirmation state, it can only output confirm, modify, or cancel. The FSM then executes the state change.
This gives you the best of both worlds—you leverage the model's language understanding without letting it roam free. A colleague of mine calls it "drawing circles for the model." Inside the circle, it's creative. Step outside, and it hits a wall.
Pitfall #3: What About Unexpected Abandonment?
Users are unpredictable. They'll say "actually, forget it" or "switch to a different thing entirely" mid-flow. Cross-state jumps like these were a nightmare to debug. I remember one session where a user said "show me yesterday's news" while in the Waiting for Confirmation state, and the entire FSM locked up because there was no transition path to anything news-related.
My fix: a global escape hatch. No matter what state you're in, if the model detects the user wants to abort the current flow or switch to a new task, it forces a jump back to Idle, clears the context, and starts fresh. This escape route has the highest priority—it's checked before anything else in the routing prompt.
After we deployed this at the fintech company, conversation-flow complaints dropped 74%. The model wasn't any smarter. We'd just built it a very safe cage. Customers don't care about your clever architecture—they care about whether money ends up where it should.
Here's What It Actually Looks Like in Code
The implementation isn't particularly complex. The architecture looks like this:
User Input → Intent Classification (limited options) → FSM Transition →
Determine Current State → Select Functions from State Pool → Generate Function Call
Here's a stripped-down version that should be production-ready (Python 3.12, OpenAI SDK v1.52.0):
from enum import Enum
from openai import OpenAI
class DialogState(Enum):
IDLE = "idle"
COLLECTING_PARAMS = "collecting_params"
EXECUTING = "executing"
WAITING_CONFIRMATION = "waiting_confirmation"
class DialogFSM:
def __init__(self):
self.state = DialogState.IDLE
self.context = {}
self.client = OpenAI()
def get_allowed_functions(self):
"""Only expose functions relevant to the current state."""
function_map = {
DialogState.IDLE: [classify_intent],
DialogState.COLLECTING_PARAMS: [update_params, query_preview],
DialogState.EXECUTING: [execute_action],
DialogState.WAITING_CONFIRMATION: [
confirm_action, modify_params, cancel_action
],
}
return function_map[self.state]
def get_allowed_transitions(self):
"""Restrict routing options per state."""
transition_map = {
DialogState.IDLE: ["start_task"],
DialogState.COLLECTING_PARAMS: [
"params_complete", "user_modify", "user_cancel"
],
DialogState.EXECUTING: ["execution_done", "execution_failed"],
DialogState.WAITING_CONFIRMATION: [
"user_confirm", "user_modify", "user_cancel"
],
}
return transition_map[self.state]
def process_turn(self, user_input: str):
# Step 1: Route the intent with constrained options
route_prompt = self._build_route_prompt(user_input)
route_response = self.client.chat.completions.create(
model="gpt-4o",
messages=route_prompt,
response_format={"type": "json_object"}
)
intent = route_response.choices[0].message.content
# Step 2: Execute the state transition
self._transition(intent)
# Step 3: Build tools for the new state
allowed_tools = self._build_tools_for_state()
# Step 4: Actual Function Calling with constrained tools
response = self.client.chat.completions.create(
model="gpt-4o",
messages=self._build_context(user_input),
tools=allowed_tools,
tool_choice="auto"
)
return response
The critical bit is getallowedfunctions. It physically limits which function definitions the model receives. You build the tools parameter dynamically with each request rather than dumping everything in at the start.
I started with gpt-4-turbo and later switched to gpt-4o, which bumped our routing accuracy from 91% to about 96%. From what I've seen, Claude 3.5 Sonnet performs similarly here, but its tool_use format isn't compatible with OpenAI's, so migration costs are a consideration if you switch providers often.
Do You Actually Need This?
Not every Function Calling scenario justifies a state machine. If your tool calls are single-turn and stateless—translating text, summarising articles—the model's default behaviour is probably fine. Adding an FSM would be overengineering, and I'm saying that as someone who genuinely loves these things.
But you should seriously consider this approach if you recognise any of these signals:
- Multi-step operations: Users need to provide information across multiple turns
- Modifiability: Users might need to backtrack and change parameters before execution
- High-risk actions: Getting it wrong causes real damage—transfers, deletions, sending things
- Nested intents: Users might switch tasks halfway through the current one
My team now has a simple heuristic: we sketch a conversation flow diagram. If it has more than three nodes and contains loop-back arrows, we reach for the FSM. No debate. We settled on this rule in a retrospective last November, and it hasn't failed us yet.
Where This Gets Tricky
I should be honest about the limitations here. FSMs aren't a silver bullet.
The maintenance cost scales badly. That fintech project eventually grew to 11 states, and the transition logic started feeling fragile. I'm currently eyeing hierarchical state machines as a refactoring path, but that's going to be a whole project in itself.
There's also a latency trade-off. You're making an extra API call for intent routing before the actual Function Calling happens. In our case, this added about 200-400ms per turn. For a customer service chatbot, that's acceptable. For a real-time voice agent? Probably not.
And if your users have complex, multi-intent utterances—"cancel my last order and book a new one for Friday"—you'll need some clever intent decomposition. A flat FSM won't handle that gracefully.
None of these are dealbreakers, mind you. They're just things I wish someone had told me before I started.
The Bigger Picture
Here's what I've come to believe after a year of building these systems:
LLMs are astonishingly powerful. But power without boundaries is just chaos. The finite state machine doesn't tell the model how to speak or what to say—it tells it when it can say certain things and which actions it can take.
I've now validated this pattern across three projects: fintech customer service, an e-commerce shopping assistant, and an internal knowledge base Q&A system. It's held up better than I expected in all three.
The AI industry spends a lot of time chasing new architectures and bigger models. But sometimes the solution isn't more intelligence—it's better constraints. A 50-year-old computer science concept, applied thoughtfully, can solve problems that state-of-the-art models create.
What about you? Have you run into Function Calling disasters in production? Or have you found a different approach to managing conversation flow that works better? I genuinely want to know—drop a comment below. We're all figuring this out as we go, and I'm just sharing the tool that's currently working for me.
#FunctionCalling #LLM #AIEngineering #FiniteStateMachine #ConversationAI #OpenAI #Python
Cael Lee
Full-stack developer with 8+ years of experience. Currently building AI-powered developer tools. I've tested 20+ AI API providers and coding assistants.