I Tested 5 OpenAI-Compatible APIs: Function Calling and Streaming Are **Not** as Compatible as They
Last Wednesday, I nearly launched my MacBook into the Spree. I’m not messing about.
I’d spent three hours debugging what I was sure was a code problem—turns out it was a promise of “compatibility” that was anything but. I was sitting in The Barn in Berlin’s Kreuzberg, on my third Americano, and I swear the barista must have thought I was about to cry. (Or throw a laptop. Both, probably.)
That’s what happens when you assume an API provider implements function calling the same way as OpenAI. Spoiler: they don’t. Not even close.
Anyway, here’s the gist of it.
TL;DR
OpenAI-compatible APIs vary wildly when it comes to function calling and streaming. I tested five major services last Friday (17 Jan 2025, if you’re keeping track), running each endpoint ten times to get averages. My eyes were shot by the end, but it was worth it.
The takeaway: Choose the wrong provider and you could waste a day of coding—or more. Pick with care, and you’ll save yourself at least a few all-nighters.
The Lineup
Here’s who I tested:
- OpenAI – the baseline. Used `gpt-4-turbo-preview`.
- Together AI – hosts open-source models. I tried `mixtral-8x22b` and `llama3.1-70b`.
- Groq – the speed demon. Default model `mixtral-8x7b`.
- Perplexity – search-augmented API. Used `llama-3-sonar-large`.
- Ollama – local model runner, version 0.3.1 on my M1 MacBook.
I focused on two things: function calling support, and streaming speed and reliability. The hidden metric? How likely each one is to make you want to smash your keyboard.
Function Calling: Who Actually Ships It?
Function calling is the backbone of giving LLMs tools. But implementation… well, it’s a mixed bag.
OpenAI
OpenAI is the gold standard. It works out of the box. Though, funny enough, they did change the parameter format in newer API versions (`tools` and `tool_choice` now replace `functions` and `function_call`). Minor headache, but at least it’s documented.
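    import OpenAI from "openai";

    // Reads OPENAI_API_KEY from the environment by default.
    const openai = new OpenAI();

    const response = await openai.chat.completions.create({
      model: "gpt-4",
      messages: [{ role: "user", content: "Book me a flight to Berlin" }],
      // Legacy parameter; newer API versions expect `tools` instead.
      functions: [
        {
          name: "book_flight",
          description: "Reserve a flight",
          // JSON Schema describing the arguments the model should produce.
          parameters: {
            type: "object",
            properties: {
              destination: { type: "string" },
              date: { type: "string", description: "YYYY-MM-DD" }
            },
            required: ["destination", "date"]
          }
        }
      ]
    });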
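Getting the request accepted is only half of it. The reply carries the call as `message.function_call` when you sent legacy `functions`, and as `message.tool_calls` when you sent `tools`. Here is a minimal sketch of reading both shapes. The helper name `extractCalls` is made up for illustration and is not part of any SDK.

```javascript
// Hypothetical helper: normalise both reply shapes into [{ name, args }].
function extractCalls(message) {
  const raw = [];
  if (message.function_call) raw.push(message.function_call); // legacy shape
  for (const call of message.tool_calls ?? []) {
    if (call.type === "function") raw.push(call.function); // newer shape
  }
  // `arguments` arrives as a JSON string, so it has to be parsed.
  return raw.map(({ name, arguments: json }) => ({ name, args: JSON.parse(json) }));
}
```

With the official SDK you would pass it `response.choices[0].message`. An empty array means the model answered in plain text.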
Together AI
Together AI offers function calling on models like Mixtral 8x22B and Llama 3.1. The API format looks fully compatible. But here’s the gotcha—not every model supports it. I spent three hours trying to debug why my calls kept failing. I tried different models, tweaked parameters, even questioned my career choices. Eventually I realised: the model I’d picked simply didn’t support function calling. I could have slapped my forehead clean off.
Lesson: always check the docs for model support. Seriously.
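One way to make that lesson stick is a guard that fails before the request ever leaves your machine. This is a sketch under loud assumptions: the allowlist is maintained by hand from the provider’s docs, the names in it are the shorthand used in this post rather than official model IDs, and `guardToolRequest` is a made-up helper.

```javascript
// Assumption: a hand-maintained list copied from the provider's docs.
// These are the shorthand names from this post, not official model IDs.
const TOOL_CAPABLE_MODELS = new Set(["mixtral-8x22b", "llama3.1-70b"]);

// Hypothetical guard: throw early instead of debugging a silent failure.
function guardToolRequest(model, request) {
  const wantsTools = Boolean(request.tools?.length || request.functions?.length);
  if (wantsTools && !TOOL_CAPABLE_MODELS.has(model)) {
    throw new Error(`Model "${model}" is not on the tool-capable list; check the provider docs.`);
  }
  return { ...request, model };
}
```

Requests without tools pass through untouched, so the guard only bites in the case that cost me an afternoon.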
Groq
Groq’s function calling is, let’s say, limited. Only `mixtral-8x7b` and `llama3-70b` support it, and the parameter naming differs from my OpenAI code: Groq expects `tools` (the newer OpenAI format) rather than the legacy `functions`. It’s a simple enough fix, but if you copy-paste code that still uses `functions`, you’ll get a confusing error. I know I did.
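If you have a pile of legacy-style requests lying around, the conversion is mechanical. A minimal sketch, assuming the request follows the OpenAI chat completions shape; `toToolsRequest` is a hypothetical name, not something any SDK ships.

```javascript
// Hypothetical helper: rewrite a legacy-style request into the `tools` shape.
function toToolsRequest(request) {
  const { functions, function_call, ...rest } = request;
  if (!functions) return request; // nothing legacy to convert
  const converted = {
    ...rest,
    // Each legacy function definition is wrapped in a typed tool entry.
    tools: functions.map((fn) => ({ type: "function", function: fn })),
  };
  if (typeof function_call === "string") {
    converted.tool_choice = function_call; // "auto" and "none" carry over unchanged
  } else if (function_call?.name) {
    converted.tool_choice = { type: "function", function: { name: function_call.name } };
  }
  return converted;
}
```

The input object is left alone and a new one comes back, so the same legacy request can still be reused elsewhere.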
Perplexity
Perplexity’s API is built for search. Custom function calling? Nope. Not supported. If you’re building a retrieval-based app, it’s brilliant. But if you need tool use, you’re out of luck. I briefly wondered if I could hack it. I couldn’t. Really.
Ollama
Ollama’s OpenAI-compatible mode works great locally, but function calling depends entirely on the model you’re running. With Llama 3.1 or Qwen 2, it’s fine. But if you go for a smaller model like qwen2:0.5b to save memory, it might just ignore your function calls. Don’t ask me how I know.
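Since a small model may quietly answer in prose, it helps to check the reply before acting on it. The sketch below assumes the reply uses the OpenAI-style `tool_calls` shape and that your tools were declared in the `tools` format. `checkToolCall` and its reason codes are made up for illustration. It accepts `arguments` as either a JSON string or an already-parsed object, because I would not bet on every backend agreeing.

```javascript
// Hypothetical check: did the model produce a usable call for a known tool?
function checkToolCall(message, tools) {
  const call = message.tool_calls?.[0];
  if (!call) return { ok: false, reason: "no_tool_call" }; // model replied in plain text
  const spec = tools.find((t) => t.function.name === call.function.name);
  if (!spec) return { ok: false, reason: "unknown_tool" };
  let args = call.function.arguments;
  if (typeof args === "string") {
    try {
      args = JSON.parse(args);
    } catch {
      return { ok: false, reason: "invalid_arguments" };
    }
  }
  if (args === null || typeof args !== "object") {
    return { ok: false, reason: "invalid_arguments" };
  }
  // Compare against the `required` list from the tool's JSON Schema.
  const missing = (spec.function.parameters?.required ?? []).filter((key) => !(key in args));
  if (missing.length > 0) return { ok: false, reason: "missing_required", missing };
  return { ok: true, name: spec.function.name, args };
}
```

On `no_tool_call` you can retry with a bigger model or fall back to showing the text, which beats silently doing nothing.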
Streaming: Speed vs Stability
Streaming is where the differences become painfully clear—and where you can get genuinely frustrated.
OpenAI
Standard SSE streaming, token by token. Rock solid, but not blistering. I measured about 45 tokens per second. It’s reliable, but nothing to write home about.
Groq
Groq’s streaming is insane. I measured an average of 487 tokens per second, with peak bursts over 500 and first token latency under 50ms. Honestly, it feels like cheating. But—and there’s always a but—I’ve had it drop the occasional chunk. Just one percent of the time, maybe. But on a live demo, that missing chunk made me look like I had no idea what I was doing. I’ve since added retry logic. Speed versus reliability, you decide.
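    import Groq from "groq-sdk";

    // Reads GROQ_API_KEY from the environment by default.
    const groq = new Groq();

    const stream = await groq.chat.completions.create({
      model: "mixtral-8x7b",
      messages: [{ role: "user", content: "Tell me a story about a Berlin programmer" }],
      stream: true
    });

    // Print each content delta as it arrives; some chunks carry no content.
    for await (const chunk of stream) {
      process.stdout.write(chunk.choices[0]?.delta?.content || "");
    }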
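For the retry logic, here is roughly the shape of the idea. This is a sketch, not the exact code from my demo. It assumes a healthy stream ends with a chunk carrying a `finish_reason`, and it treats a stream that throws or ends without one as a failure worth retrying. The name `collectWithRetry` and the `makeStream` callback are hypothetical. Be aware of the limit: a chunk silently lost in the middle of an otherwise complete stream would slip past this check.

```javascript
// Hypothetical wrapper: `makeStream` returns a fresh async iterable of chunks per attempt.
async function collectWithRetry(makeStream, maxAttempts = 3) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    let text = "";
    let finished = false;
    try {
      for await (const chunk of await makeStream()) {
        const choice = chunk.choices?.[0];
        text += choice?.delta?.content ?? "";
        if (choice?.finish_reason) finished = true;
      }
      if (finished) return { text, attempts: attempt };
      // Assumption: no finish_reason means the stream was cut short.
      lastError = new Error("stream ended without a finish_reason");
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```

You would call it with something like `() => groq.chat.completions.create({ ...request, stream: true })`. The trade-off is that it buffers the whole reply and restarts from scratch, so you give up live token-by-token output in exchange for a complete answer.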
Together AI
Together AI streams at about 120 tokens per second—not as fast as Groq, but far more consistent. One nice touch: they include usage stats in the stream events. Handy for debugging. And I didn’t see any dropped chunks. A solid all-rounder.
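Picking up those usage stats takes only a few lines. A minimal sketch, assuming the stats arrive as an OpenAI-style `usage` object on one of the later chunks; I have not pinned down which chunk carries it, so the code just keeps the last one it sees. `collectWithUsage` is a made-up helper name.

```javascript
// Hypothetical helper: gather the text and the provider-reported usage from a chunk stream.
async function collectWithUsage(stream) {
  let text = "";
  let usage = null;
  for await (const chunk of stream) {
    // Usage-only chunks may have an empty `choices` array, hence the optional chaining.
    text += chunk.choices?.[0]?.delta?.content ?? "";
    if (chunk.usage) usage = chunk.usage; // keep the most recent usage object
  }
  return { text, usage };
}
```

If `usage` comes back `null`, the provider did not send any, and you are back to counting tokens yourself.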
Perplexity
Perplexity’s streaming includes inline citations. Great for search contexts, but if you’re parsing the stream for structured output, those [citation:1] markers get in the way. I spent more time filtering them out than getting work done.
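The annoying part of filtering is that a marker can be split across two chunks, so a plain `replace` on each chunk misses some. Below is a sketch of a small stateful filter that holds back a trailing fragment until it knows whether it is a marker. It assumes the marker format is exactly `[citation:N]` as described above, and `createCitationFilter` is a hypothetical name.

```javascript
// Hypothetical filter; assumes markers look exactly like `[citation:N]`.
function createCitationFilter() {
  const PREFIX = "[citation:";
  let held = ""; // trailing fragment that might be the start of a marker
  return {
    push(chunkText) {
      let text = (held + chunkText).replace(/\[citation:\d+\]/g, "");
      held = "";
      const open = text.lastIndexOf("[");
      if (open !== -1) {
        const tail = text.slice(open);
        // Hold the tail back if it could still grow into a full marker.
        if (PREFIX.startsWith(tail) || /^\[citation:\d*$/.test(tail)) {
          held = tail;
          text = text.slice(0, open);
        }
      }
      return text;
    },
    // Call once the stream ends to release anything still held back.
    flush() {
      const rest = held;
      held = "";
      return rest;
    },
  };
}
```

Feed each `delta.content` through `push` and append `flush()` at the end. Ordinary brackets such as `array[0]` pass through, just one chunk later than they arrived.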
Ollama
Local streaming is, predictably, limited by your hardware. On my M1 MacBook I got about 15 tokens per second. Fine for prototyping, but in production? You’ll want the cloud. And yes, you could finish a coffee waiting for a token-heavy response.
The Numbers
Here’s the summary. Each figure is the average of ten runs.
| Provider | Function Calling | Streaming Speed (tokens/s) | Notes |
|---|---|---|---|
| OpenAI | ✅ Full support | 45 | Stable but pricey |
| Together AI | ⚠️ Partial | 120 | Depends on model |
| Groq | ⚠️ Limited | 487 | Fast but occasional drops |
| Perplexity | ❌ Not supported | 80 | Search-only use case |
| Ollama | ⚠️ Model-dependent | 15 (M1) | Great for development |
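If you want to reproduce figures like these, the arithmetic is simple. This is one plausible way to compute them, not necessarily the exact method behind the table: tokens per second for each run, then the mean across runs, rounded to a whole number. The function names and the `{ tokens, elapsedMs }` run shape are assumptions for illustration.

```javascript
// Throughput for one run: completion tokens divided by wall-clock seconds.
function tokensPerSecond(completionTokens, elapsedMs) {
  if (elapsedMs <= 0) throw new RangeError("elapsedMs must be positive");
  return completionTokens / (elapsedMs / 1000);
}

// Mean throughput across runs, rounded to a whole number.
function averageThroughput(runs) {
  if (runs.length === 0) throw new RangeError("need at least one run");
  const total = runs.reduce(
    (sum, run) => sum + tokensPerSecond(run.tokens, run.elapsedMs),
    0
  );
  return Math.round(total / runs.length);
}
```

Averaging the per-run rates (rather than dividing total tokens by total time) gives every run equal weight, which matters when response lengths vary a lot.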
Cael Lee
Full-stack developer with 8+ years of experience. Currently building AI-powered developer tools. I've tested 20+ AI API providers and coding assistants.