I Tested 5 OpenAI-Compatible APIs: Function Calling and Streaming Are Not as Compatible as They

Last Wednesday, I nearly launched my MacBook into the Spree. I’m not messing about.

I’d spent three hours debugging what I was sure was a code problem—turns out it was a promise of “compatibility” that was anything but. I was sitting in The Barn in Berlin’s Kreuzberg, on my third Americano, and I swear the barista must have thought I was about to cry. (Or throw a laptop. Both, probably.)

That’s what happens when you assume an API provider implements function calling the same way as OpenAI. Spoiler: they don’t. Not even close.

Anyway, here’s the gist of it.

TL;DR

OpenAI-compatible APIs vary wildly when it comes to function calling and streaming. I tested five major services last Friday (17 Jan 2025, if you’re keeping track), running each endpoint ten times to get averages. My eyes were shot by the end, but it was worth it.

The takeaway: Choose the wrong provider and you could waste a day of coding—or more. Pick with care, and you’ll save yourself at least a few all-nighters.

The Lineup

Here’s who I tested:

OpenAI – the baseline. Used gpt-4-turbo-preview.
Together AI – hosts open-source models. I tried mixtral-8x22b and llama3.1-70b.
Groq – the speed demon. Default model mixtral-8x7b.
Perplexity – search-augmented API. Used llama-3-sonar-large.
Ollama – local model runner, version 0.3.1 on my M1 MacBook.

I focused on two things: function calling support, and streaming speed and reliability. The hidden metric? How likely each one is to make you want to smash your keyboard.

Function Calling: Who Actually Ships It?

Function calling is the backbone of giving LLMs tools. But implementation… well, it’s a mixed bag.

OpenAI

OpenAI is the gold standard. It works out of the box. Though, funny enough, they did change the parameter format along the way: tools and tool_choice replace the older functions and function_call. Minor headache, but at least it’s documented. The snippet below still uses the legacy functions form, which is what I started with.


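import OpenAI from "openai";

// Reads OPENAI_API_KEY from the environment by default.
const openai = new OpenAI();

// Note: `functions` is the legacy parameter; newer code passes `tools` instead.
const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [{ role: "user", content: "Book me a flight to Berlin" }],
  functions: [
    {
      name: "book_flight",
      description: "Reserve a flight",
      // JSON Schema describing the arguments the model should produce.
      parameters: {
        type: "object",
        properties: {
          destination: { type: "string" },
          date: { type: "string", description: "YYYY-MM-DD" }
        },
        required: ["destination", "date"]
      }
    }
  ]
});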
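Since the legacy functions/function_call pair and the newer tools/tool_choice pair describe the same thing in slightly different shapes, a small adapter saves a lot of copy-paste pain. Here’s a minimal sketch; toToolsRequest is a helper name I made up, not part of any SDK, and it only handles the fields shown.

```javascript
// Hypothetical helper: converts a legacy `functions`-style request body
// into the newer `tools` / `tool_choice` shape. Other fields pass through.
function toToolsRequest(legacy) {
  const { functions, function_call, ...rest } = legacy;
  const request = { ...rest };

  if (Array.isArray(functions)) {
    // Each legacy function definition becomes a tool of type "function".
    request.tools = functions.map((fn) => ({ type: "function", function: fn }));
  }

  if (typeof function_call === "string") {
    // "auto" and "none" carry over unchanged.
    request.tool_choice = function_call;
  } else if (function_call && function_call.name) {
    // Forcing a specific function needs the nested form.
    request.tool_choice = {
      type: "function",
      function: { name: function_call.name },
    };
  }

  return request;
}
```

Usage would look something like openai.chat.completions.create(toToolsRequest(oldRequest)), so the old call sites stay as they are while the wire format moves on.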

Together AI

Together AI offers function calling on models like Mixtral 8x22B and Llama 3.1. The API format looks fully compatible. But here’s the gotcha—not every model supports it. I spent three hours trying to debug why my calls kept failing. I tried different models, tweaked parameters, even questioned my career choices. Eventually I realised: the model I’d picked simply didn’t support function calling. I could have slapped my forehead clean off.

Lesson: always check the docs for model support. Seriously.
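To stop myself repeating that mistake, I now fail fast before the request ever leaves my machine. A rough sketch, assuming you maintain the allowlist yourself from the provider’s docs; the model ID in it is a placeholder, and withToolsIfSupported is my own name, not an SDK function.

```javascript
// Hypothetical allowlist: fill it from the provider's docs, not from this post.
const TOOL_CAPABLE_MODELS = new Set(["example/model-with-tools"]);

// Returns the request untouched if it is safe to send, and throws a clear
// error if it carries tools for a model that is not on the allowlist.
function withToolsIfSupported(request, supported = TOOL_CAPABLE_MODELS) {
  const hasTools = Array.isArray(request.tools) && request.tools.length > 0;
  if (!hasTools) return request;

  if (!supported.has(request.model)) {
    throw new Error(
      `Model "${request.model}" is not on the tool-calling allowlist; check the provider docs.`
    );
  }
  return request;
}
```

A loud error at the call site beats three hours of staring at a vague failure from the API.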

Groq

Groq’s function calling is, let’s say, limited. Only mixtral-8x7b and llama3-70b support it, and the request shape caught me out: Groq expects the newer tools parameter, not the legacy functions one I’d been sending to OpenAI. It’s a simple enough fix, but if you copy-paste your existing code, you’ll get a confusing error. I know I did.

Perplexity

Perplexity’s API is built for search. Custom function calling? Nope. Not supported. If you’re building a retrieval-based app, it’s brilliant. But if you need tool use, you’re out of luck. I briefly wondered if I could hack it. I couldn’t. Really.

Ollama

Ollama’s OpenAI-compatible mode works great locally, but function calling depends entirely on the model you’re running. With Llama 3.1 or Qwen 2, it’s fine. But if you go for a smaller model like qwen2:0.5b to save memory, it might just ignore your function calls. Don’t ask me how I know.
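Because a small model can quietly answer in plain text instead of calling your function, it’s worth checking what actually came back rather than assuming. A minimal sketch, assuming the OpenAI-style response shape where message.tool_calls holds the calls and the arguments arrive as a JSON string; extractToolCalls is a name I invented for this post.

```javascript
// Pulls tool calls out of an assistant message and parses their arguments.
// Returns an empty array when the model replied with plain text only.
function extractToolCalls(message) {
  const calls = message?.tool_calls ?? [];
  return calls.map((call) => {
    const name = call.function?.name;
    const raw = call.function?.arguments;
    try {
      // Arguments are normally a JSON string; tolerate an already-parsed object.
      const args = typeof raw === "string" ? JSON.parse(raw) : raw;
      return { ok: true, name, args };
    } catch {
      // Malformed JSON: keep the raw text so the caller can log or retry.
      return { ok: false, name, raw };
    }
  });
}
```

In practice I call it on response.choices[0].message and treat an empty array as “the model ignored the tools”, which is the cue to switch to a bigger model.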

Streaming: Speed vs Stability

Streaming is where the differences become painfully clear—and where you can get genuinely frustrated.

OpenAI

Standard SSE streaming, token by token. Rock solid, but not blistering. I measured about 45 tokens per second. It’s reliable, but nothing to write home about.

Groq

Groq’s streaming is insane. I measured an average of 487 tokens per second, with peak bursts over 500 and first token latency under 50ms. Honestly, it feels like cheating. But—and there’s always a but—I’ve had it drop the occasional chunk. Just one percent of the time, maybe. But on a live demo, that missing chunk made me look like I had no idea what I was doing. I’ve since added retry logic. Speed versus reliability, you decide.


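import Groq from "groq-sdk";

// Reads GROQ_API_KEY from the environment by default.
const groq = new Groq();

const stream = await groq.chat.completions.create({
  model: "mixtral-8x7b-32768", // Groq's full model ID for Mixtral 8x7B when I tested
  messages: [{ role: "user", content: "Tell me a story about a Berlin programmer" }],
  stream: true
});

// Print each content delta as it arrives; some chunks carry no content.
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}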
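For what it’s worth, the retry logic I mentioned looks roughly like this. It’s a sketch, not a cure: it treats a stream as complete only if some chunk carried a finish_reason, so it catches streams that throw or end early, but it can’t spot a chunk that silently goes missing mid-stream. It also buffers the whole response, so you lose the live typing effect. startStream is a hypothetical callback you supply.

```javascript
// Collects a streamed completion into one string and records whether
// the provider signalled a proper end via finish_reason.
async function collectStream(stream) {
  let text = "";
  let finished = false;
  for await (const chunk of stream) {
    const choice = chunk.choices?.[0];
    text += choice?.delta?.content ?? "";
    if (choice?.finish_reason) finished = true;
  }
  return { text, finished };
}

// `startStream` is a hypothetical callback that opens a fresh stream, e.g.
// () => groq.chat.completions.create({ model, messages, stream: true }).
async function streamWithRetry(startStream, maxAttempts = 3) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const result = await collectStream(await startStream());
      if (result.finished) return result.text;
      lastError = new Error("stream ended without a finish_reason");
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```

For a live demo I’d still stream to the screen and only fall back to this buffered path when something goes wrong.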

Together AI

Together AI streams at about 120 tokens per second—not as fast as Groq, but far more consistent. One nice touch: they include usage stats in the stream events. Handy for debugging. And I didn’t see any dropped chunks. A solid all-rounder.

Perplexity

Perplexity’s streaming includes inline citations. Great for search contexts, but if you’re parsing the stream incrementally for structured output, those [citation:1] markers get in the way. I spent more time filtering them out than getting work done.
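The fiddly part of filtering is that a marker can be split across two chunks, so a plain replace on each chunk misses some. Here’s a rough sketch of a stateful filter that holds back a trailing fragment until it knows whether it’s a marker. It assumes the markers look exactly like [citation:1]; createCitationFilter is my own name for it.

```javascript
// Returns a stateful filter. Feed each streamed text chunk to push(),
// then call flush() once the stream has ended.
function createCitationFilter() {
  const marker = /\[citation:\d+\]/g;
  let held = ""; // trailing fragment that might still grow into a marker

  // True for fragments such as "[", "[cita" or "[citation:12".
  const isPartialMarker = (tail) =>
    "[citation:".startsWith(tail) || /^\[citation:\d*$/.test(tail);

  return {
    push(chunk) {
      let text = (held + chunk).replace(marker, "");
      held = "";
      const open = text.lastIndexOf("[");
      if (open !== -1 && isPartialMarker(text.slice(open))) {
        held = text.slice(open);
        text = text.slice(0, open);
      }
      return text;
    },
    flush() {
      // Whatever is still held never became a marker, so emit it as text.
      const rest = held;
      held = "";
      return rest;
    },
  };
}
```

Wire it in by writing filter.push(delta) instead of the raw delta, and filter.flush() after the loop.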

Ollama

Local streaming is, predictably, limited by your hardware. On my M1 MacBook I got about 15 tokens per second. Fine for prototyping, but in production? You’ll want the cloud. And yes, you could finish a coffee waiting for a token-heavy response.

The Numbers

Here’s the summary. Each figure is the average of ten runs.

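| Provider | Function calling | Streaming speed (tokens/s) | Notes |
| --- | --- | --- | --- |
| OpenAI | ✅ Full support | 45 | Stable but pricey |
| Together AI | ⚠️ Partial | 120 | Depends on model |
| Groq | ⚠️ Limited | 487 | Fast but occasional drops |
| Perplexity | ❌ Not supported | 80 | Search-only use case |
| Ollama | ⚠️ Model-dependent | 15 (M1) | Great for development |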

Groq is a speed freak’s dream, but don’t trust it blindly. Together AI feels like the best balance of speed and reliability.
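If you want to reproduce this kind of measurement, something along these lines will do. It’s a sketch with one loud assumption: every content chunk counts as one token, which is only roughly true, so treat the result as a ballpark. The now parameter is injectable purely so the function can be tested without a real clock.

```javascript
// Measures time to first token and throughput for one streamed response.
// Assumption: one content chunk is counted as one token (an approximation).
async function measureStream(stream, now = () => performance.now()) {
  const start = now();
  let firstTokenAt = null;
  let chunks = 0;

  for await (const chunk of stream) {
    const content = chunk.choices?.[0]?.delta?.content;
    if (!content) continue; // skip role-only and final bookkeeping chunks
    if (firstTokenAt === null) firstTokenAt = now();
    chunks += 1;
  }

  const seconds = (now() - start) / 1000;
  return {
    firstTokenMs: firstTokenAt === null ? null : firstTokenAt - start,
    tokensPerSecond: seconds > 0 ? chunks / seconds : 0,
  };
}

// Plain arithmetic mean, for averaging several runs.
const average = (values) =>
  values.reduce((sum, value) => sum + value, 0) / values.length;
```

Run it ten times per provider, collect each tokensPerSecond, and pass the array to average.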

How I Choose Now

After all the pain, here’s my rough strategy:

Development: Ollama locally. Fast iteration, no network latency. But for heaven’s sake, don’t use it in production unless you have serious hardware.
Production with function calling: OpenAI or Together AI. Reliability over speed.
Pure streaming: Groq (if you can handle the occasional hiccup). Build in error handling and retries.
Search-augmented apps: Perplexity. The citations are actually useful there. Just don’t try to make it do general tool stuff.

There’s no silver bullet. Pick based on your use case, and read the docs carefully.
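In code, that strategy boils down to a lookup table. A minimal sketch: the base URLs are the ones I believe each provider documents for its OpenAI-compatible endpoint, so treat them as assumptions and confirm against current docs. The use-case keys and pickProvider are invented for this post, and the functionCalling values simply repeat my table.

```javascript
// Base URLs are assumptions to verify; functionCalling mirrors the summary table.
const PROVIDERS = {
  openai: { baseURL: "https://api.openai.com/v1", functionCalling: "full" },
  together: { baseURL: "https://api.together.xyz/v1", functionCalling: "partial" },
  groq: { baseURL: "https://api.groq.com/openai/v1", functionCalling: "limited" },
  perplexity: { baseURL: "https://api.perplexity.ai", functionCalling: "none" },
  ollama: { baseURL: "http://localhost:11434/v1", functionCalling: "model-dependent" },
};

// Hypothetical use-case keys, one per line of the strategy.
const CHOICE = new Map([
  ["development", "ollama"],
  ["function-calling", "openai"],
  ["streaming", "groq"],
  ["search", "perplexity"],
]);

function pickProvider(useCase) {
  const name = CHOICE.get(useCase);
  if (!name) throw new Error(`Unknown use case: ${useCase}`);
  return { name, ...PROVIDERS[name] };
}
```

The returned baseURL can then go straight into the OpenAI client constructor alongside the matching API key, which is about as far as the “compatible” part reliably gets you.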

Final Thought

That afternoon in Kreuzberg, I stared at my screen for hours, coffee after coffee. But that frustration drove me to systematically test these providers—and now I have a much clearer picture.

Was it worth it? Honestly, yes. At least next time I hit a “compatible” API, I know exactly where to look for the gaps. I hope this saves you a few hours of debugging and a couple of stress-related grey hairs.

What’s the weirdest compatibility issue you’ve encountered with these APIs? Or have you found a provider I missed? Drop a comment below—I’d love to hear your war stories. 🚀

#OpenAI #APIs #FunctionCalling #Streaming #WebDev #DeveloperExperience

Cael Lee
