Structured Outputs Saved Me From Contract Parsing Hell (And Probably a PIP)

By Cael Lee | 8 min read

Last Friday at 2 AM, I was still fighting with nested liability clauses in a 200-page contract PDF, coffee getting cold, wondering where my life went wrong.

My boss had said the magic words: "extract the key clauses by Friday." I thought, this is just some regex and a few if-else blocks, maybe 4 hours tops. Plot twist: I was spectacularly wrong. The nesting was insane—liability clauses inside termination clauses inside jurisdiction clauses, like Russian dolls designed by lawyers who hate developers.

That's when I finally got serious about OpenAI's Structured Outputs. Not "read the announcement tweet and nodded along" serious, but "read the actual docs at 2 AM while questioning my career choices" serious.

Here's what I wish I'd known six months ago.

Why Traditional JSON Extraction Is a Trap We All Fall Into

Look, we've all done it. The standard playbook for structured extraction goes something like:


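import json

# Never write it like this. This is a lesson learned the hard way.
try:
    data = json.loads(response)
    name = data.get("name", "not extracted")
    # ...then pray that no field is missing and no type is wrong.
except:  # the bare except swallows every failure, which is part of the problem
    name = "parse failed"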

This approach has three fatal flaws that'll murder your reliability:

The model randomly omits fields it considers "obvious" or "implied." Your contract has a termination date, but the model decided "eh, it's in the preamble, not important" and skipped it. Your parser explodes. You get the Slack notification. Fun times.

Nested structures come back half-baked. You'll get a party object with a name but no address, because the model apparently got distracted mid-JSON. Now your downstream code has to handle 47 edge cases of partial objects.

Field types play musical chairs. amount is a number 90% of the time, then suddenly it's "one thousand five hundred" because the model "thought it looked nicer." This is why I aged three years in 2023.
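Those three failure modes can at least be made loud instead of silent. Here's a minimal sketch, assuming a hypothetical `find_problems` helper and a hand-written field-to-type map (neither comes from our actual pipeline). It reports problems rather than papering over them with defaults:

```python
import json

# Hypothetical expectations for one extraction result: field name -> Python type.
# None of these fields are booleans, so booleans are always treated as wrong.
EXPECTED = {"tenant_name": str, "monthly_rent": (int, float), "party": dict}


def find_problems(raw: str, expected: dict = EXPECTED) -> list[str]:
    """Return a list of human-readable problems instead of silently defaulting."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]

    problems = []
    for field, types in expected.items():
        if field not in data:
            problems.append(f"missing field: {field}")
        elif isinstance(data[field], bool) or not isinstance(data[field], types):
            # bool is a subclass of int in Python, so reject it explicitly.
            problems.append(f"wrong type for {field}: {type(data[field]).__name__}")
    return problems
```

This doesn't fix anything by itself. It just turns "the parser quietly wrote a default" into a list you can log, count, and alert on.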

I was working at a Berlin-based legal tech startup parsing rental contracts, and our QA team had a running joke about my fix commits. Every few days: "fix: empty string penalty field again." The QA lead actually bought me a mug that said "World's Okayest JSON Parser." I still use it. It holds my tears.

Structured Outputs fixes this at the decoding level: with a strict schema, the model's output is constrained to match the schema instead of merely being nudged by the prompt. I should admit something, though. OpenAI released this in August 2024, but I ignored it for three months because I thought "it's just enforced JSON, I can do that with a better prompt." Sound familiar? Yeah, we all have that phase.

The `required` Field: Your Contract With the Model

The core idea is dead simple: define what you want with a schema, not a prayer. The required array is your non-negotiable baseline.

Here's a real example from rental contract extraction:


{
  "type": "object",
  "properties": {
    "tenant_name": { "type": "string" },
    "monthly_rent": { "type": "number" },
    "contract_start": { "type": "string", "format": "date" },
    "deposit_amount": { "type": "number" }
  },
  "required": ["tenant_name", "monthly_rent", "contract_start"]
}

Notice I deliberately kept deposit_amount outside the required array. Why? Because in German contracts, the deposit is sometimes specified in a separate annex. If you force required, the model will invent data to satisfy the schema—and hallucinated data is way worse than missing data.

Hard lesson I learned the embarrassing way: On my first attempt, I shoved everything into required. Model hits a contract with no deposit clause, and it returns {"deposit_amount": 0}. Legally, "no deposit" and "zero deposit" are night-and-day different. One means "this contract doesn't require it," the other means "the deposit is explicitly zero euros." Our compliance team was... not amused.
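One caveat worth checking against the current docs: with `"strict": true`, OpenAI's Structured Outputs guide says every property must be listed in `required` and every object needs `"additionalProperties": false`. An optional field is then expressed as a union with null. Under that assumption, the "no deposit" versus "zero deposit" distinction could be kept like this. The names `RENTAL_SCHEMA`, `build_response_format`, and `describe_deposit` are illustrative, not from our codebase:

```python
# Strict-mode variant of the rental schema: every key is listed in "required",
# and the field that may be absent from the contract is typed number-or-null.
RENTAL_SCHEMA = {
    "type": "object",
    "properties": {
        "tenant_name": {"type": "string"},
        "monthly_rent": {"type": "number"},
        "contract_start": {"type": "string", "format": "date"},
        "deposit_amount": {"type": ["number", "null"]},
    },
    "required": ["tenant_name", "monthly_rent", "contract_start", "deposit_amount"],
    "additionalProperties": False,
}


def build_response_format(name: str, schema: dict) -> dict:
    """Wrap a schema in the response_format shape used for JSON-schema outputs."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "strict": True, "schema": schema},
    }


def describe_deposit(record: dict) -> str:
    """Keep 'no deposit clause' (null) distinct from 'deposit of zero' (0)."""
    amount = record["deposit_amount"]
    if amount is None:
        return "no deposit clause found"
    return f"deposit explicitly set to {amount}"
```

The dict from `build_response_format("rental_contract", RENTAL_SCHEMA)` would go into the `response_format` parameter of a chat completions request. It also helps to tell the model in the prompt to return null when a clause is absent, so it has a legitimate way out instead of inventing a number.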

I later found out this exact problem came up at OpenAI's DevDay in November 2024. A healthcare startup had allergy_history as required, and when the model encountered patients with no allergy records, it started generating "penicillin allergy." In a medical context. Let that sink in. They emergency-patched their schema that same week. The talk should be on YouTube somewhere—worth watching if you're dealing with sensitive data.

Nested Objects: Complex Structures in One Shot

Here's where Structured Outputs really earns its keep. A full party representation in a contract looks like this:


{
  "party": {
    "type": "object",
    "properties": {
      "name": { "type": "string" },
      "type": {
        "type": "string",
        "enum": ["individual", "company", "government"]
      },
      "contact": {
        "type": "object",
        "properties": {
          "email": { "type": "string", "format": "email" },
          "phone": { "type": "string" },
          "address": {
            "type": "object",
            "properties": {
              "street": { "type": "string" },
              "city": { "type": "string" },
              "country": { "type": "string" }
            },
            "required": ["city", "country"]
          }
        },
        "required": ["address"]
      }
    },
    "required": ["name", "type", "contact"]
  }
}

Three levels of nesting, one API call. Before this, my approach was: extract top-level parties first, then make follow-up calls for each party's details, then stitch everything together. Latency through the roof, and any single call failure cascaded into a mess of partial data.
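If you send a nested schema like this with `"strict": true`, the documented rules (all keys in `required`, `"additionalProperties": false` on every object) apply at every level, which is tedious to do by hand. A hypothetical `to_strict` helper could do the conversion recursively, turning "optional" into "nullable". This is a sketch that only handles plain objects and arrays, not `$ref` or existing `anyOf` branches:

```python
import copy


def to_strict(schema: dict) -> dict:
    """Return a copy where every object lists all keys as required and
    formerly optional keys accept null."""
    schema = copy.deepcopy(schema)
    if schema.get("type") == "object":
        originally_required = set(schema.get("required", []))
        new_props = {}
        for key, sub in schema.get("properties", {}).items():
            sub = to_strict(sub)
            if key not in originally_required:
                # Optional in the original: allow an explicit null instead.
                sub = {"anyOf": [sub, {"type": "null"}]}
            new_props[key] = sub
        schema["properties"] = new_props
        schema["required"] = list(new_props)
        schema["additionalProperties"] = False
    elif schema.get("type") == "array" and "items" in schema:
        schema["items"] = to_strict(schema["items"])
    return schema


# The address object from the party schema: street is optional there.
address = {
    "type": "object",
    "properties": {
        "street": {"type": "string"},
        "city": {"type": "string"},
        "country": {"type": "string"},
    },
    "required": ["city", "country"],
}
strict_address = to_strict(address)
```

After the conversion, `street` accepts either a string or null, while `city` and `country` stay plain strings. The original `address` dict is left untouched because the helper works on a deep copy.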

Real numbers from our tests: Across 500 contracts in our test suite, traditional multi-step extraction hit 78% completeness on nested fields. Structured Outputs nested schema? 96%. The remaining 4% were legitimately missing from the source contracts—bad scans, redacted sections, the usual PDF nightmares.

Actually, let me qualify that 96%. Looking back at our test data, 70% of those contracts were German, and German contracts are notoriously structured (shocking, I know). When I ran a batch of Southeast Asian contracts through the same pipeline, completeness dropped to about 89%. The schema quality matters, but so does your source data. Don't blindly benchmark against my numbers.
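If you want to benchmark your own data, you first need to pin down what "completeness" means. One plausible definition, and it is only an assumption for illustration, is the share of leaf fields in the schema that come back present and non-empty. The helper names below are hypothetical:

```python
def leaf_paths(schema: dict, prefix: str = "") -> list[str]:
    """List dotted paths to every non-object leaf in an object schema."""
    paths = []
    for key, sub in schema.get("properties", {}).items():
        path = f"{prefix}.{key}" if prefix else key
        if sub.get("type") == "object":
            paths.extend(leaf_paths(sub, path))
        else:
            paths.append(path)
    return paths


def get_path(record: dict, path: str):
    """Follow a dotted path through nested dicts; return None if anything is missing."""
    current = record
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def completeness(records: list[dict], schema: dict) -> float:
    """Share of expected leaf fields that are present and non-empty."""
    paths = leaf_paths(schema)
    total = len(paths) * len(records)
    if total == 0:
        return 0.0
    filled = sum(
        1 for rec in records for p in paths if get_path(rec, p) not in (None, "")
    )
    return filled / total
```

Note that this counts a field as filled even when the value is wrong, so it measures coverage, not accuracy. Run it per language or per document type, since the mix of source documents moves the number a lot.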

A Harder Real-World Case: Shareholder Structures

Last month I tackled German commercial register extracts—think Handelsregister documents—pulling out shareholder hierarchies. The data structure gets gnarly fast:


{
  "type": "object",
  "properties": {
    "company_name": { "type": "string" },
    "shareholders": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "share_percentage": { "type": "number", "minimum": 0, "maximum": 100 },
          "is_beneficial_owner": { "type": "boolean" },
          "subsidiaries": {
            "type": "array",
            "items": {
              "type": "object",
              "properties": {
                "name": { "type": "string" },
                "ownership_chain": { "type": "number" }
              },
              "required": ["name"]
            }
          }
        },
        "required": ["name", "share_percentage"]
      }
    }
  },
  "required": ["company_name"]
}

Key design decisions I sweated over:

share_percentage got minimum: 0 and maximum: 100 constraints. Without these, I kid you not, the model once returned 150% ownership. I guess one shareholder was just really committed.

The parent_company field inside shareholder_details is not in required. It only exists for multi-tier ownership structures. Force it, and you'll get phantom parent companies for direct shareholders.

The entire shareholders array isn't in required at the top level. German Einzelunternehmen (sole proprietorships) literally don't have shareholders—forcing the model to return an array here is asking for hallucinations.

The bug that kept me at the office until 9 PM: I initially had shareholders as required. When the model hit a sole proprietorship registration, it absolutely refused to return an empty array. Instead, it stuffed the company's legal representative into the shareholders array with 100% ownership. That garbage data made it to our database and nearly triggered a compliance report that would've been... let's say "embarrassing" is an understatement.

The lesson burned into my brain: required isn't a wishlist. It's a reflection of what must exist in reality. Mark something required that doesn't always exist, and you're basically asking the model to lie to you.
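Schema constraints are worth backing up with a check in application code, partly because whether `minimum` and `maximum` are enforced can depend on the model and API version you are on (verify that for your setup). A minimal sketch, with a hypothetical `check_shareholders` function and an extra "shares add up to more than 100%" rule that I'm adding here for illustration:

```python
def check_shareholders(record: dict) -> list[str]:
    """Flag ownership data that is structurally valid but implausible."""
    problems = []
    # A missing or null array means "no shareholders", e.g. a sole proprietorship.
    shareholders = record.get("shareholders") or []
    total = 0.0
    for i, holder in enumerate(shareholders):
        pct = holder.get("share_percentage")
        if isinstance(pct, bool) or not isinstance(pct, (int, float)):
            problems.append(f"shareholder {i}: share_percentage missing or not a number")
            continue
        if not 0 <= pct <= 100:
            problems.append(f"shareholder {i}: share_percentage {pct} outside 0-100")
        total += pct
    # Small tolerance for rounding in the source document.
    if total > 100.01:
        problems.append(f"shares add up to {total:.2f}%, more than 100%")
    return problems
```

This would not have caught the legal representative being stuffed in as a 100% shareholder, since that record is numerically plausible. It does catch the 150% case and keeps obviously broken records out of the database.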

Three Design Principles That Actually Work

After six-ish months of structured extraction work, here's what I've internalized:

1. Required = "missing would be an error." If a field can legitimately not exist, leave it optional. Handle nulls in your application code. Don't make the model invent reality. This seems obvious but I see people violate it constantly.

2. Think twice beyond 4 nesting levels. Technically you can go deeper, but accuracy drops off a cliff around level 4-5. We measured ~95% accuracy at 3 levels, dropping to ~82% at 5 levels with gpt-4o-2024-08-06. If your data structure demands 6 levels of nesting, split it into two calls. Yes, it's more API requests, but you'll save yourself the debugging time. I haven't tested this with Claude or Gemini, so don't ask—my company standardized on OpenAI.

3. Constraints are your secret weapon. enum for controlled vocabularies, minimum/maximum for numeric ranges, pattern for format validation. The model genuinely respects these. I had a project where adding enum with 7 contract types took output consistency from 70% to 99%. Seven values. That's all it took.

Seriously. Going from "pray the model picks a valid contract type" to "here are your seven options, pick one" was transformative.
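Principle 2 is easy to enforce mechanically. Here's a sketch of a hypothetical `schema_depth` helper that counts nested object levels. How exactly levels are counted is my assumption (arrays pass through to their items and don't add a level), so it may not match how the API counts them:

```python
MAX_DEPTH = 4  # the "think twice" threshold from principle 2


def schema_depth(schema: dict) -> int:
    """Count nested object levels in a JSON Schema."""
    kind = schema.get("type")
    if kind == "array":
        # Arrays do not add a level; look at what they contain.
        return schema_depth(schema.get("items", {}))
    if kind == "object":
        children = schema.get("properties", {}).values()
        return 1 + max((schema_depth(child) for child in children), default=0)
    return 0


def needs_split(schema: dict, limit: int = MAX_DEPTH) -> bool:
    """True when the schema nests deeper than the chosen limit."""
    return schema_depth(schema) > limit
```

By this count, the party schema earlier in the post (party, contact, address) comes out at 3. A check like `needs_split(schema)` in a unit test is a cheap way to notice when a schema has quietly grown past the point where a second call would be safer.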

TL;DR for the Skimmers

The Shift That Changed Everything

Structured Outputs did something unexpected to my workflow: I now design the schema first, write the prompt second, and touch application code last. The schema is my contract with the model. The required fields are the non-negotiable clauses. Everything else is nice-to-have.

It's not just a technical change—it's a mindset shift from "I hope the model gives me what I want" to "I've defined exactly what I need, and the model will comply or fail explicitly." That explicitness is everything in production.

What's your structured extraction horror story? Hallucinated data that made it to production? Nested objects that came back as string soup? I've probably been there. Drop it in the comments—I'll share my war stories over a virtual coffee ☕

#structuredoutput #aiengineering #dataextraction #openai #llm #prodeng

Cael Lee

Full-stack developer with 8+ years of experience. Currently building AI-powered developer tools. I've tested 20+ AI API providers and coding assistants.
