Why Your RAG System’s Chunking Strategy Is Probably Broken (And How I Fixed Mine)
TL;DR: Swapping static chunking for dynamic chunking raised our legal-tech RAG system's Recall@5 from 67.3% to 82.6%, a relative gain of about 23%. Sounds like a quick config change, right? It wasn't. It took two weeks, a lot of cold brew, and some 3 AM breakthrough moments in a Berlin apartment. Here's the code, the benchmarks, and the things I broke along the way.
Cover image description: A split-screen illustration showing puzzle pieces snapping together dynamically on one side versus rigid rectangular blocks on the other, with a search bar glowing between them.
The Night Everything Broke (And Why I Started This)
2 AM in my Berlin apartment. My third cup of cold brew sat untouched. I'd switched to cold brew around midnight because I couldn't be bothered to make another French press. We all know that feeling.
Our legal-tech RAG system was returning completely irrelevant case precedents for contract law queries. The embeddings were solid — I'd fine-tuned them myself. The retrieval pipeline was clean. We were using the right vector database with proper indexing. But users kept reporting that "the system doesn't understand context."
I spent 3 hours on this bug before realizing: it wasn't a retrieval problem at all.
It was how we were slicing the documents. The chunker was literally splitting definitions away from their parent clauses. Section headers from their content. Cross-references from their context.
That night sparked a two-week experiment comparing dynamic versus static chunking strategies. Here's what we learned. Well—here's what I learned. And broke. And frantically fixed at 4 AM while my neighbors probably debated calling the police about the guy muttering "sentence boundaries" repeatedly.
Setting Up the Experiment
We used a dataset of 1,200 German legal documents. Got permission from a Berlin law firm — shoutout to Lena for making those calls when I was too swamped with the actual engineering.
The goal was simple:
- Query: "What are the notice requirements for contract termination under BGB § 623?"
- Expected: Return the specific section with context about termination notice periods
- Metric: Recall@5 — did the correct chunk appear somewhere in the top 5 results?
Actually, wait, I should clarify. We also tracked Mean Reciprocal Rank (MRR) and Precision@1, but Recall@5 was what the lawyers cared about. They wanted to know if the answer was somewhere in the results. That made sense for their workflow: they'd rather scan five chunks than miss the answer entirely.
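For reference, here's roughly how those three metrics can be computed. This is a minimal sketch, not our actual evaluation harness: it assumes each query has exactly one correct chunk, and the function names and the `(ranked_ids, relevant_id)` input shape are made up for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def recall_at_k(ranked_ids: Sequence[str], relevant_id: str, k: int = 5) -> float:
    """Return 1.0 if the relevant chunk appears in the top k results, else 0.0."""
    return 1.0 if relevant_id in ranked_ids[:k] else 0.0


def reciprocal_rank(ranked_ids: Sequence[str], relevant_id: str) -> float:
    """Return 1/rank of the relevant chunk (1-based), or 0.0 if it is missing."""
    for rank, chunk_id in enumerate(ranked_ids, start=1):
        if chunk_id == relevant_id:
            return 1.0 / rank
    return 0.0


def evaluate(runs: List[Tuple[Sequence[str], str]], k: int = 5) -> Dict[str, float]:
    """Average the per-query scores over (ranked_ids, relevant_id) pairs."""
    n = len(runs)
    return {
        "recall@k": sum(recall_at_k(r, rel, k) for r, rel in runs) / n,
        # With a single relevant chunk per query, Precision@1 is just
        # "was the top result the right one?"
        "precision@1": sum(recall_at_k(r, rel, 1) for r, rel in runs) / n,
        "mrr": sum(reciprocal_rank(r, rel) for r, rel in runs) / n,
    }
```

Recall@5 is the forgiving one here: a correct chunk at rank 4 counts the same as rank 1, which is exactly what the lawyers wanted.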
The Static Chunking Approach
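    from typing import List

    def static_chunker(text: str, chunk_size: int = 512, overlap: int = 50) -> List[str]:
        """
        Simple, predictable, but often cuts through sentences.
        I've written this exact function at least 20 times.
        Note: sizes are counted in characters, not tokens.
        """
        if not 0 <= overlap < chunk_size:
            raise ValueError("overlap must be non-negative and smaller than chunk_size")
        chunks = []
        start = 0
        while start < len(text):
            end = start + chunk_size
            chunks.append(text[start:end])
            if end >= len(text):
                break  # Last chunk reached; avoid emitting an overlap-only tail
            # Step back by `overlap` so consecutive chunks share some context
            start = end - overlap
        return chunks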
I think I first wrote this in 2021. It's fine. It works. It's also the software equivalent of slicing a baguette with a ruler — clean, consistent, but you'll definitely cut through some raisins in unfortunate ways.
If you're building a quick prototype? Sure, static works. If users are relying on this for actual legal research? That's where the trouble starts.
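To make the failure mode concrete, here's a tiny sketch of fixed-size slicing on a two-sentence sample. The sample text and the `fixed_size_chunks` helper are made up for illustration, and the chunk size is deliberately tiny so the cut is easy to see.

```python
from typing import List


def fixed_size_chunks(text: str, chunk_size: int, overlap: int = 0) -> List[str]:
    """Character-based slicing, the same idea as the static chunker."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


# Hypothetical sample text, not taken from the dataset.
sample = (
    "Die Kündigung bedarf der Schriftform. "
    "Die elektronische Form ist ausgeschlossen."
)

for chunk in fixed_size_chunks(sample, chunk_size=30):
    # Each chunk is printed with repr() so the cut points are visible.
    print(repr(chunk))
```

The first slice stops in the middle of "Schriftform", so the rule and its key term end up in different chunks. Scale that up to 512 characters and full clauses, and you get definitions drifting away from the clauses they belong to.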
The Dynamic Chunking Approach
Here's what I built around 1 AM, right after the cold brew finally started working:
    from typing import List

    import spacy

    nlp = spacy.load("de_core_news_lg")  # German legal text, obviously

    def dynamic_chunker(text: str, target_size: int = 512) -> List[str]:
        """
        Respects sentence boundaries (section headers are not handled here).
        Sizes are counted in characters.
        The coffee finally kicked in when I built this.
        """
        doc = nlp(text)
        chunks = []
        current_chunk = ""
        for sent in doc.sents:
            # Start a new chunk if adding this sentence would exceed the target
            if len(current_chunk) + len(sent.text) > target_size and current_chunk:
                chunks.append(current_chunk.strip())
                current_chunk = sent.text
            else:
                current_chunk += " " + sent.text
        # Don't forget the last chunk (learned this the hard way)
        if current_chunk:
            chunks.append(current_chunk.strip())
        return chunks
That "if current_chunk" check at the bottom? Yeah. That was a 4-hour bug at 3 AM.
I kept getting missing final chunks and couldn't figure out why my recall numbers kept tanking. The debugger showed nothing obviously wrong. Turns out I was just... not appending the last chunk. The chunk was being built, the loop exited, and then it just vanished into the void. Classic.
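In hindsight, a small regression check would have caught this in seconds. Here's a sketch of what I mean. It's not the test I actually had at the time: `pack_sentences` is a spaCy-free stand-in for the packing loop that takes pre-split sentences, and `missing_sentences` is a hypothetical helper name.

```python
from typing import List


def pack_sentences(sentences: List[str], target_size: int = 512) -> List[str]:
    """Greedily pack pre-split sentences into chunks of roughly target_size characters."""
    chunks: List[str] = []
    current = ""
    for sent in sentences:
        if current and len(current) + len(sent) > target_size:
            chunks.append(current.strip())
            current = sent
        else:
            current = f"{current} {sent}".strip()
    if current:  # Flush the trailing chunk; this is the line I forgot
        chunks.append(current)
    return chunks


def missing_sentences(sentences: List[str], chunks: List[str]) -> List[str]:
    """Return every sentence that does not appear in any chunk."""
    return [s for s in sentences if not any(s in c for c in chunks)]
```

Running `missing_sentences` over a handful of documents after every chunker change is cheap, and an empty list means nothing fell into the void.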
🔥 The key difference: Dynamic chunking respects natural boundaries — sentence endings, section breaks, semantic completeness. It doesn't just count tokens and slice.
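One caveat: the chunker shown above only looks at sentence boundaries. Keeping section headers attached to their content needs an extra pre-splitting step. Below is a rough sketch of one way to do that, under a big assumption: that every section starts on its own line with "§" followed by a number. Real documents are messier, and `split_into_sections` is a name I made up for this example.

```python
import re
from typing import List

# Assumption: a section header starts a line with "§" followed by a number.
SECTION_START = re.compile(r"^(?=§\s*\d+)", flags=re.MULTILINE)


def split_into_sections(text: str) -> List[str]:
    """Split text before each section header so the header stays with its body."""
    parts = SECTION_START.split(text)
    # Splitting on a zero-width match can produce empty pieces; drop them.
    return [part.strip() for part in parts if part.strip()]
```

Each section can then be passed through the sentence-level chunker on its own, so a chunk never straddles two sections.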
The Results (With Real Numbers, Finally)
We tested both strategies across 150 real legal queries. These were anonymized queries from the firm's internal system, not synthetic examples I cooked up. Here's what happened:
| Strategy | Recall@5 | Avg Chunk Size | Processing Time |
|---|---|---|---|
| Static (512 tokens) | 67.3% | 512 | 0.4s/doc |
| Dynamic (target 512) | 82.6% | 487 | 1.2s/doc |
| Dynamic (target 256) | 78.1% | 241 | 0.9s/doc |
| Hybrid (static + dynamic) | 84.2% | 510 | 1.5s/doc |
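A quick sanity check on the headline number, using only the Recall@5 values from the table. The 23% in the TL;DR is a relative improvement over the static baseline, not a jump in percentage points. The dictionary keys and the `gains` helper are just illustrative names.

```python
from typing import Tuple

# Recall@5 values copied from the results table.
recall = {
    "static_512": 67.3,
    "dynamic_512": 82.6,
    "dynamic_256": 78.1,
    "hybrid": 84.2,
}


def gains(baseline: float, candidate: float) -> Tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = candidate - baseline
    relative = absolute / baseline * 100
    return round(absolute, 1), round(relative, 1)


for name, value in recall.items():
    if name != "static_512":
        points, percent = gains(recall["static_512"], value)
        print(f"{name}: +{points} points, +{percent}% relative")
```

So dynamic chunking at a 512 target is about 15 points better in absolute terms, which works out to roughly 23% relative to the static baseline. It also costs about three times the processing time per document (1.2s versus 0.4s).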
Cael Lee
Full-stack developer with 8+ years of experience. Currently building AI-powered developer tools. I've tested 20+ AI API providers and coding assistants.