Recommending an Interactive Attention Visualization Tool! My T (English)
Generated: 2026-06-21 18:00:08
---
I Found a Tool That Can "See Through" the Transformer's Brain! It Totally Blew My Mind 😭
Remember the first time you tried to learn Transformer and had a complete meltdown?
Let me start with mine.
I was on the toilet scrolling through the classic paper "Attention Is All You Need," diving in full of confidence. Three days later, I could sort of fake my way through the math, but the actual code? Man, I was totally lost.
How do Q, K, and V actually pair up? There are twelve attention heads, and supposedly each one reads the sentence from a different angle, like a team of reading-comprehension specialists. But which one handles grammar and which one handles meaning? Nobody tells you!
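To make the Q/K/V question concrete, here is a minimal plain-Python sketch of multi-head scaled dot-product attention. It assumes toy sizes (5 tokens, a 24-dimensional model split into 12 heads of 2 dimensions each; GPT-2 small splits 768 dimensions into 12 heads of 64) and random weights, and the helper names are my own. Every query row is scored against every key row, the scores go through a softmax, and the resulting weights mix the value rows; each head does this on its own slice of Q, K, and V. The real model also applies an output projection after concatenating the heads, which this sketch leaves out.

```python
import math
import random

# Hypothetical helper names: a teaching sketch, not GPT-2's actual code.


def matmul(a, b):
    """Plain-Python product of an n x k matrix and a k x m matrix (lists of rows)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(row):
    top = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_attention(x, w_q, w_k, w_v, n_heads):
    """Return (concatenated head outputs, one attention matrix per head).

    The output projection GPT-2 applies after concatenation is omitted.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_head = len(q[0]) // n_heads
    outputs = [[] for _ in x]
    per_head_weights = []
    for h in range(n_heads):
        cols = slice(h * d_head, (h + 1) * d_head)  # this head's slice of Q, K, V
        q_h = [row[cols] for row in q]
        k_h = [row[cols] for row in k]
        v_h = [row[cols] for row in v]
        # Score every query against every key, scaled by sqrt(d_head).
        scores = matmul(q_h, transpose(k_h))
        weights = [softmax([s / math.sqrt(d_head) for s in row]) for row in scores]
        per_head_weights.append(weights)
        # Each token's output is a weighted sum of this head's value vectors.
        for i, row in enumerate(matmul(weights, v_h)):
            outputs[i].extend(row)
    return outputs, per_head_weights


def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
seq_len, d_model, n_heads = 5, 24, 12  # toy sizes; GPT-2 small uses 768 and 12
x = rand_matrix(seq_len, d_model)      # stand-in for 5 token embeddings
w_q, w_k, w_v = (rand_matrix(d_model, d_model) for _ in range(3))
out, weights = multi_head_attention(x, w_q, w_k, w_v, n_heads)
print(len(weights), len(weights[0]), len(out[0]))  # → 12 5 24
```

So "twelve heads" just means twelve independent attention matrices per layer, each computed from its own slice of Q and K. Which head ends up tracking grammar and which tracks meaning is learned during training, which is exactly why a visualization tool helps.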
And the worst part? The tutorials out there either give you one dull static diagram showing a single head and nothing else, or turn it into a galaxy star map with fancy 3D effects that look great in a WeChat Moments screenshot but fall apart the moment you try to debug anything.
Infuriating, right?
So I dreamed of a real tool, one that could show you what's going on inside the model's brain.
Last month I spent three days testing the major Transformer visualization tools out there, and I've folded in all the pitfalls I hit while building my own. Here's the full report.
---
So Which Tools Actually Deliver?
I tested three:
- Transformer Explainer — built by Georgia Tech + IBM
- DODRIO — also from Georgia Tech
- My own little project
Let's start with the first one: Transformer Explainer. This thing really delivers. I'm genuinely impressed.
It runs a small GPT-2 model directly in your browser (124M parameters: not huge, but enough). You type in a sentence, and it shows you in real time how data flows through every component, from the embeddings through the Transformer blocks to the next-token prediction. The whole process is crystal clear.
And the Sankey diagram? I literally screamed!
Picture this: you type "the cat sat on the" and it immediately shows how each token's attention weights are distributed across the other tokens. Is "the" focusing on "cat" or on "sat"? You see it at a glance. No guessing.
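What the Sankey view shows for each token is one row of an attention matrix. GPT-2 is a causal (left-to-right) model, so each token can only attend to itself and the tokens before it. Here is a small sketch of how that mask shapes the rows for "the cat sat on the"; the helper name is my own and the raw scores are made up for illustration, not taken from the tool.

```python
import math

# Hypothetical helper; the scores below are made up, not taken from GPT-2.


def softmax(row):
    top = max(row)  # finite, because the diagonal is never masked
    exps = [math.exp(s - top) for s in row]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention_weights(scores):
    """Mask out future positions (key j > query i), then softmax each row."""
    n = len(scores)
    masked = [[scores[i][j] if j <= i else float("-inf") for j in range(n)]
              for i in range(n)]
    return [softmax(row) for row in masked]


tokens = ["the", "cat", "sat", "on", "the"]
# Made-up raw scores: row i is the query token, column j the key token.
scores = [[(i * 7 + j * 3) % 5 / 2 for j in range(5)] for i in range(5)]
weights = causal_attention_weights(scores)
for tok, row in zip(tokens, weights):
    print(f"{tok:>4}: " + " ".join(f"{w:.2f}" for w in row))
```

The first "the" puts all of its weight on itself because it has nothing else to look at, every row sums to 1, and the zeros above the diagonal are the mask at work. That is also why only the second "the" can actually choose between "cat" and "sat".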
I tried a sentence: "The hungry cat caught a mouse in the garden."
Guess what?
When processing "caught," head 7 in layer 3 put an insanely high weight on "cat." That suggests this particular head, at this particular layer, is picking up the relationship between the subject and its verb!
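If you want to poke at a single head outside the browser, the lookup itself is just indexing. This sketch assumes the weights follow the layout Hugging Face `transformers` returns when you pass `output_attentions=True`: one entry per layer, each shaped (batch, heads, query, key). The helpers `head_weight` and `top_key` are hypothetical names, and the values are synthetic, planted to mirror the "caught" to "cat" pattern.

```python
# Hypothetical helpers; indices are 0-based. The attention values are synthetic.


def head_weight(attentions, layer, head, query_pos, key_pos, batch=0):
    """Weight that token `query_pos` puts on token `key_pos` in one head.

    Assumes the Hugging Face layout: attentions[layer] is (batch, heads, query, key).
    """
    return attentions[layer][batch][head][query_pos][key_pos]


def top_key(attentions, layer, head, query_pos, tokens, batch=0):
    """Return (token, weight) for the key this head attends to most from query_pos."""
    row = attentions[layer][batch][head][query_pos]
    best = max(range(len(row)), key=lambda j: row[j])
    return tokens[best], row[best]


n_layers, n_heads = 4, 8  # toy sizes; GPT-2 small has 12 layers x 12 heads
tokens = ["The", "hungry", "cat", "caught"]
n = len(tokens)
# Background pattern: token i spreads its attention evenly over positions 0..i.
uniform = [[1 / (i + 1) if j <= i else 0.0 for j in range(n)] for i in range(n)]
attentions = [[[[row[:] for row in uniform] for _ in range(n_heads)]]  # batch of 1
              for _ in range(n_layers)]
# Plant one sharp pattern: layer 3, head 7, query "caught" (position 3).
attentions[3][0][7][3] = [0.05, 0.05, 0.85, 0.05]

print(top_key(attentions, 3, 7, 3, tokens))  # → ('cat', 0.85)
print(head_weight(attentions, 0, 0, 3, 2))   # → 0.25
```

With the real model, calling a `GPT2Model` as `model(**inputs, output_attentions=True)` gives you `outputs.attentions`, and `[a.tolist() for a in outputs.attentions]` turns it into nested lists like the ones above. Two caveats: check whether the tool numbers layers and heads from 0 or from 1, and remember that GPT-2's tokenizer can split a word into several subword tokens, so take positions from the tokenizer output rather than counting words.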
Think about it—being able to see this in real time while tuning the model? It's like having a God's-eye view.
DODRIO takes a different approach.
It draws the attention heads of every layer as colored bubbles: the deeper the color, the higher the head's semantic score; the bigger the bubble, the more important the head. One glance tells you which heads care about syntax (like the link between a preposition and the noun before it) and which are tracking semantics (like whether "Xiao Ming" and "he" refer to the same person).
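DODRIO's real scoring is more involved than I can reproduce here, but the core idea behind a "syntax head" can be sketched: score a head by how much attention each word sends to its syntactic head in a dependency parse. This is a simplified stand-in I'm assuming for illustration, not DODRIO's formula; the arcs, head names, and weights are all hand-made.

```python
# Simplified, hypothetical "syntax head" score; NOT DODRIO's actual formula.


def dependency_alignment(weights, arcs):
    """Mean attention each dependent word sends to its syntactic head word.

    weights[i][j] is one attention head's weight from token i to token j;
    arcs is a list of (dependent_index, head_index) pairs from a dependency parse.
    """
    if not arcs:
        return 0.0
    return sum(weights[dep][gov] for dep, gov in arcs) / len(arcs)


def rank_heads(head_weights, arcs):
    """Sort heads (name -> matrix) from most to least syntax-aligned."""
    scores = {name: dependency_alignment(w, arcs) for name, w in head_weights.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


tokens = ["cat", "on", "mat"]
arcs = [(1, 0), (2, 1)]  # hand-written toy arcs, for illustration only
heads = {
    # Hypothetical head names and weights (bidirectional, BERT-style).
    "L2-H4": [[0.6, 0.3, 0.1],
              [0.9, 0.05, 0.05],
              [0.1, 0.8, 0.1]],
    "L5-H1": [[1 / 3] * 3 for _ in range(3)],  # spreads attention evenly
}
for name, score in rank_heads(heads, arcs):
    print(name, round(score, 3))
```

A head that concentrates on dependency arcs floats to the top; one that spreads attention evenly scores no better than chance. A real score would also need a parser and many sentences, not one toy example.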
Honestly, the first time I used DODRIO to look at BERT's attention distribution, I nearly slapped my forehead.
I used to think that when the model does classification, it reads the whole sentence from start to finish. But no: it fixates on a few key words, and the weights everywhere else are practically zero! That's when it hit me: the model isn't reading the whole sentence, it's locking onto just two or three keywords.
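One way to put a number on "locking on" is to check how much of a row's attention mass lands on its top few tokens, and how low its entropy is compared with an even spread. The sketch below assumes you already have one attention row (for example, what BERT's [CLS] token attends to in some layer); the helper names are my own and the values are made up.

```python
import math

# Hypothetical helpers; the attention row below is made up for illustration.


def top_k_mass(row, k=3):
    """Share of a row's attention that lands on its k highest-weighted tokens."""
    return sum(sorted(row, reverse=True)[:k])


def entropy_bits(row):
    """Shannon entropy in bits; lower means attention is more concentrated."""
    return -sum(p * math.log2(p) for p in row if p > 0)


# One attention row over a 10-token sentence (e.g. what [CLS] attends to).
cls_row = [0.02, 0.40, 0.01, 0.30, 0.02, 0.01, 0.20, 0.02, 0.01, 0.01]
uniform = [0.1] * 10  # "reads the whole sentence evenly" baseline

print(f"top-3 mass: {top_k_mass(cls_row):.2f}")  # → top-3 mass: 0.90
print(f"entropy: {entropy_bits(cls_row):.2f} bits vs uniform {entropy_bits(uniform):.2f} bits")
```

A top-3 mass near 1 together with an entropy far below the uniform baseline (log2 of the sentence length) is the numeric version of "locking on."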
See, a lot of the time we think the model is "understanding," but really it's just "locking on."
---
Here's the Straight-up Comparison
| Dimension | Transformer Explainer | DODRIO | My Project |
|---|---|---|---|
| Real-time | Runs GPT-2 in the browser, real-time inference | Precomputed weights | Static, no movement |
| Setup | Open a webpage and go | Has a demo to play with | Requires running code locally |
| Interaction depth | Adjust temperature, expand math operations | Filter specific heads/layers | Click fixed positions only |
| Beginner friendly | Extremely: goes from macro to micro step by step | Requires some background | Okay for personal use |
| My rating | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
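The "Adjust temperature" knob is easy to reason about with the standard formulation: divide the next-token logits by the temperature before the softmax. I'm assuming Transformer Explainer applies it this way; the helper name is my own and the logits below are made up.

```python
import math

# Hypothetical helper; the logits are made up, not GPT-2 output.


def softmax_with_temperature(logits, temperature=1.0):
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate next tokens
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Temperatures below 1 sharpen the distribution toward the top token; temperatures above 1 flatten it, which is what you see when you drag the slider.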
Cael Lee
Full-stack developer with 8+ years of experience. Currently building AI-powered developer tools. I've tested 20+ AI API providers and coding assistants.