
Hands-On Comparison: LoRA Cuts GPU Memory to 1/5, but Knowledge-Injection F1 Drops by 8 Points

By CaelLee | 6 min read


Generated: 2026-06-22 04:54:05

---

Interviewing for a Large Model Position? This Set of LoRA Questions Will Instantly Reveal Whether You Truly Know Your Stuff or Are Just Reciting Lines!

Have you ever met someone like this? Their resume says "Proficient in LoRA fine-tuning," but when you dig into it, all they can muster is "saves GPU memory." Push a little deeper, and they can't even explain how the A and B matrices are initialized… I’ve interviewed way too many such candidates, and every time it makes me want to flip the table.

Eventually, I figured it out: instead of getting angry, I turned LoRA into a structured set of interview questions. Starting from basic project experience all the way to the math and engineering implementation—it basically gives you a thorough read of where someone truly stands.

So today, I’ll break down my question design and grading standards for you. If you're preparing for a large model role, this article is far more useful than memorizing a hundred eight‑legged essays (that’s Chinese slang for rote standard answers).

---

Level 1: Don't Give Me Theory—First Tell Me What You've Actually Done

Question 1: In what scenarios have you used LoRA? Which models have you fine-tuned?

It’s like asking "What are your hobbies?" on a first date—a warm‑up question that still eliminates a huge chunk of people.

You know what? I’ve had candidates start reciting right away: "LoRA is a parameter‑efficient fine‑tuning method that uses low‑rank decomposition…" — Stop right there! That’s not what I asked. I’m asking about your real‑world project experience!

If you’ve only run a demo script from LLaMA-Factory and tweaked a config file to get LLaMA-7B running, does that count as "experience"? That’s called "following the documentation and typing commands," and anyone can do that!

What kind of answer am I looking for? One that clearly describes the business scenario: Was it for vertical domain knowledge injection? Building a role‑playing customer service agent? Training a code model? Which base model did you use—Qwen, ChatGLM, or LLaMA? What training framework—accelerate with your own training script, DeepSpeed, or a polished wrapper like LLaMA‑Factory?

See? This one question immediately reveals whether you've been doing real, hands‑on work or just playing with toys.

Question 2: Comparing LoRA with full fine‑tuning, what are the actual differences in memory, speed, and results?

This question is specifically designed to test whether you’ve run comparative experiments yourself!

What annoys me most when hiring is someone who says without hesitation, "LoRA’s results are about the same as full fine-tuning." About the same? How close, exactly? On what kind of task? At what data size? Give me specifics!

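Let me share my actual measurements: LoRA cut GPU memory to about one fifth of what full fine-tuning needed, but on a knowledge-injection task its F1 came out 8 points lower.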

Question 3: Which hyperparameters did you tune when training LoRA? What does each one do?

This question is a true watershed—people who only run scripts get stuck here.

Better candidates can name r, alpha, dropout, target_modules, etc. But what does someone with real experience tell you? Let me walk you through the pitfalls I’ve fallen into:
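For orientation, here is a minimal sketch of the four knobs named above, written as a plain dataclass. The class name `LoraHyperparams`, the default values, and the module names `q_proj` and `v_proj` are all illustrative assumptions, not settings from this article or from any particular library. The one relationship it encodes, that the low-rank update is scaled by alpha / r, follows the formulation in the original LoRA paper.

```python
from dataclasses import dataclass, field


@dataclass
class LoraHyperparams:
    """Hypothetical container for the knobs named above (not a real library class)."""
    r: int = 8                 # rank of the update matrices A and B
    alpha: float = 16.0        # scaling numerator; the update is multiplied by alpha / r
    dropout: float = 0.05      # dropout on the input of the low-rank branch
    target_modules: list = field(default_factory=lambda: ["q_proj", "v_proj"])

    @property
    def scaling(self) -> float:
        # Factor applied to B @ A before it is added to the frozen weight's output.
        return self.alpha / self.r


cfg = LoraHyperparams()
doubled_rank = LoraHyperparams(r=16)   # same alpha, twice the rank
print(cfg.scaling, doubled_rank.scaling)
```

One thing the sketch makes visible: raising r while leaving alpha alone halves the effective scale of the update, so under this formulation the two are usually adjusted together.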

---

Level 2: Theory Questions Reveal Whether You Truly Understand or Are Just Reciting Scripts

Question 4: What is the core principle of LoRA?

This looks like a freebie, but few people answer it with real depth.

The minimum is to say: freeze the pretrained weights W₀, introduce two low‑rank matrices A and B, and the forward pass becomes h = W₀x + BAx.

But what I really value is when I ask a follow‑up:

“How are A and B initialized? And why?”

Answering “A is initialized randomly, B is initialized to zero” is just the surface. Being able to explain that “this makes ΔW zero at the start of training, so the model starts fine‑tuning from the pretrained state”—that’s a bit better. If you can go further and say “if A were also zero‑initialized, the gradient wouldn’t flow, and the parameters would never update”—that’s true understanding! That’s real skill!
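The initialization argument can be checked numerically. Below is a minimal NumPy sketch, assuming a toy squared-error loss and tiny made-up dimensions; the helper `lora_grads` and all sizes are hypothetical, and the gradients are derived by hand for this specific loss rather than taken from any framework.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 6, 5, 2                      # toy sizes, chosen only for illustration

W0 = rng.normal(size=(d, k))           # frozen pretrained weight
x = rng.normal(size=(k,))              # one input vector
y = rng.normal(size=(d,))              # arbitrary regression target


def lora_grads(A, B):
    """Return (h, dL/dA, dL/dB) for L = 0.5 * ||h - y||^2 with h = W0 x + B A x."""
    h = W0 @ x + B @ (A @ x)
    g = h - y                          # dL/dh
    grad_B = np.outer(g, A @ x)        # shape (d, r)
    grad_A = np.outer(B.T @ g, x)      # shape (r, k)
    return h, grad_A, grad_B


# Usual choice: A random, B zero, so delta W = B A = 0 at step zero.
A = rng.normal(scale=0.01, size=(r, k))
B = np.zeros((d, r))
h, grad_A, grad_B = lora_grads(A, B)
starts_at_pretrained = bool(np.allclose(h, W0 @ x))
b_gets_gradient = bool(grad_B.any())   # B can move on the first step

# Both zero: neither matrix receives a gradient, so nothing ever updates.
_, dead_A, dead_B = lora_grads(np.zeros((r, k)), np.zeros((d, r)))
stuck = not dead_A.any() and not dead_B.any()

print(starts_at_pretrained, b_gets_gradient, stuck)
```

Note that with B at zero, the gradient for A is also zero on the very first step. A only starts learning once B has moved away from zero, which is why at least one of the two matrices must start out nonzero.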

Question 5: What is a low‑rank matrix? Why can we assume that the update to a large model’s weights is low‑rank?

You need to explain three things clearly.

First, what is rank? Put simply, it’s the amount of truly independent information in a matrix: formally, the number of linearly independent rows (or columns). A 2000×2000 matrix with rank 10 has almost all its information redundant. Counterintuitive, isn’t it?
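To make the rank idea concrete, here is a small NumPy sketch. It builds a matrix with many entries but only 10 independent directions, scaled down from 2000×2000 to 200×200 so the check runs quickly; the sizes and the random construction are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 200, 10                          # scaled down from 2000 to keep the SVD fast

# A product of an (n x r) and an (r x n) matrix has rank at most r.
M = rng.normal(size=(n, r)) @ rng.normal(size=(r, n))

rank = int(np.linalg.matrix_rank(M))    # numerical rank via singular values
print(M.shape, rank)
```

Every one of the 200 rows is a linear combination of the same 10 underlying rows, which is the sense in which most of the matrix is redundant.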

Second, low‑rank decomposition—splitting a large matrix into the product of two smaller matrices. For example, a d×d matrix becomes d×r and r×d, reducing parameters from d² to 2dr. When r is much smaller than d, the savings are huge!
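The d² versus 2dr arithmetic is easy to sanity-check. A minimal sketch, assuming d = 4096 and r = 8 as illustrative sizes (they are not figures from this article), with `lora_param_counts` as a hypothetical helper name:

```python
def lora_param_counts(d: int, r: int) -> tuple[int, int]:
    """Return (full, low_rank) parameter counts for a d x d weight update."""
    full = d * d            # dense d x d matrix
    low_rank = 2 * d * r    # d x r factor plus r x d factor
    return full, low_rank


d, r = 4096, 8              # illustrative sizes only
full, low_rank = lora_param_counts(d, r)
print(full, low_rank, full // low_rank)
```

The ratio simplifies to d / (2r), so for a fixed layer width the savings shrink linearly as the rank grows.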

Third, why does this work? Think about it—according to Aghajanyan et al.’s 2020 research, pretrained large models have an extremely low “intrinsic dimension.” The parameter updates needed during fine‑tuning move only within a very small subspace. Let me give you an analogy: the pretrained model is an all‑knowing master; fine‑tuning is just adjusting its speaking
