I Spent 3 Years Falling Into Every SD Parameter Pitfall: From Principles to Practice, All in One Go (English)

Generated: 2026-06-22 07:40:56

---

This SD Parameter Guide Took Me Three Years to Write

To be honest, before writing this, I locked myself in my study and stared at the screen for a long time.

You see, there are countless articles online about Stable Diffusion parameters. But when you flip through them, they're either stiff machine translations or second-hand copies of each other. I went through my notes from the past three years—yes, that notebook filled with "Why did it break again?" and "What the hell does this parameter actually do?"—and decided to take the most straightforward approach: spill everything I've been through. The pitfalls I fell into, the trial-and-error experiences, the principles I only figured out after digging through GitHub. All of it, laid out for you.

This article will help you solve three problems:

First, figure out what those parameters are actually doing—not memorizing definitions, but truly understanding their quirks.

Second, know how to choose parameters to get good results—not superstition, but a systematic approach.

Third, avoid the detours I took. These traps cost me dozens of hours, so you don't have to step into a single one.

Alright, let's get started.

---

1. Base Models and External VAEs: You Think You've Chosen Right, But You Might Be Wasting Your Time

1.1 Basic Terms, Explained in Plain English

Base Model (Large Model / Foundation Model)—This is the "artist" for your image. If you pick an anime model, you get anime style; if you pick a photorealistic model, you get a photo-like result. When I first started, I excitedly downloaded a dozen models, only to find that some models always messed up faces—crooked mouths, twisted features. It took me days to realize: it's the model's problem, not my parameter tuning. Think about it: if you hire an artist who can't draw hands, can you blame your pen?

VAE Model—Full name: Variational Autoencoder. Don't let the name scare you. Just think of it as a "filter" after image generation. But note: it's nothing like filters in Photoshop. The VAE is an internal component of the model, responsible for decoding data from the latent space into the images we can see.
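To make "decoding from the latent space" a bit more concrete, here is a minimal sketch of the size relationship between the latent and the final image. It assumes the Stable Diffusion v1 layout (4 latent channels, 8x downsampling per side); the function name latent_shape is made up for illustration and is not part of any library.

```python
def latent_shape(width: int, height: int, channels: int = 4, factor: int = 8):
    """Return (channels, height, width) of the latent for a given image size.

    Assumes the SD v1 VAE layout: 4 latent channels, 8x downsampling per side.
    """
    if width % factor or height % factor:
        raise ValueError(f"width and height must be multiples of {factor}")
    return (channels, height // factor, width // factor)


# A 512x512 RGB image holds 512 * 512 * 3 values; its latent holds 4 * 64 * 64.
print(latent_shape(512, 512))  # (4, 64, 64)
```

The sampler only ever works on the small latent. The VAE is what turns it into pixels at the very end, which is why a poor VAE can make an otherwise good generation look foggy.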

Here's the key difference: Every model comes with its own VAE, but some models have "broken" or subpar VAEs. That's when you need to manually attach an external VAE. I've encountered this many times: I download a model, generate a face, and it's always hazy, like a layer of fog. Swap the VAE, and suddenly it's crystal clear. That feeling is like wiping your glasses clean—so satisfying!

When downloading models, remember to check the hash value. This is a tricky one: some models have different names but the same hash, which means the files are identical in content. I didn't know this at first. I downloaded two "different" models, and the results were exactly the same. Wasted hard drive space for nothing. Frustrating, right?
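If you want to check for renamed duplicates yourself, a small script can do it. This is a minimal sketch using only the Python standard library; the helper names sha256_of and find_duplicates, and the default *.safetensors pattern, are my own choices for illustration.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so multi-gigabyte checkpoints never sit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(folder, pattern="*.safetensors"):
    """Group files in `folder` by content hash; return only groups of 2 or more."""
    by_hash = defaultdict(list)
    for path in sorted(Path(folder).glob(pattern)):
        by_hash[sha256_of(path)].append(path.name)
    return {h: names for h, names in by_hash.items() if len(names) > 1}
```

Point find_duplicates at whatever folder holds your checkpoints. Any group it returns is the same file under different names, so you can keep one copy and delete the rest.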

1.2 Differences Between Base Models

On the left is anime style, on the right is photorealistic style. The model determines the basic look of the image. Switching models is like switching artists. Simple as that.

1.3 Differences Between External VAE Models

The left image shows the difference with and without an external VAE—after loading a new VAE, the image clarity improves significantly. The right image shows the differences between various VAE models—the results are completely different.

How to choose a VAE? My advice: use the XYZ Plot function to generate a comparison chart in one go, and pick the one you like best. Don't just use what others say is good. Aesthetics are subjective. Some people prefer cool tones, others warm tones. Your eyes decide.
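The idea behind a comparison chart is simply "every combination of the values you want to compare, with everything else held fixed". Here is a rough sketch of that idea in plain Python. It does not call any web UI; xyz_grid is a hypothetical helper, and the VAE file names and seeds are placeholders.

```python
from itertools import product


def xyz_grid(**axes):
    """Return one settings dict per cell of the comparison grid."""
    names = list(axes)
    return [dict(zip(names, values)) for values in product(*axes.values())]


# Hypothetical file names and seeds, used only for illustration.
cells = xyz_grid(
    vae=["model_default", "external_a.safetensors", "external_b.safetensors"],
    seed=[1234, 5678],
)
for cell in cells:
    print(cell)  # each cell would be rendered with all other settings held fixed
```

Comparing across more than one seed matters: a VAE that looks best on one image is not always the one you prefer on the next.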

---

2. CLIP Skip: You Think More Layers Mean Deeper Understanding? Not Really.

I need to talk about this one properly, because I had no clue what it did at first. I spent half a month tweaking it before realizing I'd been messing around blindly.

To understand CLIP skip, you first need to grasp how Stable Diffusion works. In simple terms, Stable Diffusion is a diffusion model: it starts from an image of pure noise and, guided by your prompt, removes that noise step by step until it arrives at what you want.

CLIP is the part responsible for understanding your prompt. It converts text into vectors the model can understand. CLIP has 12 layers. By default, the model uses all 12 layers.

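The CLIP skip parameter decides how early you stop, and it counts back from the last layer. A setting of 1 uses the output of the final layer (all 12 layers), 2 stops at the second-to-last layer, and so on. From my testing, most models work best with a setting of 2. Some models respond worse at 1, most likely because they were trained against the second-to-last layer. Set it too high (like 6 or above), and so many layers get dropped that the model is left with only a rough reading of your prompt and may generate something barely related to it, like someone who stopped listening halfway through your sentence.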
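Here is a toy sketch of what the setting selects. It assumes the convention used by the common web UIs, where the value counts back from the final layer. The strings stand in for real per-layer tensors, select_layer_output is a made-up name, and real implementations do extra work (such as normalization) that is left out here.

```python
NUM_LAYERS = 12  # text-encoder depth for SD v1 models


def select_layer_output(layer_outputs, clip_skip=1):
    """Pick the layer output that conditions the image, counting from the end.

    clip_skip=1 -> final layer, clip_skip=2 -> second-to-last layer, and so on.
    """
    if not 1 <= clip_skip <= len(layer_outputs):
        raise ValueError("clip_skip must be between 1 and the number of layers")
    return layer_outputs[-clip_skip]


# Stand-ins for the real per-layer tensors.
outputs = [f"output_of_layer_{i}" for i in range(1, NUM_LAYERS + 1)]
print(select_layer_output(outputs, clip_skip=1))  # output_of_layer_12
print(select_layer_output(outputs, clip_skip=2))  # output_of_layer_11
```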

Here's a trap: Different models have different sensitivities to CLIP skip. I tried a photorealistic model with a setting of 2, and it worked great. Switched to an anime model, and setting 2 broke it. Setting it back to 1 fixed it. So don't memorize parameters blindly. Experiment. It's like dating—there's no one-size-fits-all approach.

---

3. Sampling Methods: The Part Where I Fell Into the Most Traps, No Contest

This section has the most pitfalls I've encountered. It's painful to talk about.

Let me give you the conclusion first, then the principle. If you're in a hurry, read this:

For speed: Use Euler
For quality: Use DPM++ 2M Karras or DPM++ 2M SDE Karras
For a balance of speed and quality: Use DPM++ 2M (widely considered the best)

Now let me explain the characteristics of each sampler. Think of it like choosing a mode of transportation—each has its own personality.

Classic ODE Solvers

Euler, DDIM: these are the old-timers (Euler's method is older than me several times over). Simple, but not very accurate. When I first started, I always used DDIM and thought the quality was decent. Later I discovered there were better options. It's like riding a bike and thinking it's fast, then switching to an e-bike and realizing you could go much faster.
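To show how simple Euler really is, here is a one-dimensional toy version. It is a sketch under heavy assumptions: toy_denoise stands in for the real model and always predicts the same clean value, and linear_sigmas is an illustrative schedule, not the one a real UI uses.

```python
def euler_sample(denoise, x, sigmas):
    """Deterministic Euler sampling over a decreasing list of noise levels."""
    for sigma, sigma_next in zip(sigmas, sigmas[1:]):
        d = (x - denoise(x, sigma)) / sigma  # slope dx/dsigma estimated from the model
        x = x + d * (sigma_next - sigma)     # one straight-line step to the next level
    return x


def toy_denoise(x, sigma):
    """Stand-in for the model: always predicts that the clean value is 1.0."""
    return 1.0


def linear_sigmas(n, sigma_max=10.0):
    """n steps from sigma_max down to exactly 0 (illustrative schedule)."""
    return [sigma_max * (1 - i / n) for i in range(n + 1)]


print(euler_sample(toy_denoise, 5.0, linear_sigmas(10)))
print(euler_sample(toy_denoise, 5.0, linear_sigmas(50)))
```

Both runs land on essentially the same value. Nothing random happens inside the loop, so with a real model, adding steps refines the same image instead of producing a new one.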

Ancestral Samplers (with an "a")

These samplers have a quirk: the image doesn't converge. What does that mean? They inject fresh random noise at every step, so the image keeps changing as you add steps instead of settling down. The practical consequence is that a result is only reproducible if the seed, the step count, and every other setting all match. Change the step count, and the same seed gives you a different image. I didn't know this at first. I thought I'd misremembered the seed number. I wasted hours debugging, only to find it was the sampler's fault. Imagine writing a recipe where changing the cooking time by a minute gives you a completely different dish.
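The behavior is easier to see in a toy version. The sketch below is one-dimensional, uses a stand-in denoiser that always predicts 1.0, and uses a simple linear schedule. The way each step is split into a deterministic part and a re-noising part follows the form commonly seen in open-source Euler ancestral implementations, but treat all names and values here as illustrative assumptions.

```python
import math
import random


def ancestral_step_sizes(sigma_from, sigma_to, eta=1.0):
    """Split a step into a deterministic part (down) and fresh noise (up)."""
    sigma_up = min(
        sigma_to,
        eta * math.sqrt(sigma_to**2 * (sigma_from**2 - sigma_to**2) / sigma_from**2),
    )
    sigma_down = math.sqrt(sigma_to**2 - sigma_up**2)
    return sigma_down, sigma_up


def euler_ancestral_sample(denoise, x, sigmas, seed):
    rng = random.Random(seed)  # all randomness comes from this seed
    for sigma, sigma_next in zip(sigmas, sigmas[1:]):
        sigma_down, sigma_up = ancestral_step_sizes(sigma, sigma_next)
        d = (x - denoise(x, sigma)) / sigma
        x = x + d * (sigma_down - sigma)        # deterministic Euler part
        x = x + rng.gauss(0.0, 1.0) * sigma_up  # fresh noise injected every step
    return x


def toy_denoise(x, sigma):
    """Stand-in for the model: always predicts that the clean value is 1.0."""
    return 1.0


def schedule(n, sigma_max=10.0, sigma_min=0.1):
    """n linear steps from sigma_max down to sigma_min (illustrative only)."""
    return [sigma_max + (sigma_min - sigma_max) * i / n for i in range(n + 1)]


a = euler_ancestral_sample(toy_denoise, 5.0, schedule(20), seed=42)
b = euler_ancestral_sample(toy_denoise, 5.0, schedule(20), seed=42)
c = euler_ancestral_sample(toy_denoise, 5.0, schedule(30), seed=42)
print(a == b)  # True: same seed and same settings
print(a, c)    # same seed, different step count
```

With 30 steps the sampler draws a different amount of noise at different strengths, so the last line shows two results that do not agree even though the seed is the same.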

Official Samplers

DDIM and PLMS shipped with the original Stable Diffusion release. DDIM: high quality but slow. PLMS: low quality but fast. Honestly, both are outdated now. Like Nokia phones from the feature-phone era: classic, but rarely used today.

DPM and DPM++ Series

DPM2 is more accurate than DPM, but slower. DPM++ is an improvement over DPM.

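2M (2nd-order multistep) has higher quality than 2S (2nd-order single-step), and it does not cost extra speed: 2M reuses the result from the previous step and needs one model evaluation per step, while 2S needs two.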

DPM fast: fast but lowest quality. DPM adaptive: adaptive steps, decent results but can be very slow. SDE: richer details.

Karras Series

The Karras series uses the noise schedule from Karras et al. (2022), which takes large noise steps early and small ones near the end. This improves quality to some extent. I recommend DPM++ 2M Karras; it's well-balanced. Like finding a partner: not the most handsome, not the richest, but the highest overall score.
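For the curious, the schedule itself is a short formula. The sketch below computes it in plain Python. The default sigma_min and sigma_max are illustrative values rather than the ones a particular model uses, and real samplers typically append a final 0 to the list, which is omitted here.

```python
def karras_sigmas(n, sigma_min=0.1, sigma_max=10.0, rho=7.0):
    """Noise levels from sigma_max down to sigma_min, spaced as in Karras et al. (2022)."""
    if n < 2:
        raise ValueError("need at least two noise levels")
    max_inv = sigma_max ** (1 / rho)
    min_inv = sigma_min ** (1 / rho)
    return [
        (max_inv + i / (n - 1) * (min_inv - max_inv)) ** rho
        for i in range(n)
    ]


sigmas = karras_sigmas(10)
gaps = [a - b for a, b in zip(sigmas, sigmas[1:])]
print([round(s, 3) for s in sigmas])
print(gaps[0] > gaps[-1])  # True: big steps early, small steps late
```

Spending more of the step budget at low noise levels is where fine detail gets resolved, which is the usual explanation for why the Karras variants look a little cleaner at the same step count.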

UniPC

Newest, fastest, relatively high quality. I've been using this lately. It's just a tiny bit slower than Euler, but the quality is much better. It's like discovering a new continent—you're so excited you want to tell everyone.

About Sampling Steps

Different sampling methods require different numbers of iteration steps, but generally between 20 and 50. Beyond 50 steps, the difference is negligible—like boiling noodles too long until they get mushy. I usually set it to 30 steps, which is enough.

---

4. Other Parameters: You Think They're Unimportant, But They Have a Big Impact

Restore Faces

As the name suggests, it fixes faces. But there's a catch—it adds a beauty filter/smoothing effect. If you want realism, don't turn it on. I never use beauty filters on my photos, and I don't use them on generated images either. Think about it: real faces aren't that smooth. A little imperfection makes them real.

Tiling (Seamless Texture)

Makes the edges of the image line up so it can be repeated as a seamless texture. But it's prone to errors. Unless you're working on design, don't turn it on. I tried it once and got a weird wallpaper

Cael Lee
