Blog


·6 min read

Large Language Model Programming Capability Evaluation: October 2025...

Last Friday night, I was coding in a café when a guy in a plaid shirt sitting next to me was losing his mind staring at his

·6 min read

What the hell are these GPU cards passing back and forth to each other?

It took me three months to finally understand what distributed training of large models is all about. Let me tell you a sto

·5 min read

Large Model Inference Acceleration? Don't Be Fooled by Those "One-Shot Fix"...

Last month, a buddy from a startup called me late at night, his voice almost in tears: "We're using the most advanced infer

·6 min read

Kimi K2 Thinking: This "Monster" Kept Me Up All Night After Testing!

Guess what? I've been losing sleep over a model recently. Since last year, I've been quietly watching the folks at Kimi. K2

·6 min read

I. Calling the KV cache "just a cache" captures only one corner of the picture

You must have heard someone say: "KV cache? It's just a cache—what's there to talk about?" I’d bet ten to one that whoever

·3 min read

The First Time I Fine-Tuned a Large Model, I Almost Smashed the Machine

Last year, I was full of confidence—I got my hands on the original LLaMA-7B and wanted to play around with Chinese instruct

·5 min read

Everyone thought it was about stacking hardware, but it's really about...

Last month, I

·6 min read

SFT: Thought It Would Be the Easiest, Ended Up Being the Worst

A couple of days ago, a friend came to me, saying he was trying to figure out how to turn a pre-trained model into a real a

·6 min read

Quantization, Pruning, and Knowledge Distillation

To be honest, a few days ago I did something particularly foolish—I dug out my old GPU with only 8GB of VRAM and tried to r

·6 min read

The LLM is the brain, not the hands and feet.

You know what? Just half a year ago, I was absolutely fuming at a model that could only chat. I said to it: "Can you help m

·5 min read

FP8 or INT8 for Quantization? After a Year of Trial and Error, I'm Laying It...

Let me start with a true story. Two years ago, I was working on inference services, back when the A100 was still the hot co

·6 min read

Interviewing for a Large Model Position? This Set of LoRA Questions Will...

Have you ever met someone like this? Their resume says "Proficient in LoRA fine-tuning," but when you dig into it, all they
