Last Friday night, I was coding in a café when a guy in a plaid shirt sitting next to me was losing his mind staring at his
It took me three months to finally understand what distributed training of large models is all about. Let me tell you a sto
Last month, a buddy from a startup called me late at night, his voice almost in tears: "We're using the most advanced infer
Guess what? I've been losing sleep over a model recently. Since last year, I've been quietly watching the folks at Kimi. K2
You must have heard someone say: "KV cache? It's just a cache—what's there to talk about?" I’d bet ten to one that whoever
Last year, I was full of confidence—I got my hands on the original LLaMA-7B and wanted to play around with Chinese instruct
Last month, I
A couple of days ago, a friend came to me, saying he was trying to figure out how to turn a pre-trained model into a real a
To be honest, a few days ago I did something particularly foolish—I dug out my old GPU with only 8GB of VRAM and tried to r
You know what? Just half a year ago, I was absolutely fuming at a model that could only chat. I said to it: "Can you help m
Let me start with a true story. Two years ago, I was working on inference services, back when the A100 was still the hot co
Have you ever met someone like this? Their resume says "Proficient in LoRA fine-tuning," but when you dig into it, all they