Blog

Page 13 of 19

·5 min read

I Spent Three Weeks Chasing This "AI Emergence" Hype—And It Blew My Mind

Read more →
·4 min read

Which InstructGPT are you reproducing?

Since last year I've fallen into every RLHF pitfall imaginable, and today I'm spilling it all for you. You know what? T

Read more →
·1 min read

You Think KV Cache Sparsity Is a Silver Bullet? After Two Years in the...

Read more →
·4 min read

You think training multimodal LLMs is all about connectors? I burned through...

Last winter, I was sitting alone in the server room, staring at a stalled loss curve on the screen. Next to me, 32 A100s we

Read more →
·6 min read

I. Rather Lie Than Shut Up? Because It's Mathematically the Optimal Strategy

Oh my god, I'm so tired of being asked about this — but honestly, it's the puzzle I can't stop obsessing over. Let me start

Read more →
·6 min read

Stop Believing the Lie That "Only Big Tech Can Do Pre-training"

Let me tell you a story about the pitfalls I fell into. Last year, I took on a project in the medical vertical. The client's re

Read more →
·3 min read

First, a naked run: what happens when you add nothing?

You may not believe it, but my hands were shaking as I flipped through my notes: there were more than twenty record

Read more →
·5 min read

Higher Dimensions

Hey, you know what? The first time I ran into trouble with positional encoding was during a text generation experiment

Read more →
·6 min read

Step 1: Collecting Data

You see, the first time I encountered LoRA, I was totally lost. My mind was stuck on just one question: what’s the real dif

Read more →
·6 min read

Section 1: What's the Real Difference Among Those "Soft Prompt" Siblings?

Speaking of which, I first have to tell you about the fool I made of myself last year. At the time, I had a task on my hands: getting

Read more →
·6 min read

A Pandemonium of Experts: What the Hell Is an MoE Large Model?

To be honest, the impulse to write this came from the embarrassment of being stumped. The other day, a friend suddenly aske

Read more →
·4 min read

The technical barrier? You think it’s just calling an API?

The Truth No One Dares to Speak: Why AAA Games Are Afraid to Give NPCs an AI Brain. Last Friday night, I stared at my Ste

Read more →