A Deep Teardown of Superpowers and gstack (English)

Generated: 2026-06-21 21:49:09

---

I Almost Let AI Programming Waste an Entire Year, Then Finally Figured One Thing Out

Believe it or not, at the beginning of last year, I was an absolute fool.

The whole world was shouting "AI is going to replace programmers," and I was like a cat on a hot tin roof. Claude 4 came out? Buy it! GPT-5 updated? Sign me up. I was among the first to pay for every new version. For three months, only one thing occupied my mind: which model is stronger?

So what happened next? You want to guess?

Yeah, the code could run. But as soon as the project got even a little complex, it fell apart.

The most ridiculous case: I asked Claude to write a payment module, and it generated 2,000 lines of seemingly flawless code. I was over the moon. On the first day of deployment, bam, the test environment database went down. Why? Because the module had defined its own "more efficient" transaction-processing logic that completely ignored standard practices.

That's when I started to wonder about something: maybe the problem with AI programming isn't the model at all. You see, just having that thought is counterintuitive enough.

---

Speaking of which, I have to tell you about the first thing that shattered my assumptions—Superpowers.

Back in February, I stumbled across it in an obscure GitHub discussion thread. I hated the name at first sight. "Superpowers"? Some cheap superhero hype again, right? But when I clicked in, something was off.

It was completely different from other AI tools. Other tools say, "Just tell me what you want." This one enforced a strict sequence: first write a design document, then a plan, then tests, and only then the code itself. Every step had output specifications. Skip a step, and it would go on strike.

I thought to myself: Are you kidding me? I'm using AI to save time, and you want me to write documentation first?

But after testing it twice, I gave in. I really did.

I remember one example vividly. I used vanilla Claude to write a user permissions system. After two hours of write-rewrite-write-rewrite, I finally got something that barely worked. For the exact same requirement, I switched to Superpowers and went through the workflow—brainstorming, writing specs, planning, TDD, code review—which took a full four hours.

And the result? Unbelievable. The first version was shippable. Test coverage hit 92%. The API documentation was complete. It even handled edge cases I hadn't thought about at all.

See, that's the power of discipline. Superpowers has 11 built-in core skills, from brainstorming to finishing-a-development-branch, every single one grounded in battle-tested workflows. One of its founders spent eight years at a medical systems company. Think about it: a bug in a medical system can cost lives. They essentially took that almost fanatical "must test, must review, no skipping steps" culture and baked it directly into the AI workflow.
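To make the "no skipping steps" idea concrete, here is a minimal sketch of an ordered stage gate. It is my own illustration, not how Superpowers is actually implemented: the `Workflow` class, the `StageSkippedError` exception, and the file names are all hypothetical, and the stage list simply mirrors the order described above.

```python
from dataclasses import dataclass, field

# Stage order taken from the article's description; everything else here is
# a hypothetical illustration, not Superpowers' real code.
STAGES = ["design", "plan", "tests", "code", "review"]


class StageSkippedError(Exception):
    """Raised when a stage is attempted out of order."""


@dataclass
class Workflow:
    completed: list[str] = field(default_factory=list)

    def complete(self, stage: str, artifact: str) -> None:
        """Mark a stage done, but only if it is the next one in order."""
        if len(self.completed) == len(STAGES):
            raise StageSkippedError("workflow already finished")
        expected = STAGES[len(self.completed)]
        if stage != expected:
            raise StageSkippedError(f"cannot run {stage!r} before {expected!r}")
        if not artifact.strip():
            # Every step must leave some output behind.
            raise ValueError(f"stage {stage!r} produced no output")
        self.completed.append(stage)


wf = Workflow()
wf.complete("design", "design.md")
try:
    wf.complete("code", "app.py")  # tries to jump past plan and tests
except StageSkippedError as err:
    print(err)  # cannot run 'code' before 'plan'
```

The point of the sketch is only that the gate lives outside the model: the order is enforced by the process, not by hoping the AI remembers it.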

---

In March, gstack suddenly blew up.

Garry Tan—you know, the YC CEO—posted a tweet showing what his team built with this tool, captioning it "810x productivity improvement." The comment section exploded.

My first reaction was the same as yours: exaggerated. 810x? Seriously?

But curiosity is unstoppable. I dug into its code repository, and after about ten minutes, I realized gstack and Superpowers were on completely different paths.

Superpowers is like a project manager staring over your shoulder: "Follow the process. Don't cut corners. Test first, then write."

gstack is like an all-star team surrounding you: "I'm the CEO—I'll decide if this requirement is worth doing. I'm the engineering director—I'll review your architecture. I'm QA—I'll find the holes."

Its commands range from /ceo-review to /eng-review to /design-review, each backed by real role-playing. The slickest move was /codex—having two AIs cross-check each other's output while the human makes the final call.
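The cross-check idea can be pictured with a small sketch. This is an assumption-laden toy, not gstack's actual /codex behavior: `reviewer_a` and `reviewer_b` are stand-ins for two different models, and `cross_check` is a hypothetical helper that routes any disagreement to the human.

```python
from typing import Callable

# A reviewer takes source code and returns a list of findings.
Reviewer = Callable[[str], list[str]]


def reviewer_a(code: str) -> list[str]:
    """Stand-in for the first model: flags bare except clauses."""
    return ["bare except"] if "except:" in code else []


def reviewer_b(code: str) -> list[str]:
    """Stand-in for the second model: flags bare excepts and TODO markers."""
    findings = []
    if "except:" in code:
        findings.append("bare except")
    if "TODO" in code:
        findings.append("unfinished TODO")
    return findings


def cross_check(code: str, first: Reviewer, second: Reviewer) -> dict[str, list[str]]:
    """Split findings into those both reviewers report and those only one does."""
    a, b = set(first(code)), set(second(code))
    return {
        "agreed": sorted(a & b),       # both reviewers flagged it
        "needs_human": sorted(a ^ b),  # only one did, so a person decides
    }


snippet = "try:\n    pay()\nexcept:\n    pass  # TODO retry\n"
report = cross_check(snippet, reviewer_a, reviewer_b)
print(report)
```

In this toy run both reviewers agree on the bare `except`, while the TODO is raised by only one of them and lands in the pile for the human to rule on.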

Honestly, by this point I was already a bit dazed. Isn't the model the problem? So where did all these "workflow tools" suddenly come from?

---

Over the next two months, things got even more interesting.

First came OpenSpec. It doesn't go for personified roles. Instead, it turns requirements, specifications, designs, tasks, and archiving all into traceable "artifacts." Every time you do something, you have to update spec files, change logs, and archive status. It feels more like a version control system—except it's controlling the AI development process.
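A rough sketch of what a traceable artifact might look like follows. It does not reflect OpenSpec's real file format or commands: `SpecArtifact`, `Change`, and the sample entries are hypothetical, chosen only to show a change log that stores the reason next to the change and becomes read-only once archived.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Change:
    day: date
    summary: str
    rationale: str  # the "why", stored next to the "what"


@dataclass
class SpecArtifact:
    name: str
    status: str = "active"
    changelog: list[Change] = field(default_factory=list)

    def record(self, day: date, summary: str, rationale: str) -> None:
        """Append a change; archived specs are read-only."""
        if self.status == "archived":
            raise RuntimeError(f"{self.name} is archived and cannot change")
        self.changelog.append(Change(day, summary, rationale))

    def why(self, keyword: str) -> list[str]:
        """Return the rationale of every change whose summary mentions keyword."""
        needle = keyword.lower()
        return [c.rationale for c in self.changelog if needle in c.summary.lower()]

    def archive(self) -> None:
        self.status = "archived"


# Illustrative data only; the spec name, dates, and entries are made up.
spec = SpecArtifact("user-permissions")
spec.record(date(2025, 1, 6), "Add role table", "Roles change more often than users")
spec.record(date(2025, 1, 9), "Cache role lookups", "Permission checks ran on every request")
print(spec.why("cache"))  # ['Permission checks ran on every request']
spec.archive()
```

Keeping the rationale as a required field is the design choice that matters here: a change cannot be logged without also saying why it was made.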

Then the Compound Engineering Plugin appeared. Its core idea: after finishing something, leave behind reusable assets. After every round of plan, review, and done, it writes the lessons learned into durable docs, so the next time you face a similar problem you don't start from scratch. Think about it: you build a payment system, and three months later you need a similar membership system. Compound has already captured your architectural decisions, common pitfalls, and testing strategies. That's kind of a big deal.

By this point, I had completely given up on the "which model is best" question. Why? Because the facts were right there:

Using the same Claude 3.5 or 4, different tools produced wildly different code quality.

I ran a dedicated test: the same simple CRUD interface, built three times each with vanilla Claude, Superpowers, gstack, and OpenSpec. Vanilla Claude produced three different coding styles across its three runs, and once it even invented a non-existent database function. Superpowers and gstack produced consistent code styles, but gstack was more flexible, letting me quickly skip unnecessary steps.

What surprised me most was OpenSpec: its output was the slowest, but its change traceability was genuinely strong. Two weeks later, I looked back at an old spec file and could clearly see why a particular design decision was made at the time.

In August, Lucas Fernandes published a comparison article with a line I still remember: "Superpowers' primary control primitive is skill and discipline. gstack's primary control primitive is role commands and sprint flow. OpenSpec's primary control primitive is spec/change/archive artifacts." I mulled that over for days before realizing it had actually hit the core of the entire problem.

---

In July, a concept called "Harness Engineering" suddenly emerged.

An article about the underlying infrastructure for AI agents was circulating in a niche circle. It proposed a three-layer framework:

Prompt Engineering: teaching AI what to say
Skill Engineering: teaching AI what to do
Harness Engineering: providing a safe runtime environment for AI

I was already struggling with workflow tools, not because they were bad, but because I didn't know how to combine them well. When I saw this framework, something clicked. The problem was our mindset: we had been trying to "find the best tool," when what we really needed was to "build the workflow best suited to your project."

Garry Tan said something in an interview that I deeply agree with. Future programmers, he said, will have only three roles left: architect, product manager, and quality gatekeeper. It's not that you'll stop writing code; it's that more of your time will go to decisions and reviews than to typing.

From that perspective, an 810x productivity gain makes sense: if one person becomes a "manager + reviewer," of course they can oversee multiple AIs working at once.

---

By August, someone finally published a thorough side-by-side comparison of Superpowers, gstack, OpenSpec, and Compound. I stayed up reading it until 3 AM because each project was broken down to its fundamental logic.

The article had a table I think is worth copying out for you:

| Project | What it is | Core idea |
| --- | --- | --- |
| Superpowers | A strongly constrained development discipline system | "Must design first, then plan, then execute, then review; must test, must review" |
| OpenSpec | A specification and change artifact system | "Turn requirements, specs, designs, tasks, and archives into traceable artifacts" |
| gstack | A virtual engineering team operating system | "Multi-role collaboration + cross-model cross-review" |
| Compound | An engineering knowledge compounding system | "Not just do things, but write results back into durable docs" |

All four projects target the same problem—getting AI to work within a controlled process—but they approach it from completely different angles.

---

In November, I finally couldn't hold back and decided to test combinations myself.

First, I ran Superpowers solo on a personal project. It was a small SaaS backend with about 20 API endpoints. I strictly followed the process: brainstorming produced the design doc, writing-plans broke down tasks, executing-plans used TDD to write code. Every step was rigid—at one point it wouldn't even let me directly modify code.

The experience? Like working with an extremely strict senior mentor. Slow, yes. But rock


Cael Lee
