I Built an AI Code Reviewer That Only Looks at My Git Diffs (And It's Shockingly Good)
Last week, I was digging through old commits and found something that made me laugh out loud—a 200-line function I wrote three months ago, nested 4 layers deep in if-else hell, with variable names like tempData, result2, and finalResult_v3. You know that moment when you look at your own code and think, "Who wrote this rubbish?" Yeah. That.
So I thought: what if I could get AI to clean up these "historical artefacts" automatically?
And then, well, I cobbled together a workflow using OpenAI Codex (I say "Codex" throughout, but the script below actually calls GPT-4 through the Chat Completions API). Let me be clear: I didn't build anything elegant. I hacked it together over two evenings, and the first evening was entirely spent arguing with prompts. Today I want to share how I use Codex to intelligently refactor Git diff chunks.
Here's the thing—this approach changed how I think about AI-assisted coding. Not because it's clever (it's not), but because it's practical.
TL;DR
We're covering:
- How to make Codex actually understand your Git diffs
- Real refactoring wins and spectacular failures
- Integrating this into your daily workflow without losing your mind
- Money-saving tricks (spoiler: API calls are surprisingly cheap)
Why Obsess Over Git Diffs?
Initially, I just wanted Codex to refactor entire files. Two problems emerged immediately.
First, it's painfully slow. Throw a 1,000-line file at Codex, and the response time is long enough to brew a pour-over coffee and eat a biscuit. I timed it once—a 2,300-line Python file took 47 seconds. On my M2 MacBook Pro, that's an eternity.
Second, I didn't trust it. This is the bigger issue. When Codex refactors an entire file, I have no idea what it changed or why. What if it "optimised" away a boundary condition that handles some edge case I discovered at 2 AM six months ago?
This is tricky to explain. Let me give you a real example. Last November, a colleague used GPT-4 to refactor a payment module. It changed a <= to < in the refund logic—a tiny change that would've broken partial refunds for orders with discounts. The code review caught it, but barely. We were this close to a production incident.
Then it hit me. Actually, it hit me in the shower. Only refactor the Git diff chunks.
The advantages are obvious once you think about it:
- Changes are scoped and reviewable
- Diffs come with built-in context (what was added/removed)
- You can actually review what the AI suggests without diffing the diff
My current approach is dead simple: git diff --staged goes to Codex, it returns optimised versions, and I decide what to keep. That's it.
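If you want "only the chunks" to mean individual hunks rather than whole files, here's a minimal sketch of the splitting step. It assumes standard unified diff output from git diff for a single file. The function name splitHunks and the { newStart, newLines, lines } shape are my own illustration, not part of Git or any library.

```javascript
// Hypothetical helper: split one file's unified diff into hunks.
// Each hunk starts with a header like "@@ -14,6 +14,8 @@ optional context".
function splitHunks(fileDiff) {
  const hunks = [];
  let current = null;
  for (const line of fileDiff.split('\n')) {
    const header = line.match(/^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@/);
    if (header) {
      // Start a new hunk; a missing count in the header means one line
      current = {
        newStart: Number(header[3]),
        newLines: header[4] === undefined ? 1 : Number(header[4]),
        lines: [],
      };
      hunks.push(current);
    } else if (current && /^[ +\-\\]/.test(line)) {
      // Context (" "), added ("+"), removed ("-") and "\ No newline" lines.
      // File headers (diff --git, ---, +++) come before the first hunk,
      // while current is still null, so they are skipped.
      current.lines.push(line);
    }
  }
  return hunks;
}

const example = [
  '--- a/app.js',
  '+++ b/app.js',
  '@@ -1,2 +1,2 @@',
  ' const a = 1;',
  '-const b = 2;',
  '+const b = 3;',
].join('\n');
console.log(splitHunks(example));
```

Sending hunks one at a time keeps every request small and easy to review. The trade-off is that the model sees less of the surrounding code, so per-file chunks may still be the better default for short files.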
Hacking It Together
Here's a rough sketch of the pipeline:
git diff → extract changed hunks → build the prompt → call the Codex API → collect suggestions
The core code looks something like this. I wrote it in Node.js (there's a Python version with identical logic), using the openai package v4.52:
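const { execSync } = require('child_process');
const OpenAI = require('openai');

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Split the combined diff into one { filename, diff } entry per file,
// using the "diff --git a/<path> b/<path>" line that starts each section
function parseDiffIntoFiles(diff) {
  return diff
    .split(/^(?=diff --git )/m)
    .filter((chunk) => chunk.startsWith('diff --git '))
    .map((chunk) => {
      const match = chunk.match(/^diff --git a\/(.+?) b\//);
      return { filename: match ? match[1] : '(unknown file)', diff: chunk };
    });
}

async function main() {
  // 1. Get the staged diff (--cached and --staged are synonyms)
  const diff = execSync('git diff --cached', { encoding: 'utf8' });
  // 2. Split it per file so a long diff doesn't become one giant prompt
  const files = parseDiffIntoFiles(diff);
  // 3. Send each file's diff to the model (note: this sends the raw diff;
  //    see the heads-up below about relabelling it first)
  for (const file of files) {
    const prompt = `
You are a senior code reviewer. Refactor and optimise the following git diff.
Requirements:
- Keep the existing behaviour unchanged
- Improve readability and naming
- Simplify complex logic
- If there are performance problems, point them out but don't force changes
- Output your suggestions in diff format
- If you encounter code that looks redundant but might handle boundary conditions, keep the original logic and add a comment
Original diff:
${file.diff}
`;
    const response = await openai.chat.completions.create({
      model: 'gpt-4-0125-preview',
      messages: [{ role: 'user', content: prompt }],
      max_tokens: 2000,
      temperature: 0.3,
    });
    console.log(`📝 Suggestions for ${file.filename}:\n`, response.choices[0].message.content);
  }
}

// CommonJS has no top-level await, so everything runs inside an async main()
main().catch((err) => {
  console.error(err);
  process.exit(1);
});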
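The per-file split still doesn't help when a single file's diff is enormous. One option is a rough size check before each request, sketched below. Both numbers are assumptions: about four characters per token is a common rule of thumb for English text (code and non-English text can tokenize quite differently), and the 6,000-token budget is an arbitrary placeholder, not a model limit. estimateTokens and partitionByBudget are hypothetical helpers.

```javascript
// Rough heuristic (assumption): ~4 characters per token for English text.
const CHARS_PER_TOKEN = 4;

function estimateTokens(text) {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Hypothetical helper: separate per-file diffs that fit the prompt budget
// from ones too large to send in a single request.
function partitionByBudget(files, maxPromptTokens = 6000) {
  const send = [];
  const skipped = [];
  for (const file of files) {
    const tokens = estimateTokens(file.diff);
    if (tokens <= maxPromptTokens) {
      send.push({ ...file, tokens });
    } else {
      skipped.push({ ...file, tokens });
    }
  }
  return { send, skipped };
}

const { send, skipped } = partitionByBudget([
  { filename: 'small.js', diff: 'x'.repeat(400) },
  { filename: 'huge.js', diff: 'x'.repeat(100000) },
]);
console.log(send.map((f) => f.filename), skipped.map((f) => f.filename));
```

Anything that lands in skipped could be split into hunks instead, or simply reviewed by hand.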
Heads up: Don't feed raw git diff output directly to Codex. Those @@ -14,6 +14,8 @@ markers and --- a/file.py headers confuse the model terribly. I wasted an entire afternoon learning this—12 March, I remember because it was pouring rain in Berlin that day. You need to transform the diff into something more digestible, clearly labelling "removed lines" and "added lines".
The error I kept getting looked like this:
The model returned an invalid diff format. Expected unified diff headers.
After trawling through GitHub Issues for hours, I finally found the solution buried in an OpenAI community forum post. Transform the diff first. Trust me.
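Here's a minimal sketch of what "transform the diff first" can look like, assuming a single file's unified diff as input. It drops the diff --git, index, --- and +++ headers, turns each @@ marker into a plain "Change near line N" label, and prefixes every line with REMOVED, ADDED or CONTEXT. labelDiff and the label wording are my own illustrative choices; the model doesn't require this exact format.

```javascript
// Hypothetical helper: rewrite a single-file unified diff into labelled lines.
function labelDiff(fileDiff) {
  const out = [];
  let inHunk = false;
  for (const line of fileDiff.split('\n')) {
    const header = line.match(/^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@/);
    if (header) {
      // Replace the "@@ -a,b +c,d @@" marker with a readable label
      out.push(`Change near line ${header[1]}:`);
      inHunk = true;
    } else if (!inHunk) {
      // Skip file headers (diff --git, index, --- a/..., +++ b/...)
      continue;
    } else if (line.startsWith('+')) {
      out.push(`ADDED:   ${line.slice(1)}`);
    } else if (line.startsWith('-')) {
      out.push(`REMOVED: ${line.slice(1)}`);
    } else if (line.startsWith(' ')) {
      out.push(`CONTEXT: ${line.slice(1)}`);
    }
    // Anything else (e.g. "\ No newline at end of file") is dropped
  }
  return out.join('\n');
}

const sample = [
  'diff --git a/app.js b/app.js',
  '--- a/app.js',
  '+++ b/app.js',
  '@@ -14,3 +14,3 @@',
  ' const total = sum(items);',
  '-const tmp = total * 2;',
  '+const doubledTotal = total * 2;',
].join('\n');
console.log(labelDiff(sample));
```

In the main script, you'd then build the prompt from labelDiff(file.diff) instead of file.diff.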
Three Real-World Cases
Case 1: The Nesting Nightmare
This is my favourite success story. Original code:
if (user) {
if (user.role === 'admin') {
if (permissions.includes('write')) {
if (resource.owner === user.id || resource.public) {
updateResource(resource, data);
}
}
}
}
Codex suggested:
const canEditResource = (user, resource, permissions) => {
if (!user || user.role !== 'admin') return false;
if (!permissions.includes('write')) return false;
return resource.owner === user.id || resource.public;
};
if (canEditResource(user, resource, permissions)) {
updateResource(resource, data);
}
Early returns + semantic function names. I accepted it wholesale. Zero modifications.
Satisfying doesn't begin to describe it.
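Even for a refactor this clean, it's cheap to confirm the two versions agree before accepting it. Because the condition only depends on a handful of inputs, a brute-force comparison over every combination is enough. In this sketch, originalAllows is the nested version rewritten to return a boolean instead of calling updateResource, and the sample users, resources and permission lists are made up for illustration.

```javascript
// Original nested logic, returning whether updateResource would be called
function originalAllows(user, resource, permissions) {
  if (user) {
    if (user.role === 'admin') {
      if (permissions.includes('write')) {
        if (resource.owner === user.id || resource.public) {
          return true;
        }
      }
    }
  }
  return false;
}

// The suggested refactor, copied from above
const canEditResource = (user, resource, permissions) => {
  if (!user || user.role !== 'admin') return false;
  if (!permissions.includes('write')) return false;
  return resource.owner === user.id || resource.public;
};

// Compare both versions on every combination of sample inputs
function checkEquivalence() {
  const users = [null, { id: 1, role: 'admin' }, { id: 1, role: 'viewer' }];
  const resources = [
    { owner: 1, public: false },
    { owner: 2, public: false },
    { owner: 2, public: true },
  ];
  const permissionSets = [[], ['read'], ['write'], ['read', 'write']];
  const mismatches = [];
  for (const user of users) {
    for (const resource of resources) {
      for (const permissions of permissionSets) {
        const before = originalAllows(user, resource, permissions);
        // Boolean() because the refactor may return a truthy non-boolean
        // (resource.public), while the original only used it inside an if
        const after = Boolean(canEditResource(user, resource, permissions));
        if (before !== after) mismatches.push({ user, resource, permissions });
      }
    }
  }
  return mismatches;
}

console.log(`${checkEquivalence().length} mismatches`);
```

Exhaustive checks like this only work when the input space is tiny; for anything bigger, the existing test suite is the safety net.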
Case 2: The Variable Naming Renaissance
I had a chunk of code littered with d, tmp, res, val. Codex renamed them to:
- d → rawApiResponse
- tmp → parsedUserData
- res → filteredActiveUsers
- val → userLastLoginTimestamp
Looks great, right? But userLastLoginTimestamp is absurdly long in chained method calls. I think Codex sometimes goes overboard with "semantic" naming. I ended up manually shortening it to lastLogin.
Case 3: The Spectacular Failure
This was the scariest one. Original code:
# Original code
if len(items) == 0 or items is None:
return default_value
Codex got "clever":
# Codex's suggestion
if not items:
return default_value
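Looks more Pythonic. But the two versions aren't equivalent. The refactored check relies on the object's truthiness (__bool__, falling back to __len__), while the original calls len() explicitly. The original's is None check is also written backwards: if items really is None, len(None) raises a TypeError before that check ever runs, whereas the refactored version quietly returns default_value. On top of that, our project has a custom NullableList class that overrides len...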
After Codex's refactor, three tests failed. I remember this was December—all three failures were related to cache expiry logic.
Since then, I've added an ironclad rule to my prompt: "If you encounter code that seems redundant but might handle boundary conditions, preserve the original logic and add a comment."
Integrating Into Your Daily Workflow
I've wired this into my Git hooks now.
The flow:
- commit-msg: Run Codex on staged diffs (this hook fires after you write the message but before the commit is created)
- Output suggestions, don't auto-apply: Print to terminal or generate an HTML report
- Manual review: Accept if it looks good, skip if you're unsure
- post-commit (optional): Double-check for anything missed
Here's a tip. Don't run this on every commit. I only trigger it on feat and refactor commit types. fix and hotfix get skipped—when you're patching bugs, the last thing you need is AI "help" introducing new ones.
For VS Code users, I wrote a simple extension. Right-click menu → "Codex Refactor Current Changes". One click. The extension isn't on the Marketplace yet, but the code's in a gist on my GitHub.
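Here's the hook itself. It's a plain shell script rather than a lint-staged config, and because it reads the commit message (Git passes the path of the message file as $1), it has to live in .git/hooks/commit-msg rather than pre-commit: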
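#!/bin/sh
# .git/hooks/commit-msg: $1 is the path to the file holding the commit message
# Only review feat/refactor commits; the type prefix sits at the start of the subject line
if head -n 1 "$1" | grep -qE '^(feat|refactor)'; then
  # Print suggestions only; don't let an API error block the commit
  node scripts/codex-refactor.js || true
fi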
That's literally it.
What It Costs
A lot of people worry about API fees.
I average 5-8 meaningful commits per day. Each diff is roughly 100-300 lines, which works out to about 2,000-4,000 GPT-4 tokens per diff.
At OpenAI's April 2024 pricing, that's roughly $15-25 per month. A pour-over coffee in Berlin costs €4. Compared to the time it saves me, this is pocket change.
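To sanity-check numbers like these, the arithmetic fits in a few lines. The prices are an assumption: GPT-4 Turbo's early-2024 list price of $10 per million input tokens and $30 per million output tokens, so check OpenAI's pricing page before relying on them. The split between input and output tokens is also a guess, since the 2,000-4,000 figure is a total.

```javascript
// Assumed GPT-4 Turbo list prices in USD per 1M tokens; verify before use.
const PRICE_PER_MILLION = { input: 10, output: 30 };

// Estimate a monthly bill from commit volume and per-request token usage.
function monthlyCostUSD({ commitsPerDay, inputTokens, outputTokens, days = 30 }) {
  const perRequest =
    (inputTokens * PRICE_PER_MILLION.input + outputTokens * PRICE_PER_MILLION.output) / 1e6;
  return commitsPerDay * days * perRequest;
}

// Low end: 5 commits/day, 1,500 input + 500 output tokens (2,000 total)
console.log(monthlyCostUSD({ commitsPerDay: 5, inputTokens: 1500, outputTokens: 500 }).toFixed(2));
// High end: 8 commits/day, 3,000 input + 1,000 output tokens (4,000 total)
console.log(monthlyCostUSD({ commitsPerDay: 8, inputTokens: 3000, outputTokens: 1000 }).toFixed(2));
```

Under these assumptions, one request per commit lands at roughly $4.50-$14.40 a month. The script makes one request per changed file, so commits that touch several files push the real bill higher.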
If you want to save money, use gpt-3.5-turbo as a first pass. I tested this for three weeks—it cuts costs by about 40% with similar results. Reserve GPT-4 for the complex stuff.
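The first-pass idea needs some rule for deciding what counts as "complex stuff". One hypothetical heuristic is sketched below: small diffs with shallow indentation go to gpt-3.5-turbo, everything else goes to GPT-4. The thresholds, the 2-space indentation assumption and the helper names are all placeholders to tune against your own code.

```javascript
// Hypothetical thresholds; tune them against your own diffs.
const MAX_SIMPLE_CHANGED_LINES = 40;
const MAX_SIMPLE_INDENT_LEVELS = 3;

// Collect added/removed lines (without the +/- prefix) from inside hunks,
// so the ---/+++ file headers are never counted.
function changedLinesOf(fileDiff) {
  const changed = [];
  let inHunk = false;
  for (const line of fileDiff.split('\n')) {
    if (line.startsWith('@@')) {
      inHunk = true;
    } else if (inHunk && (line.startsWith('+') || line.startsWith('-'))) {
      changed.push(line.slice(1));
    }
  }
  return changed;
}

// Route small, shallow diffs to the cheaper model; escalate everything else.
function pickModel(fileDiff) {
  const changed = changedLinesOf(fileDiff);
  // Deepest indentation among changed lines, counting leading spaces only
  const depth = changed.reduce(
    (deepest, line) => Math.max(deepest, Math.floor(line.match(/^ */)[0].length / 2)),
    0,
  );
  const simple =
    changed.length <= MAX_SIMPLE_CHANGED_LINES && depth <= MAX_SIMPLE_INDENT_LEVELS;
  return simple ? 'gpt-3.5-turbo' : 'gpt-4-0125-preview';
}

console.log(pickModel('@@ -1 +1 @@\n-const a = 1;\n+const b = 1;'));
```

In the main script, pickModel(file.diff) would then replace the hard-coded model name.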
Honest Thoughts
After using this for half a year, here's my biggest takeaway: Codex isn't here to replace you. It's here to handle the grunt work—the stuff you know how to do but can't be bothered to type out.
Renaming variables, extracting functions, simplifying conditionals—these have clear rules, and AI does them faster than I can. But business logic, boundary conditions, performance trade-offs? That still needs a human brain.
My advice in six words: Use it boldly, review it carefully. Treat Codex suggestions like code review comments from a junior colleague: helpful, but not gospel.
From what I've seen, about a third of developers in my circle are using AI-assisted refactoring now. The 2024 State of DevOps report mentions this trend too, though I can't recall the exact figures.
Have you tried AI-powered refactoring? What's the weirdest "optimisation" suggestion you've seen? Or do you have better prompt strategies? Drop a comment—I'm collecting prompt templates and planning to write a follow-up piece.
#OpenAI #Codex #Git #Refactoring #DevTools
Cael Lee
Full-stack developer with 8+ years of experience. Currently building AI-powered developer tools. I've tested 20+ AI API providers and coding assistants.