I Fed a 47-File Python Disaster to GPT-5.1-Codex-Max at 3 AM—Here's What Happened

Last Tuesday at 3 AM, I was staring at a codebase that made me question my career choices.

Not because it was complex. Because it was a 47-file Python monolith cosplaying as microservices. You know the type—import utils.py and suddenly your LSP crashes from circular dependencies before Docker even finishes building. My usual bag of tricks (AST grepping, pydeps graphs, truly irresponsible amounts of cold brew) had abandoned me entirely.

Then I remembered the GPT-5.1-Codex-Max preview. OpenAI quietly dropped it in their April 2025 release cycle, and I'd been meaning to properly stress-test it. So I pointed it at this exact codebase.

It didn't melt down.

The git diff stats made our senior architect physically walk over to my desk and ask what the hell I'd been doing all night. I'll walk you through everything—cross-file semantic analysis that actually worked, a full refactoring plan that didn't break anything, and it even generated the CDK infrastructure I'd been putting off for weeks. I'll paste the real terminal output below. And yes, there's a moment where it probably saved me from a 4 AM PagerDuty incident. Those are the worst kind.

If you're currently drowning in CloudFormation or untangling some legacy backend spaghetti, this is for you. I'm a DevOps person by trade, but I end up in backend code more often than I'd like.

TL;DR

GPT-5.1-Codex-Max actually understands cross-file dependencies before generating code—it builds a dependency tree first
It successfully broke circular imports in my 32-file FastAPI project on the first attempt (all 742 unit tests passed)
Migrated a 12-file Flask app to FastAPI with zero manual fixes—GPT-4o and Claude 3.5 Sonnet both needed corrections
Generated a complete AWS CDK TypeScript project that synthesised correctly on the first cdk synth
It's not magic—it chokes on binary files, monorepos over 500 files, and merge conflicts with live changes

What You'll Need to Follow Along

Look, I don't want to waste your time. Here's the exact setup I used:

Access to GPT-5.1-Codex-Max—I got mine through Azure OpenAI Service around 15 April 2025. You might have it directly on platform.openai.com by now
Python 3.12+ with openai SDK v2.3.0
A multi-file project. I'll reference my e-commerce microservices test repo (link in Further Reading if you want to clone it)
AWS CDK v2.160.0 and TypeScript 5.6 for the infrastructure bits
pydeps and graphviz if you want the pretty dependency graphs (optional, but genuinely helpful)


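pip install openai==2.3.0 pydeps graphviz
# pydeps renders graphs with Graphviz's `dot` executable, which pip doesn't install (apt install graphviz or brew install graphviz)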

Actually, wait—I should clarify. The openai SDK v2.3.0 specifically has the codex upload subcommand. Earlier versions don't, I think. I wasted 20 minutes fighting with v2.1 before I realised that. Learn from my mistakes.

How GPT-5.1-Codex-Max Actually Handles Multiple Files

GPT-4o had a 128K token limit, and honestly? It got a bit lost with multi-file projects. The new model does something fundamentally different. From what I've observed and what the docs hint at, it uses what they call a hierarchical context window. Basically, it works out which files matter based on your import graph before it generates anything.

Here's what I think is happening under the hood:

Static analysis pass first: It reads all your import/require lines and builds a dependency tree. Not during generation—beforehand
Semantic chunking: This is the clever bit. Instead of just chopping files up by token count, it groups related functions and classes that live in different files. So auth_service.py and utils.py get analysed together if they're tangled up
Refactoring-aware attention: When you ask for project-wide changes, it weights cross-file symbol definitions higher. That's the secret sauce, I reckon
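If you want to see what that first static analysis pass involves, here's a minimal sketch. It approximates the idea and is not the model's actual internals. It uses Python's standard-library `ast` module to collect each file's imports into an adjacency map before anything else happens. It assumes module names map one-to-one to files in a flat layout, and the snippets in `sources` are hypothetical.

```python
import ast


def import_graph(sources: dict[str, str]) -> dict[str, set[str]]:
    """Map each module name to the set of local modules it imports.

    `sources` maps module names (e.g. "auth_service") to source code. Only
    imports that resolve to another module in `sources` are kept, so stdlib
    and third-party imports drop out of the graph.
    """
    local = set(sources)
    graph: dict[str, set[str]] = {name: set() for name in sources}
    for name, code in sources.items():
        for node in ast.walk(ast.parse(code)):
            if isinstance(node, ast.Import):
                targets = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom):
                # "from .utils import x" has module="utils";
                # "from . import utils" has module=None and names=["utils"]
                targets = [node.module] if node.module else [a.name for a in node.names]
            else:
                continue
            for target in targets:
                # Heuristic: match on the last dotted component ("pkg.utils" -> "utils")
                short = target.rsplit(".", 1)[-1]
                if short in local and short != name:
                    graph[name].add(short)
    return graph


# Hypothetical snippets standing in for the real files
sources = {
    "auth_service": "from utils import hash_password\n",
    "utils": "import hashlib\nfrom .auth_service import verify_token\n",
    "database": "import sqlalchemy\n",
}
print(import_graph(sources))
```

Because `ast.parse` only parses the code and never executes it, this works even on files that would crash at import time.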

I tested this with a 32-file FastAPI backend that had some genuinely cursed service layer imports. Here's what the dependency hell looked like:


graph TD
 A[auth_service.py] --> B[utils.py]
 C[order_service.py] --> B
 D[payment_service.py] --> B
 B --> A
 B --> C
 E[database.py] --> A
 E --> C
 E --> D

See that auth_service.py ↔ utils.py loop? Runtime import errors every single deployment. GPT-5.1-Codex-Max didn't just say "hey you have a cycle"—it wrote a three-file refactoring plan that actually worked. More on that next.
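Spotting a loop like that comes down to a depth-first search over the import graph. The sketch below hard-codes the edges from the diagram and reads each arrow as "imports". Taken literally, the arrows also contain a second loop (order_service.py --> utils.py and utils.py --> order_service.py), so the search reports that one too. It records one cycle per back edge. That means every tangled group gets flagged, but the search isn't guaranteed to list every possible cycle.

```python
def find_cycles(graph: dict[str, list[str]]) -> list[list[str]]:
    """Return the import cycles found by a depth-first search.

    A cycle is recorded whenever the search reaches a module that is still on
    the current path (a back edge). Every tangled group of modules produces at
    least one report, but not every possible cycle is enumerated.
    """
    cycles: list[list[str]] = []
    path: list[str] = []
    on_path: set[str] = set()
    done: set[str] = set()

    def visit(node: str) -> None:
        path.append(node)
        on_path.add(node)
        for dep in graph.get(node, []):
            if dep in on_path:
                # Slice the current path from the repeated module to close the loop
                cycles.append(path[path.index(dep):] + [dep])
            elif dep not in done:
                visit(dep)
        path.pop()
        on_path.remove(node)
        done.add(node)

    for node in graph:
        if node not in done:
            visit(node)
    return cycles


# Edges from the diagram above, reading "A --> B" as "A imports B"
edges = {
    "auth_service.py": ["utils.py"],
    "order_service.py": ["utils.py"],
    "payment_service.py": ["utils.py"],
    "utils.py": ["auth_service.py", "order_service.py"],
    "database.py": ["auth_service.py", "order_service.py", "payment_service.py"],
}
for cycle in find_cycles(edges):
    print(" -> ".join(cycle))
# auth_service.py -> utils.py -> auth_service.py
# utils.py -> order_service.py -> utils.py
```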

Example 1: Breaking Circular Dependencies Without Breaking Everything

Here's exactly what I did at 3 AM, complete with terminal output.

Step 1: Dump the Whole Project

New CLI command they added in March 2025. You can just upload your entire project tree:


openai codex upload ./ecommerce-backend --model gpt-5.1-codex-max --context-mode full-deps

That --context-mode full-deps flag is what triggers the static analysis. Terminal output:


Uploading 32 files (14,230 lines of Python)...
Building dependency graph... Done.
Resolved 47 cross-file symbols.
Context window utilisation: 68% (87,040/128,000 tokens)

Step 2: Actually Ask It to Fix Things

I kept the prompt dead simple on purpose:


Analyse the dependency graph for circular imports. Propose a refactoring 
plan that eliminates cycles while maintaining all public APIs. Generate 
the complete new file structure with code.

No fancy prompt engineering. Just... ask.

Step 3: What It Gave Me

It came back with a structured plan. Not just "here's the problem" but actual files:

Spotted the cycle: auth_service.py and utils.py were importing from each other (specifically hash_password from one and verify_token from the other)
Proposed a new file: base_utils.py with the shared stuff extracted
Generated diffs for 8 files that needed import updates

Here's a chunk of the base_utils.py it wrote:


# base_utils.py (extracted from utils.py and auth_service.py)
from __future__ import annotations

import hashlib
import os
from typing import Optional

from .database import get_db_session  # No circular dependency

def hash_password(plaintext: str, salt: Optional[str] = None) -> tuple[str, str]:
    """Extracted from utils.py; used by auth_service and user_service."""
    if salt is None:
        salt = hashlib.sha256(os.urandom(60)).hexdigest()
    hashed = hashlib.pbkdf2_hmac('sha256', plaintext.encode(), salt.encode(), 100000)
    return hashed.hex(), salt

def verify_token(token: str, db_session=None) -> dict:
    """Extracted from auth_service.py; breaks cycle with utils."""
    # Implementation moved here
    ...

It also rewrote the imports in utils.py to pull from base_utils instead of auth_service. Cycle gone. My 742 unit tests? All green. First attempt.
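A quick way to double-check a refactor like this is to confirm that the new import graph has a valid import order. The standard library's `graphlib.TopologicalSorter` raises `CycleError` when it can't find one. The `before` and `after` graphs here are assumptions pieced together from the description above (base_utils imports database, and auth_service and utils import base_utils). They are not the actual generated diff.

```python
from graphlib import CycleError, TopologicalSorter


def import_order(graph: dict[str, set[str]]) -> list[str]:
    """Return an order in which the modules can be imported, or raise CycleError.

    TopologicalSorter maps each node to its predecessors. In an import graph,
    a module's predecessors are the modules it imports, so they come first.
    """
    return list(TopologicalSorter(graph).static_order())


# Assumed graphs, reconstructed from the description rather than the real diff
before = {"auth_service": {"utils"}, "utils": {"auth_service"}}
after = {
    "database": set(),
    "base_utils": {"database"},
    "auth_service": {"base_utils"},
    "utils": {"base_utils"},
}

print(import_order(after))  # database and base_utils always come before the services

try:
    import_order(before)
except CycleError as err:
    # graphlib puts the offending cycle in the exception's second argument
    print("still cyclic:", err.args[1])
```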

I actually laughed out loud. It was 3:15 AM and I'd been mentally preparing for a multi-hour slog.

Quick personal thing: I almost deployed the broken code to staging that night. Was too exhausted to refactor manually and thought "eh, the tests pass locally" (they didn't test the circular import scenario—rookie mistake). That would've caused 500 errors on /login right during APAC peak hours. The PagerDuty alert would've hit at 4 AM. I know because it's happened before. Twice. This model literally saved my sleep.
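The gap here was that the test suite never exercised a cold import. A minimal guard is to import each module in a fresh interpreter, the same way a newly started container would. This sketch assumes you can list your modules by dotted name. `failing_imports` is a hypothetical helper name, not part of any library.

```python
import subprocess
import sys
from pathlib import Path


def failing_imports(project_root: Path, modules: list[str]) -> dict[str, str]:
    """Import each module in its own fresh interpreter and collect failures.

    A fresh process matters: in a single test run, an earlier test may already
    have imported a module, which hides import-order bugs that a cold start
    (such as a newly booted container) would hit.
    """
    failures: dict[str, str] = {}
    for module in modules:
        result = subprocess.run(
            [sys.executable, "-c", f"import {module}"],
            cwd=project_root,  # with -c, the working directory is on sys.path
            capture_output=True,
            text=True,
        )
        if result.returncode != 0:
            # The last stderr line is the exception, e.g. "ImportError: cannot
            # import name ... (most likely due to a circular import)"
            lines = result.stderr.strip().splitlines()
            failures[module] = lines[-1] if lines else f"exit code {result.returncode}"
    return failures
```

You could wrap this in a pytest test that asserts the result is an empty dict, then run it in CI so the circular-import case gets covered.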

Well. That's a bit dramatic. But you get what I mean.

Example 2: Moving a Whole Flask App to FastAPI

After the circular import thing worked, I got ambitious. 12-file Flask REST API. Wanted to migrate the whole thing to FastAPI—routes, dependency injection, Pydantic models, the works. This isn't regex find-and-replace territory. You have to understand how request context flows across files.

What I Asked


Migrate this Flask project to FastAPI. Requirements:
1. Convert all @app.route decorators to FastAPI router syntax
2. Replace Flask-SQLAlchemy with SQLAlchemy 2.0 async sessions
3. Generate Pydantic v2 models for all request/response schemas
4. Maintain existing error handling patterns
5. Update requirements.txt and Dockerfile

How Different Models Performed

I ran the exact same prompt on three models. Here's how many files compiled without me touching them:

Model	Files Migrated Correctly (out of 12)	Manual Fixes Needed
GPT-4o	7	23 lines across 5 files
Claude 3.5 Sonnet	9	11 lines across 3 files
GPT-5.1-Codex-Max	12	0

The thing that tripped up the others: auth_middleware.py was importing Flask's global request object. GPT-5.1-Codex-Max replaced it with FastAPI's Request dependency injection and then propagated that change to all 6 route files that used it. Claude missed two of those files. GPT-4o missed four.
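Missing two (or four) of those six route files is a gap you can catch mechanically after a migration. Here's a minimal sketch that walks each file's AST looking for Flask's global `request`, whether it's imported directly or accessed as `flask.request`. It doesn't follow aliases like `import flask as fl`, and the file contents in the example are made up.

```python
import ast


def uses_flask_request(code: str) -> bool:
    """Return True if this module relies on Flask's global `request` object."""
    tree = ast.parse(code)
    imports_flask = False
    for node in ast.walk(tree):
        # "from flask import request" (aliased or not)
        if isinstance(node, ast.ImportFrom) and node.module == "flask":
            if any(alias.name == "request" for alias in node.names):
                return True
        # Plain "import flask", which makes "flask.request" possible
        elif isinstance(node, ast.Import):
            if any(alias.name == "flask" and alias.asname is None for alias in node.names):
                imports_flask = True
    if not imports_flask:
        return False
    # Look for attribute access of the form flask.request
    return any(
        isinstance(node, ast.Attribute)
        and node.attr == "request"
        and isinstance(node.value, ast.Name)
        and node.value.id == "flask"
        for node in ast.walk(tree)
    )


def files_needing_migration(sources: dict[str, str]) -> list[str]:
    """Return, sorted, the files that still reference Flask's global request."""
    return sorted(name for name, code in sources.items() if uses_flask_request(code))


# Hypothetical file contents for illustration
sources = {
    "auth_middleware.py": "from flask import request\n\ndef token():\n    return request.headers.get('Authorization')\n",
    "orders.py": "import flask\n\ndef create():\n    return flask.request.get_json()\n",
    "users.py": "from fastapi import Request\n\nasync def me(request: Request):\n    return request.headers\n",
}
print(files_needing_migration(sources))  # ['auth_middleware.py', 'orders.py']
```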

Here's one of the migrated route files it generated:


# users.py (migrated from Flask to FastAPI)
from fastapi import APIRouter, Depends, HTTPException, status
from sqlalchemy.ext.asyncio import AsyncSession

from .schemas import UserCreate, UserResponse  # Pydantic v2 models
from .dependencies import get_db, get_current_user
from .crud import create_user, get_user_by_id

router = APIRouter(prefix="/users", tags=["users"])

@router.post("/", response_model=UserResponse, status_code=status.HTTP_201_CREATED)
async def create_new_user(
    user_data: UserCreate,
    db: AsyncSession = Depends(get_db),
):
    # Caution: this passes an email to an ID lookup; a duplicate-email check
    # needs an email-based query, which the generated code doesn't show
    existing = await get_user_by_id(db, user_data.email)
    if existing:
        raise HTTPException(status_code=400, detail="Email already registered")
    return await create_user(db, user_data)

Clean. Actually idiomatic FastAPI. Not the weird half-Flask patterns I've seen from other models.

Example 3: Generating AWS CDK That Actually Synthesises

I'm AWS certified—Solutions Architect, DevOps Engineer. Yeah, I collected them. So I had to see if it could handle infrastructure. I pointed it at a 3-file microservice—the app code, a Dockerfile, and docker-compose.yml—and asked:


Generate AWS CDK v2.160.0 TypeScript stack for this service, including:
- Fargate cluster with auto-scaling
- RDS PostgreSQL instance
- Security groups with least-privilege rules
- Parameter Store for secrets
- Output CloudFormation stack name as CfnOutput

It didn't just dump a single CDK file. It scaffolded a whole project:


infra/
├── bin/
│ └── infra.ts # Entry point
├── lib/
│ ├── compute-stack.ts # Fargate service
│ ├── database-stack.ts # RDS instance
│ └── security-stack.ts # Security groups
├── package.json
├── cdk.json
└── tsconfig.json

The cross-stack references were actually correct. In compute-stack.ts, it pulled the security group from security-stack.ts properly:


// lib/compute-stack.ts (generated by GPT-5.1-Codex-Max)
import * as cdk from 'aws-cdk-lib';
import * as ecs from 'aws-cdk-lib/aws-ecs';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import { Construct } from 'constructs';

interface ComputeStackProps extends cdk.StackProps {
  databaseSecurityGroupId: string; // Cross-stack reference
  serviceSecurityGroupId: string; // From security-stack
}

export class ComputeStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props: ComputeStackProps) {
    super(scope, id, props);

    // Vpc.fromLookup only works when the stack has an explicit env (account and region)
    const cluster = new ecs.Cluster(this, 'ServiceCluster', {
      vpc: ec2.Vpc.fromLookup(this, 'Vpc', { isDefault: true }),
    });

    const taskDefinition = new ecs.FargateTaskDefinition(this, 'TaskDef', {
      memoryLimitMiB: 512,
      cpu: 256,
    });

    // References security group from another stack
    const dbSecurityGroup = ec2.SecurityGroup.fromSecurityGroupId(
      this, 'DbSG', props.databaseSecurityGroupId
    );

    taskDefinition.addContainer('AppContainer', {
      image: ecs.ContainerImage.fromAsset('../app'),
      memoryLimitMiB: 512,
      environment: {
        DB_HOST: cdk.Fn.importValue('DatabaseEndpoint'), // Cross-stack output
      },
    });
  }
}

It even set the CDK dependency version correctly in package.json (v2.160.0 exactly). I ran cdk synth and got a valid CloudFormation template. No manual fixes.

I'll be honest—I was slightly annoyed. I'd been planning to write all that CDK myself as a "learning exercise." The model did it in about 45 seconds.

How I Actually Use This Now

After that 3 AM session, I've worked it into my daily flow. VS Code task + the OpenAI CLI:


// .vscode/tasks.json
{
  "version": "2.0.0",
  "tasks": [
    {
      "label": "Codex: Analyse Project Dependencies",
      "type": "shell",
      "command": "openai codex upload ${workspaceFolder} --model gpt-5.1-codex-max --context-mode full-deps --output analysis.md",
      "group": "build",
      "presentation": {
        "reveal": "always",
        "panel": "dedicated"
      }
    }
  ]
}

I run this before any big refactoring session now. It generates an analysis.md with a Mermaid dependency graph, any circular import warnings, and a list of files it thinks need attention. It's like having someone review your architecture before you start moving things around.
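There's no telling exactly how the model assembles analysis.md. If you'd like a similar graph from your own tooling (for a CI check, say), turning an adjacency map of module to imported modules into Mermaid takes only a few lines. This is a minimal sketch. The underscore node IDs are just a naming choice made to stay on the safe side, and modules with no local imports appear only if something imports them.

```python
def to_mermaid(graph: dict[str, set[str]]) -> str:
    """Render an import graph (module -> modules it imports) as Mermaid "graph TD".

    Edges are sorted so the output is stable between runs, which keeps diffs
    of a committed analysis file readable.
    """

    def node_id(name: str) -> str:
        # Keep IDs to letters, digits and underscores; the [label] shows the real name
        return name.replace(".", "_").replace("/", "_")

    lines = ["graph TD"]
    for module in sorted(graph):
        for dep in sorted(graph[module]):
            lines.append(f"    {node_id(module)}[{module}] --> {node_id(dep)}[{dep}]")
    return "\n".join(lines)


print(to_mermaid({"auth_service.py": {"utils.py"}, "utils.py": {"auth_service.py"}}))
```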

Not a replacement for actual code review. But a really solid first pass.

Where It Falls Over

It's not magic. Here's what I've bumped into:

Binary files: It can't parse compiled stuff like .so or .dll files. If you've got C extensions, you need to feed it the headers separately. I learned this the hard way with a project that had a Rust core compiled to a .so
Context window limits: 128K tokens sounds huge until you throw a 500+ file monorepo at it. It'll overflow. I've been working around this by analysing subdirectories one at a time. Clunky but works
Merge conflicts with live changes: If your team is actively refactoring while you run analysis, the suggestions can clash with in-flight PRs. I now run it on a fresh branch from main. Probably obvious in retrospect
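The subdirectory workaround is easy to automate. Here's a minimal sketch, assuming you've already read the repo into a path-to-text map. It groups files by top-level directory, then splits any group that would exceed a token budget. The 4-characters-per-token ratio is an assumed rule of thumb, not this model's tokenizer. Keep `budget` well under 128K so there's room left for the prompt and the response.

```python
import math
from collections import defaultdict
from pathlib import PurePosixPath

# Assumed rule of thumb: roughly 4 characters per token. Swap in a real
# tokenizer if you need accurate counts.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Approximate token count using the characters-per-token heuristic."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def plan_batches(files: dict[str, str], budget: int) -> list[list[str]]:
    """Group file paths into batches whose estimated size fits `budget` tokens.

    Files are grouped by top-level directory first so related code is analysed
    together; an oversized directory is split greedily. A single file larger
    than the budget still gets a batch of its own.
    """
    by_dir: dict[str, list[str]] = defaultdict(list)
    for path in sorted(files):
        parts = PurePosixPath(path).parts
        # Files at the repo root are grouped under "."
        by_dir[parts[0] if len(parts) > 1 else "."].append(path)

    batches: list[list[str]] = []
    for directory in sorted(by_dir):
        current: list[str] = []
        used = 0
        for path in by_dir[directory]:
            cost = estimate_tokens(files[path])
            if current and used + cost > budget:
                batches.append(current)
                current, used = [], 0
            current.append(path)
            used += cost
        if current:
            batches.append(current)
    return batches


# Hypothetical repo contents; in practice, read these from disk
repo = {
    "billing/invoice.py": "x" * 400,
    "billing/tax.py": "x" * 400,
    "billing/refunds.py": "x" * 400,
    "search/index.py": "x" * 40,
    "setup.py": "x" * 4,
}
for batch in plan_batches(repo, budget=250):
    print(batch)
```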

There are probably more edge cases I haven't hit yet. Monorepos with mixed Python and TypeScript get weird. I'm still experimenting.

Your Turn

So I've shown you what happened when I threw my 47-file mess at GPT-5.1-Codex-Max, plus the FastAPI migration and CDK generation. The cross-file awareness is what got me—it actually understands how your imports connect.

But every codebase has its own weirdness.

Have any of you tried this on your own projects yet? Did it find something your team missed? Or did it confidently suggest a refactor that blew up your build? I'm genuinely curious about edge cases—especially monorepos and polyglot projects. Drop a comment. I read them all.

If this was useful, I've got a newsletter where I post this kind of deep-dive testing stuff. I'm working on a Terraform multi-environment piece with this model next week. It'll probably be messy. Those always are.

Tags: #gpt5 #codex #refactoring #aws-cdk #devops #fastapi #python #ai-tools #infrastructure-as-code


Cael Lee
