Using multiple AI models to review each other for coding actually works

Ethan · December 15, 2025, 7:44am

I was working on an issue that touched two different repos, and honestly I wasn’t familiar with either of them.

So I put both repos into one folder and opened it in Cursor:

workspace/
├─ repo1/
├─ repo2/
├─ opus.plan.md
└─ codex.plan.md

Then I tried a slightly different AI workflow instead of relying on just one model.

I copied the issue description into Claude Code (Cursor extension) using Opus 4.5, and asked it to:

Then I switched to Codex 5.1 Max (extra high) and asked it to:

Next, I opened the same workspace in Antigravity IDE, used Gemini 3 Pro (high), and asked:

Gemini explained why and said the Codex plan was better.

Finally, I used Codex again to implement the solution based on its own plan.

What I really liked about this setup:

no copy-pasting between tools
plan files live outside the repos, so no risk of accidentally committing them
cleaner mental model when working across multiple codebases
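
For anyone who wants to double-check the second point, here is a minimal Python sketch that scans the workspace for plan files that ended up inside a repo by mistake. It is not part of the original setup. The function names `inside_git_repo` and `stray_plan_files` are made up for illustration, the `*.plan.md` pattern is taken from the file names above, and a "repo" is assumed to be any folder that contains a `.git` entry.

```python
from pathlib import Path


def inside_git_repo(path: Path, workspace: Path) -> bool:
    """Return True if `path` sits in or below a folder that contains `.git`.

    Assumption: a repo is any folder with a `.git` entry. The search stops
    at `workspace`, so repos above the workspace are ignored.
    """
    workspace = workspace.resolve()
    for parent in path.resolve().parents:
        if (parent / ".git").exists():
            return True
        if parent == workspace:
            break
    return False


def stray_plan_files(workspace: Path) -> list[Path]:
    """List every *.plan.md under `workspace` that lives inside a repo."""
    return sorted(
        p for p in workspace.rglob("*.plan.md") if inside_git_repo(p, workspace)
    )


if __name__ == "__main__":
    # Hypothetical usage: run from the folder that holds repo1/ and repo2/.
    for plan in stray_plan_files(Path(".")):
        print(f"plan file inside a repo: {plan}")
```

With the layout above, where both plan files sit next to `repo1/` and `repo2/`, the script should print nothing.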

I’ve noticed that when I rely on only one model, the result is often either:

But letting multiple models review and judge each other's plans worked surprisingly well.

It’s a bit slower, sure, but the quality of the result and my confidence in it both feel much higher.

Curious if anyone else is doing something like this.