Having multiple AI models review each other's coding plans actually works

I was working on an issue that touched two different repos, and honestly I wasn’t familiar with either of them.

So I put both repos into one folder and opened it in Cursor:

workspace/
├─ repo1/
├─ repo2/
├─ opus.plan.md
└─ codex.plan.md
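
If you want to script that layout, here is a minimal Python sketch. The function name `make_workspace` is hypothetical, and it assumes you clone or move the repos into the folder yourself; it only creates the folder and the empty plan files, then tells you which repo folders are still missing.

```python
from pathlib import Path


def make_workspace(root: Path, repos: list[str], plans: list[str]) -> list[str]:
    """Create the workspace folder and empty plan files next to the repos.

    Returns the names of repo folders that are not present yet, so you know
    what still has to be cloned or moved in. (Hypothetical helper.)
    """
    root.mkdir(parents=True, exist_ok=True)
    for name in plans:
        # touch() creates the file if needed and leaves existing contents alone
        (root / name).touch(exist_ok=True)
    return [name for name in repos if not (root / name).is_dir()]
```

Calling `make_workspace(Path("workspace"), ["repo1", "repo2"], ["opus.plan.md", "codex.plan.md"])` on a fresh folder should create both plan files and return both repo names as still missing.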

Then I tried a slightly different AI workflow instead of relying on just one model.


Step 1

I copied the issue description into Claude Code (Cursor extension) using Opus 4.5, and asked it to:

  • explain the issue
  • generate a comprehensive, detailed plan in ./opus.plan.md

Step 2

Then I switched to Codex 5.1 Max (extra high) and asked it to:

  • review ./opus.plan.md
  • give its opinion
  • generate its own plan in ./codex.plan.md

Step 3

Next, I opened the same workspace in Antigravity IDE, used Gemini 3 Pro (high), and asked:

  • which plan is better and why?

Gemini said the Codex plan was better and explained why.


Step 4

Finally, I used Codex again to implement the solution based on its own plan.


What I really liked about this setup:

  • no copy-pasting between tools
  • plan files live outside the repos, so no risk of accidentally committing them
  • cleaner mental model when working across multiple codebases
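
To double-check the "outside the repos" point, a small sketch like the one below can help. It assumes a repo is recognizable by a `.git` entry in its root folder, and the helper name `enclosing_repo` is hypothetical. Note that it would also report a repo if the workspace folder itself happens to sit inside one.

```python
from pathlib import Path
from typing import Optional


def enclosing_repo(path: Path) -> Optional[Path]:
    """Return the nearest ancestor folder that contains a .git entry, or None.

    Hypothetical helper: a plan file is safe from accidental commits
    when this returns None for it.
    """
    path = path.resolve()
    for parent in path.parents:
        # .git is a directory in a normal clone and a file in worktrees/submodules
        if (parent / ".git").exists():
            return parent
    return None
```

For the layout above, `enclosing_repo(Path("workspace/opus.plan.md"))` should not point at repo1 or repo2, while any file under `workspace/repo1/` should resolve to `repo1`.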

I’ve noticed that when I rely on only one model, the result is often either:

  • not fully completed
  • or way over-engineered

But letting multiple models review and judge each other actually worked surprisingly well.

It’s a bit slower, sure, but the quality of the result and my confidence in it both feel much higher.

Curious if anyone else is doing something like this :eyes: