AI Code Review Gets Better When I Ask Models to Debate: Claude, Gemini, Codex

milvus.io

23 points by Fendy 22 days ago · 4 comments


7777777phil 22 days ago

Makes sense. This also tracks with the research on human-AI collaboration. A single model converges to the mean of its training distribution, but adversarial multi-model setups break that pattern because each model's blind spots are different.

I wrote about why single-model AI has a structural quality ceiling and why ensemble/hybrid approaches consistently outperform: https://philippdubach.com/posts/the-impossible-backhand/

itmitica 22 days ago

I did the exact same thing! Uncanny.

I agree that different models are better at different tasks: gemini-cli is superficial, codex is stubborn as a mule but dependable, claude-cli just wants to get something working and call it done. qwen-cli (and Qwen in general) tends to oscillate too much.

I also reduced my team to two: codex and claude.

  • rbliss 22 days ago

    Agree with this. I have Codex do analysis and feedback for Claude Code. For whatever reason, Claude Code seems to produce working code more frequently, but it tends to have blind spots in analysis that Codex does a good job of picking up. The two together feel like a step up in the state of the art.

    I need a tool to put them in a loop together to get this done more efficiently…I guess I’ll plug this in as a prompt and go from there!
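    The loop rbliss is asking for can be sketched generically. This is a minimal example, not any existing tool: `generate` and `review` are hypothetical callables that in practice would shell out to the two CLIs (the exact commands are left unspecified here), and the `APPROVE` convention is an assumption baked into the reviewer prompt.

    ```python
    from typing import Callable, Tuple

    def debate_loop(
        task: str,
        generate: Callable[[str, str], str],   # hypothetical: e.g. wraps a claude CLI call
        review: Callable[[str, str], str],     # hypothetical: e.g. wraps a codex CLI call
        max_rounds: int = 3,
    ) -> Tuple[str, bool]:
        """Alternate generator and reviewer until the reviewer approves
        or the round budget runs out. Returns (last_draft, approved)."""
        feedback = ""
        draft = ""
        for _ in range(max_rounds):
            # Generator produces (or revises) a draft, given the task and prior feedback.
            draft = generate(task, feedback)
            # Reviewer critiques the draft; "APPROVE" is the assumed stop signal.
            feedback = review(task, draft)
            if feedback.strip().upper().startswith("APPROVE"):
                return draft, True
        return draft, False
    ```

    The design choice is to keep the loop pure and inject the model calls as functions, so the same harness works whichever two CLIs you plug in.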
