I compared my daughter against SOTA models on math puzzles
blog.michalprzadka.com
I tested o3, r1, 4o and other SOTA models against puzzles from an international math competition and compared their performance with my 11-year-old daughter's solutions. Full results include detailed conversations with each model and complete methodology.
Interesting how the reasoning differs between models, e.g. DeepSeek trying brute-force tricks.
Very cool post! I wonder how much it will affect the psychology of the next generations.