# 5 ais predict arcagi3

Published: 2026-03-27

URL: https://diegoprime.com/blog/5-ais-predict-arcagi3

[arc-agi-3](https://arcprize.org/arc-agi/3) launched march 25. every frontier llm scored below 0.4%. one day later, anthropic's next model ["claude mythos" leaked](https://m1astra-mythos.pages.dev/).

i gave the same prompt to 5 ai deep research tools: predict what mythos will score.

### the prompt

> What will Claude Mythos score on ARC-AGI-3? Produce a rigorous probabilistic prediction. Search extensively. Reason from first principles. Take as long as you need.

same prompt, no follow-ups, each one run independently.

### predictions

| ai                     | expected value | median |
| ---------------------- | -------------- | ------ |
| cc opus 4.6 (terminal) | 4.2%           | 0.7%   |
| sonnet 4.6 extended    | ~2%            | ~0.3%  |
| opus 4.6 extended      | n/a            | n/a    |
| gemini deep research   | 3.15%          | ~2-3%  |
| chatgpt deep research  | 0.97%          | 0.35%  |

opus extended wrote a 10k word article instead of a forecast. same conclusion, no single number.

all five agreed: mythos improves things llms are already good at. arc-agi-3 measures something llms are structurally bad at.

among the four that gave numbers, expected values ranged from 0.97% to 4.2%. medians mostly clustered around 0.3% to 0.7%, with gemini the outlier at ~2-3%. all five found the same key evidence independently (the duke harness paradox, stochasticgoose beating every llm by 34x, RHAE-squared killing marginal gains).

### how each one reasoned

**cc opus** in terminal was the most efficient. same insight as the 10k word reports in ~800 words.

**sonnet** was the cleanest forecaster. tight probability buckets, calibrated confidence.

**opus** went journalist mode. market impact analysis, architectural recommendations, 317 sources in 9 minutes.

**gemini** wrote an academic paper. invented the term "modality collapse." most detailed analysis of the leak itself.

**chatgpt** was the most conservative. nine probability buckets, lowest EV (0.97%), lowest self-assessed confidence (30%).
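a note on why the expected values sit so far above the medians: a bucketed forecast with a small chance of a big score drags the mean up while the median stays low. here is a minimal sketch of that arithmetic. the buckets and probabilities in `BUCKETS` are made up for illustration and are not taken from any of the five reports; using bucket midpoints for the mean and linear interpolation for the median are also my assumptions, not anything the tools said they did.

```python
# hypothetical forecast buckets: (low %, high %, probability).
# illustrative numbers only; NOT from any of the five reports.
BUCKETS = [
    (0.0, 0.5, 0.55),
    (0.5, 1.0, 0.20),
    (1.0, 5.0, 0.15),
    (5.0, 25.0, 0.10),
]


def expected_value(buckets):
    """probability-weighted mean, using each bucket's midpoint as its score."""
    return sum(p * (lo + hi) / 2 for lo, hi, p in buckets)


def median(buckets):
    """score where cumulative probability reaches 0.5.

    interpolates linearly inside the bucket that crosses 0.5.
    """
    cumulative = 0.0
    for lo, hi, p in buckets:
        if cumulative + p >= 0.5:
            fraction = (0.5 - cumulative) / p
            return lo + fraction * (hi - lo)
        cumulative += p
    raise ValueError("probabilities sum to less than 0.5")


if __name__ == "__main__":
    print(f"expected value: {expected_value(BUCKETS):.2f}%")
    print(f"median:         {median(BUCKETS):.2f}%")
```

with these made-up buckets the expected value comes out around 2.24% and the median around 0.45%. the 10% tail bucket contributes most of the mean on its own, which is the same shape as the ev vs median gap in the table.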
same evidence, same conclusion, completely different outputs.

i'll update this when actual scores drop.

## Connections

- [[Software, AI & Building]]
- [[density-of-intelligence]]