Insights · March 30th, 2026
François Chollet just released ARC-AGI-3, arguably the hardest AI test yet. It has 135 novel game environments. No instructions. No rules. No goals given. AI platforms failed miserably.
Each environment was handcrafted by game designers. The AI gets dropped in and has to explore, discover what winning looks like, and adapt in real time. The scoring punishes brute force. If a human needs 10 actions and the AI needs 100, the AI doesn’t get 10%. It gets 1%. You can’t throw more compute at this.
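Here is a minimal Python sketch that makes the efficiency penalty concrete. It assumes, purely for illustration, that an environment's score is the square of the human-to-AI action ratio, capped at 100%. That assumption reproduces the 10-versus-100 example above, but it is not taken from the paper, and the official ARC-AGI-3 formula may differ. The function name `efficiency_score` is hypothetical.

```python
def efficiency_score(human_actions: int, ai_actions: int) -> float:
    """Hypothetical per-environment score (an assumption, not the official
    ARC-AGI-3 formula): the squared ratio of human to AI actions, capped
    at 1.0 so that beating the human baseline earns no bonus."""
    if human_actions <= 0 or ai_actions <= 0:
        raise ValueError("action counts must be positive")
    # Efficiency relative to the human baseline, never above 100%.
    ratio = min(1.0, human_actions / ai_actions)
    # Squaring makes the penalty grow faster than the efficiency gap.
    return ratio ** 2


# A linear ratio would give 10%; squaring it gives 1%.
print(f"{efficiency_score(10, 100):.0%}")
print(f"{efficiency_score(10, 20):.0%}")
```

Under this assumption, halving the AI's action count quadruples its score, which is why brute-force exploration is punished so heavily.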
For context: ARC-AGI-1 is basically solved. Gemini scores 98% on it. ARC-AGI-2 went from 3% to 77% in under a year. Labs spent millions training on earlier versions. ARC-AGI-3 resets the entire scoreboard to near zero.
AI platforms now have to figure out how to solve problems like these, or fail.
The headline finding is stark. Humans solved 100% of the environments, while frontier AI systems scored below 1% as of March 2026:
- Gemini 3.1 Pro: 0.37%
- GPT 5.4: 0.26%
- Opus 4.6: 0.25%
- Grok-4.20: 0.00%
The benchmark authors argue this exposes a major gap between current model performance on familiar, language-heavy benchmarks and true adaptive, agentic intelligence in novel situations.
What makes this benchmark important is that it is not measuring whether a model can retrieve learned patterns from training data. It is measuring whether the system can explore, infer hidden goals, model dynamics, and plan in a novel interactive setting with no explicit instructions. The environments are also intentionally limited to simple “core knowledge” priors such as objects, geometry, basic physics, and agency, rather than language or domain knowledge.
Scaling alone will not close this gap. We are nowhere near AGI.
What this means for CEOs
The core business implication is that today’s leading AI systems are still much stronger at fluent output than at robust autonomous adaptation. They can look impressive in chat, coding, summarization, and narrow workflows, but this paper suggests they still struggle when dropped into genuinely unfamiliar operating conditions where they must discover objectives and rules for themselves.
That matters because many executive narratives around “AI agents” assume near-term readiness for broad autonomy. This paper is a reminder that agentic reliability remains immature. In enterprise settings, that means AI is best treated today as a copilot, accelerator, or bounded workflow component rather than a universally dependable autonomous operator. That is an inference from the benchmark’s findings, rather than a direct claim of the paper.
A second implication is strategic: the frontier may be shifting from “bigger models” toward better architectures for exploration, memory, world-modeling, planning, and tool use. ARC-AGI-3 is effectively saying that the next leap in AI value may not come from more eloquent language alone, but from systems that can learn efficiently in unfamiliar environments.
A third implication is governance-related. The benchmark is designed to resist the usual benchmark inflation problem by emphasizing novelty, hidden private environments, and efficiency against human baselines. That makes it useful as a caution against overreading vendor demos or headline benchmark wins. For leadership teams, the lesson is to ask not just, “Can the model do this task?” but also, “Can it do it reliably, efficiently, and without heavy human scaffolding?”
Download and read the full ‘ARC-AGI-3: A New Challenge for Frontier Agentic Intelligence’ paper – here
Read more on the ARC Prize – here
About Nikolas Badminton
Nikolas Badminton is the Chief Futurist & Hope Engineer at futurist.com. He’s a world-renowned futurist keynote speaker, consultant, author, media producer, and executive advisor who has spoken to, and worked with, over 500 of the world’s most impactful organizations and governments.
Nikolas is an artificial intelligence expert and his 2026 keynote ‘The AI Leader: Create Incredible Productivity, Profit & Growth’ is the level up for the modern CEO and executive leader.
Please contact futurist speaker and consultant Nikolas Badminton to discuss your engagement.