select between over 22,900 AI Tool and 17,900 AI News Posts.
The new AI benchmark ARC-AGI-2 significantly raises the bar for AI tests. While humans can easily solve the tasks, even highly developed AI systems such as OpenAI o3 clearly fail.
The article OpenAI's top models crash from 75% to just 4% on challenging new ARC-AGI-2 test appeared first on THE DECODER.