BIG-Bench, developed in 2021 as a universal benchmark for testing large language models, has reached its limits as current models achieve over 90% accuracy. In response, Google DeepMind has introduced BIG-Bench Extra Hard (BBEH), which reveals substantial weaknesses even in the most advanced AI models.
The article "OpenAI beats Deepseek by a surprisingly wide margin in Google's latest reasoning benchmark" appeared first on THE DECODER.