New ORCA results show Gemini leading in practical math, but no AI matches the consistency of a simple calculator.
Researchers debut "Humanity’s Last Exam," a benchmark of 2,500 expert-level questions that current AI models are failing.
OpenAI wants to retire the leading AI coding benchmark—and the reasons reveal a deeper problem with how the whole industry measures itself.
Important Disclosure: This is an independent evaluation conducted by Sup AI and is not officially endorsed, validated, or recognized by the Center for AI Safety, Scale AI, or the HLE benchmark ...
KRAKÓW, MAŁOPOLSKA, POLAND, November 7, 2025 /EINPresswire.com/ -- Omni Calculator has introduced the ORCA (Omni Research on Calculation in AI) Benchmark - a new ...
Sony AI released a dataset that tests the fairness and bias of AI models. It's called the Fair Human-Centric Image Benchmark (FHIBE, pronounced like "Phoebe"). The company describes it as the "first ...
Every Indian AI model is graded on benchmarks built in San Francisco. GPT-5 scores below 40% on Indian cultural reasoning.
AI chatbots have been linked to serious mental health harms in heavy users, but there have been few standards for measuring whether they safeguard human wellbeing or simply maximize engagement. A ...
Backboard.io announced it has achieved state-of-the-art performance across both leading AI memory benchmarks, a first ...
Are AI benchmarks really the gold standard we’ve been led to believe? Matt Wolfe walks through how these widely accepted metrics, designed to measure the performance of artificial intelligence systems ...
Hong Kong, China - February 26, 2026 - Etna Capital Management today released a new research framework, “Beyond ...
SHANGHAI, CN / ACCESS Newswire / February 9, 2026 / On February 8, 2026, at a pivotal moment for the advancement of Shanghai's "Empowering Manufacturing with AI" strategy and the deep integration of ...