AI Benchmark - Search News

Gemini 3 Flash Crushes ChatGPT-5.2 in Accuracy Test – ORCA Benchmark Update

New ORCA results show Gemini leading in practical math, but no AI matches the consistency of a simple calculator.

“Humanity’s Last Exam”: The Super-Benchmark AI Is Currently Failing

Researchers debut "Humanity’s Last Exam," a benchmark of 2,500 expert-level questions that current AI models are failing.

Decrypt

OpenAI Says Benchmark Used to Measure AI Coding Skill Is 'Contaminated'—Here's Why

OpenAI wants to retire the leading AI coding benchmark—and the reasons reveal a deeper problem with how the whole industry measures itself.

Morningstar

Sup AI Sets New Benchmark Record with 52.15% on Humanity's Last Exam

Important Disclosure: This is an independent evaluation conducted by Sup AI and is not officially endorsed, validated, or recognized by the Center for AI Safety, Scale AI, or the HLE benchmark ...

The National Law Review

ORCA Benchmark Shows That AI Frequently Fumbles Everyday Math

KRAKÓW, MAŁOPOLSKA, POLAND, November 7, 2025 /EINPresswire.com/ -- Omni Calculator has introduced the ORCA (Omni Research on Calculation in AI) Benchmark - a new ...

Hosted on MSN

Sony has a new benchmark for ethical AI

Sony AI released a dataset that tests the fairness and bias of AI models. It's called the Fair Human-Centric Image Benchmark (FHIBE, pronounced like "Phoebe"). The company describes it as the "first ...

6d Opinion

India's AI Sovereignty Needs A Scoreboard, Not Just A Model

Every Indian AI model is graded on benchmarks built in San Francisco. GPT-5 scores below 40% on Indian cultural reasoning.

Hosted on MSN

A new AI benchmark tests whether chatbots protect human wellbeing

AI chatbots have been linked to serious mental health harms in heavy users, but there have been few standards for measuring whether they safeguard human wellbeing or just maximize for engagement. A ...

10d

Backboard.io Becomes First AI Platform to Lead Both Major Memory Benchmarks, Accelerating the Era of Agentic AI

Backboard.io announced it has achieved state-of-the-art performance across both leading AI memory benchmarks, a first ...

Geeky Gadgets

AI Benchmarks Investigated: Do Companies Tune Private Builds for Leaderboards, Then Ship Weaker Versions?

Are AI benchmarks really the gold standard we’ve been led to believe? Matt Wolfe walks through how these widely accepted metrics, designed to measure the performance of artificial intelligence systems ...

Etna Capital Management Releases “Beyond Benchmarks” Framework on Research-Led AI Investing in a Global Automation Era

Hong Kong, China - February 26, 2026 - Etna Capital Management today released a new research framework, “Beyond ...

Morningstar

Shanghai's "AI + Manufacturing" Benchmark Gains Momentum as SIXUNITED Sets Its Sights on the Hundred-Billion-Yuan Milestone

SHANGHAI, CN / ACCESS Newswire / February 9, 2026 / On February 8, 2026, at a pivotal moment for the advancement of Shanghai's "Empowering Manufacturing with AI" strategy and the deep integration of ...
