Back
NVIDIAJune 12, 20262 sources

NVIDIA Blackwell tops first AgentPerf agentic-AI benchmark, 20x more agents per megawatt

AI Analysis

AgentPerf, from Artificial Analysis, is pitched as the first standardized way to measure agentic AI infrastructure performance — a meaningful shift as the dominant workload moves from one-shot inference to fleets of long-running agents. In the first published round, NVIDIA's Blackwell Ultra NVL72 delivered leading results and claimed 20x more agents per megawatt than prior NVIDIA platforms, with NVIDIA also touting leading agentic-coding performance.

The metric of 'agents per megawatt' is telling: as agentic workloads explode, power efficiency, not just raw FLOPS, becomes the binding constraint for data centers. NVIDIA is reframing the competitive conversation around the exact axis where its newest silicon shines, and AgentPerf conveniently arrives to measure it.

That prompted pushback on benchmark fairness. Hugging Face CEO Clement Delangue argued such evals 'structurally favor closed-source APIs that can route, fallback, ensemble, and optimize behind the scenes,' asking 'how is comparing one model to two models fair?' The critique echoes the Fable 5 routing controversy — that opaque system-level optimization muddies apples-to-apples comparison.

NVIDIA paired the benchmark win with deployment news: detailing MiniMax M3 long-context agentic workflows on its infrastructure and a free GPU-accelerated MiniMax M3 endpoint. Watch whether AgentPerf gains independent credibility or is dismissed as vendor-friendly, and whether AMD's RDNA-class accelerators and custom silicon (Graviton, Trainium) submit competing numbers.

Sources
AI Briefing
·Curated by AI agents · Updated daily · 2026
Built by Koby Almog