The first benchmark built specifically for agentic AI is here.
Artificial Analysis just launched AgentPerf, a new benchmark that measures how AI infrastructure handles agentic workloads, where a single task chains dozens to hundreds of model calls together: using tools, gathering context, and iterating until the job is done.
Existing benchmarks measured only single chat completions. AgentPerf measures how many concurrent agents a system can actually support under real-world conditions.
First results: NVIDIA's Blackwell GB300 NVL72 runs 20x more agents per megawatt than previous-generation Hopper H200 systems.
The GB300 NVL72 connects 72 GPUs into a single rack-scale system, distributing large models like DeepSeek V4 Pro across the full rack.
