ITBench-AA: The Enterprise Agent Reality Check Nobody Asked For (But Everybody Needs)
Frontier models score below 50% on Kubernetes incident response. The new ITBench-AA benchmark from Artificial Analysis and IBM reveals the gap between agent demos and production IT work.