Is it agentic enough? HuggingFace benchmarks how models *actually* drive your tools
HuggingFace's new agent benchmark doesn't just ask if the model got the right answer—it measures how much work it took to get there, across models, library versions, and task tiers.