EVA-Bench 2.0: Three Domains, 213 Scenarios, and the Real Cost of Voice AI Eval
ServiceNow's new voice-agent benchmark spans airlines, IT, and healthcare—with joint-generation pipelines, adversarial scenarios, and a coming multilingual expansion.
NVIDIA's new Nemotron-based dataset gives developers 4,800 demographically grounded Korean personas to build culturally aware AI agents—a blueprint for non-English AI.
NVIDIA just open-sourced a state-of-the-art OCR model trained almost entirely on synthetic data. Here's why that matters for the future of vision-language models.