#synthetic-data

4 posts

Data for Agents: NVIDIA Nemotron's Open Data Strategy and Why Synthetic Personas Matter

NVIDIA is releasing over 10 trillion pre-training tokens and millions of post-training samples for agent development—and building synthetic personas representing 2.4B people. Here's why that matters.

#agents #synthetic-data #nvidia #open-data #datasets

EVA-Bench 2.0: Three Domains, 213 Scenarios, and the Real Cost of Voice AI Eval

ServiceNow's new voice-agent benchmark spans airlines, IT, and healthcare—with joint-generation pipelines, adversarial scenarios, and a coming multilingual expansion.

#voice-agents #benchmarks #evaluation #synthetic-data #multilingual

NVIDIA Drops Synthetic Persona Dataset to Ground Korean AI Agents in Real Demographics

NVIDIA's new Nemotron-based dataset gives developers 4,800 demographically grounded Korean personas to build culturally aware AI agents—a blueprint for non-English AI.

#synthetic-data #agents #multilingual #datasets #nvidia

NVIDIA's Nemotron OCR v2: How Synthetic Data Built a Multilingual Vision Powerhouse

NVIDIA just open-sourced a state-of-the-art OCR model trained almost entirely on synthetic data. Here's why that matters for the future of vision-language models.

#synthetic-data #ocr #vision-language-models #multilingual #open-source