NVIDIA NeMo AutoModel: up to 3.7× faster MoE fine-tuning with a one-line import
NVIDIA's NeMo AutoModel delivers a 3.4–3.7× training speedup and a 29–32% memory reduction on Mixture-of-Experts (MoE) models compared with Hugging Face Transformers v5, with no API changes beyond a single import line.