DiffusionGemma: Google's 4x speed boost trades autoregressive decoding for diffusion
Google DeepMind just open-sourced a 26B MoE model that generates 256 tokens in parallel, making it 4x faster on GPUs. Its output quality is lower than Gemma 4's, but the architecture shift is fascinating.