Google | June 11, 2026 | 3 sources

Google DeepMind releases DiffusionGemma, a 26B open model generating text 4x faster via diffusion

AI Analysis

DiffusionGemma applies diffusion — the technique behind image generation — to text, producing tokens in parallel rather than autoregressively one at a time. The payoff is throughput: 4-5x faster generation and over 1,000 tokens per second on a single H100, with a 26B-parameter footprint released openly under Apache 2.0 across Hugging Face, Kaggle, and Google Cloud's Vertex AI Model Garden.
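As a rough back-of-envelope reading of those throughput figures, the sketch below converts them into per-response wait times. It assumes, which the release summary does not confirm, that the 4-5x speedup and the 1,000 tokens-per-second figure describe the same hardware and workload. The 500-token response length and the helper names are hypothetical, chosen only for illustration.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to emit num_tokens at a steady throughput."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


def implied_baseline_tps(fast_tps: float, speedup: float) -> float:
    """Throughput of the slower model implied by a reported speedup factor."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return fast_tps / speedup


# Reported: "over 1,000 tokens per second on a single H100" (used as a floor).
DIFFUSION_TPS = 1000.0
# Reported speedup range; pairing it with the figure above is an assumption.
SPEEDUPS = (4.0, 5.0)
# Hypothetical response length, for illustration only.
RESPONSE_TOKENS = 500

if __name__ == "__main__":
    fast = generation_seconds(RESPONSE_TOKENS, DIFFUSION_TPS)
    print(f"diffusion: {fast:.2f} s for {RESPONSE_TOKENS} tokens")
    for speedup in SPEEDUPS:
        base_tps = implied_baseline_tps(DIFFUSION_TPS, speedup)
        slow = generation_seconds(RESPONSE_TOKENS, base_tps)
        print(f"implied baseline at {speedup:.0f}x: {base_tps:.0f} tok/s, {slow:.2f} s")
```

Under these assumptions a 500-token reply drops from roughly 2 to 2.5 seconds to about half a second, which is the kind of difference that matters for interactive agents.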

The trade-off is quality: DiffusionGemma scores lower than the autoregressive Gemma 4 on standard benchmarks, and DeepMind frames it as an experimental release exploring whether diffusion can rival autoregression at scale. For latency-sensitive applications — real-time agents, high-volume generation — the speed could outweigh the accuracy gap.

The open-weights move also feeds the local-AI theme of the week, and momentum is already building downstream: NVIDIA released an NVFP4-quantized variant within a day to cut memory requirements further. Developers welcomed the faster generation while asking how much accuracy it gives up relative to traditional autoregressive models. The open question is whether diffusion text models can close the benchmark gap in future iterations or will remain a niche speed play.

Sources

deepmind.google

https://deepmind.google/blog/diffusiongemma-4x-faster-text-generation/

airank.dev

https://airank.dev/models/gemini-diffusion

deepmind.com

https://www.deepmind.com/models/gemini-diffusion