Google ships Gemma 4 12B on-device model running on 16GB laptops

Google released Gemma 4 12B, an open on-device model engineered to deliver multi-step reasoning close to its 26B mixture-of-experts model while running locally on a standard laptop with only 16GB of RAM. Google paired it with an AI Edge Gallery app for Mac, lowering the barrier for developers to run capable models without cloud dependency.
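
A back-of-the-envelope calculation shows why bit-width matters for the 16GB claim. The sketch below estimates memory for the weights alone. The bit-widths are illustrative assumptions (the release is described only as targeting low bit-widths), and the estimate ignores the KV cache, activations, and whatever the operating system needs.

```python
def weight_memory_gib(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GiB."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# Assumed bit-widths for illustration; not published figures for Gemma 4.
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {weight_memory_gib(12, bits):.1f} GiB")
```

Under these assumptions, 12B parameters need roughly 22.4 GiB at 16 bits, which does not fit in 16GB, and about 5.6 GiB at 4 bits, which leaves headroom for everything else running on the machine.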
The technical hook drawing the most developer attention is quantization-aware training (QAT): rather than quantizing after the fact and eating an accuracy hit, Gemma 4 is trained with quantization in the loop, preserving quality at low bit-widths. r/LocalLLaMA lit up (752 upvotes, 241 comments) dissecting the QAT approach and pairing it with homelab builds.
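
To make "quantization in the loop" concrete, here is a minimal toy sketch of the general QAT idea, not Google's actual training recipe. The forward pass uses weights snapped to a low-bit grid, while gradients update a full-precision copy (the straight-through estimator). The function names, the symmetric 4-bit scheme, and the toy data are all assumptions made for illustration.

```python
def fake_quantize(weights, bits=4):
    """Snap float weights to a symmetric signed integer grid, then dequantize."""
    qmax = 2 ** (bits - 1) - 1                 # 7 for signed 4-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]


def mse(weights, xs, ys):
    """Mean squared error of a linear model y = w . x."""
    preds = [sum(w * x for w, x in zip(weights, row)) for row in xs]
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


def train_qat(xs, ys, bits=4, lr=0.05, steps=300):
    """Toy QAT loop: forward with quantized weights, update the float copy."""
    weights = [0.0] * len(xs[0])               # full-precision "latent" weights
    for _ in range(steps):
        wq = fake_quantize(weights, bits)      # forward pass sees quantized values
        grads = [0.0] * len(weights)
        for row, y in zip(xs, ys):
            err = sum(w * x for w, x in zip(wq, row)) - y
            for i, x in enumerate(row):
                grads[i] += 2 * err * x / len(ys)
        # Straight-through estimator: treat rounding as identity in the
        # backward pass, so the gradient computed at wq updates the floats.
        weights = [w - lr * g for w, g in zip(weights, grads)]
    return weights


# Hypothetical toy data generated from the weights [0.9, -0.35, 0.1].
XS = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
YS = [0.9, -0.35, 0.1, 0.65]

latent = train_qat(XS, YS)
deployed = fake_quantize(latent)               # what would actually ship
print("quantized weights:", deployed)
print("loss with quantized weights:", mse(deployed, XS, YS))
```

Because the loss is always measured on the quantized weights, training steers the full-precision copy toward values that still work after rounding. Post-training quantization, by contrast, rounds only once at the end and has no chance to compensate.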
The release fits a clear theme of the week: local-first AI. NVIDIA's RTX Spark (128GB unified memory, 120B-parameter local models) and a Rust local-first toolkit project were both trending over the same days, reflecting developer appetite for daemon-free, private, on-device inference that doesn't meter tokens.
Competitively, Gemma 4 12B targets the same practitioners that Meta's Llama and Alibaba's Qwen court, using open weights as a distribution and goodwill strategy. What to watch: real-world quality versus larger hosted models, and how QAT changes the calculus for edge deployment on phones and laptops.