Hugging Face and Cerebras demo low-latency voice AI on Gemma 4, release open-source ML Intern agent

Hugging Face pushed on two open fronts. With Cerebras, it built a low-latency speech-to-speech pipeline that pairs a Cerebras-hosted Gemma 4 31B model with Nvidia Parakeet for speech-to-text (STT) and Qwen3TTS for text-to-speech (TTS). The stack is fully open and multi-vendor, and it already powers 9,000+ Reachy Mini robots. The demo is a statement that competitive real-time voice can be assembled from open components, a direct counter to closed offerings like xAI's just-launched Voice Agent Builder.
Separately, Hugging Face released ML Intern, an open-source ML agent that reportedly beat Anthropic's Claude Code on GPQA (32% vs. 22.99%) and OpenAI's Codex on healthcare evals. If the numbers hold, it's a notable data point that open agents can match or exceed closed coding tools on specific benchmarks.
The strategic thread is CEO Clement Delangue's open-science thesis, which he pressed this week: 'Instead of closed-source frontier labs running the same training runs in secret and siloes, open science and open-source AI allows them to mutualize spending and compute making them an order of magnitude more efficient.' The Cerebras pipeline and ML Intern are concrete arguments for that efficiency claim.
Competitive context: this fits the week's broader open-vs-closed tension, with Mistral's sovereignty push and cheap Chinese open models on one side and gated US frontier releases on the other. Skeptical take: benchmark wins on GPQA and healthcare evals are narrow and self-selected, and broad real-world superiority isn't established. What to watch: independent reproduction of ML Intern's benchmark claims, and whether the open voice stack gains adoption beyond Reachy Mini.