Other · May 30, 2026

Liquid AI ships LFM2.5-8B-A1B MoE trained on 38T tokens

AI Analysis

Liquid AI on May 30 released LFM2.5-8B-A1B, a mixture-of-experts (MoE) model with 8B total parameters, of which 1B are active per token, trained on 38 trillion tokens. The 38T figure is notable: it is roughly 2.5 times the corpus Meta cited for Llama 3 (15T) and comparable to the most data-intensive frontier-model training runs disclosed publicly. With only 1B parameters active per token, per-token inference compute is roughly that of a 1B dense model, while quality draws on the full 8B expert pool. All 8B parameters still have to be loaded to serve the model, so the saving is in compute rather than memory footprint.
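The compute and data figures above lend themselves to a quick back-of-envelope check. The sketch below is illustrative only: the parameter and token counts are the ones reported here, while the rule of thumb of about 2 forward-pass FLOPs per active parameter per token and the 16-bit weight size are assumptions, not figures from Liquid AI.

```python
# Back-of-envelope arithmetic for a sparse MoE model.
# From the article: 8B total parameters, 1B active, 38T training tokens,
# and 15T tokens cited for Llama 3.
TOTAL_PARAMS = 8e9
ACTIVE_PARAMS = 1e9
TRAIN_TOKENS = 38e12
LLAMA3_TOKENS = 15e12

# Assumptions (NOT from the article):
FLOPS_PER_PARAM = 2    # rough forward-pass FLOPs per active parameter per token
BYTES_PER_WEIGHT = 2   # 16-bit weights, no quantization


def forward_flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs needed to process one token."""
    return FLOPS_PER_PARAM * active_params


def weight_memory_gb(total_params: float) -> float:
    """Approximate memory for weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return total_params * BYTES_PER_WEIGHT / 1e9


moe_flops = forward_flops_per_token(ACTIVE_PARAMS)
dense_8b_flops = forward_flops_per_token(TOTAL_PARAMS)
compute_ratio = dense_8b_flops / moe_flops
memory_gb = weight_memory_gb(TOTAL_PARAMS)
tokens_per_param = TRAIN_TOKENS / TOTAL_PARAMS
token_ratio = TRAIN_TOKENS / LLAMA3_TOKENS

print(f"MoE forward FLOPs per token:      {moe_flops:.1e}")
print(f"Dense 8B forward FLOPs per token: {dense_8b_flops:.1e}")
print(f"Dense 8B / MoE compute ratio:     {compute_ratio:.1f}x")
print(f"Weight memory at 16-bit:          {memory_gb:.1f} GB")
print(f"Training tokens per parameter:    {tokens_per_param:.0f}")
print(f"Token count vs. Llama 3:          {token_ratio:.2f}x")
```

Under these assumptions the model does about one eighth of the per-token forward compute of a dense 8B model, yet still needs roughly 16 GB for weights at 16-bit precision before any quantization.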

The release is the latest entry in what's become a crowded efficient-MoE race: Mistral's Mixtral lineage, DeepSeek's MoE V3/V4, and now Liquid's LFM2.5. Hacker News (159 points) framed it as another credible small-MoE entrant challenging dense incumbents. The strategic question Liquid is testing: can a sufficiently well-trained 8B-A1B match dense 30B+ models on practical tasks at a fraction of inference cost? If so, the implication for hyperscaler capex models is significant.

This fits a broader theme in this week's batch: the developer-practitioner community is increasingly focused on the price-performance frontier rather than the absolute-quality frontier. The parallel signals are DeepSeek's 75% V4-Pro price cut, Mistral Medium 3.5's 4-GPU self-hosting claim, xAI's grok-build-0.1 priced at $1/$2 per million tokens, and the widely shared writeup of running 30B models at 53 tokens per second on a MacBook M4 Pro. Closed US frontier labs charging $5+ per million tokens for inference are under structural pressure.
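To make the price gap concrete, here is a minimal sketch of the arithmetic. It assumes the $1/$2 quote means dollars per million input and output tokens respectively, treats $5 per million as a flat floor for both directions, and uses a hypothetical workload of 10M input and 2M output tokens; none of those assumptions comes from the vendors' price sheets.

```python
# Illustrative price arithmetic; workload sizes are hypothetical.

def workload_cost(input_tokens: float, output_tokens: float,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of a workload given per-million-token prices."""
    return (input_tokens / 1e6) * input_price_per_m \
        + (output_tokens / 1e6) * output_price_per_m


def tokens_multiplier_after_cut(cut_fraction: float) -> float:
    """How many times more tokens the same budget buys after a price cut."""
    return 1.0 / (1.0 - cut_fraction)


# Hypothetical workload (assumption): 10M input tokens, 2M output tokens.
INPUT_TOKENS = 10e6
OUTPUT_TOKENS = 2e6

# Assumed reading of "$1/$2": $1 per million input, $2 per million output.
low_price_cost = workload_cost(INPUT_TOKENS, OUTPUT_TOKENS, 1.0, 2.0)

# Assumed floor for "$5+": a flat $5 per million in both directions.
high_price_cost = workload_cost(INPUT_TOKENS, OUTPUT_TOKENS, 5.0, 5.0)

# A 75% price cut means the same budget buys 4x the tokens.
multiplier = tokens_multiplier_after_cut(0.75)

print(f"Cost at $1/$2 per million:  ${low_price_cost:.2f}")
print(f"Cost at $5 per million:     ${high_price_cost:.2f}")
print(f"Price ratio:                {high_price_cost / low_price_cost:.2f}x")
print(f"Tokens per dollar after a 75% cut: {multiplier:.1f}x")
```

The exact ratio depends heavily on the input/output mix, so the numbers are only a rough guide to the size of the gap.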

Watch next: independent benchmarks for LFM2.5-8B-A1B versus Mixtral, DeepSeek V4, and Qwen3-MoE variants; whether Liquid's architectural claims (rooted in its founding 'Liquid Neural Network' work) translate to measurable advantages; and whether the 38T-token training claim is verifiable.