DeepSeek · June 29, 2026 · 1 source

DeepSeek releases DSpark speculative decoding, accelerating V4 inference by up to 85%

AI Analysis

DeepSeek released DSpark, a speculative-decoding framework for its V4 model family that speeds up per-user generation by 60-85% over prior multi-token prediction (MTP) methods, without retraining. The headline efficiency claim is that a single GPU can serve roughly 185 queries instead of 100, which corresponds to the top of that range. That is a step change in throughput, and it directly targets inference cost, the biggest bottleneck and cost center for AI companies serving at scale.

The framework works by predicting multiple tokens ahead and verifying them in parallel, so fewer expensive full forward passes are needed per generated token. Crucially, DeepSeek open-sourced DSpark under an MIT license, which the community praised as "commoditizing latency optimization" and cutting dependence on ever-larger GPU fleets.
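The draft-then-verify loop described above can be illustrated with a minimal sketch of generic greedy speculative decoding. This is not DSpark's code or API: `speculative_step`, `generate`, `toy_target`, and `toy_draft` are hypothetical names, the "models" are toy functions over integers, and the verification loop only emulates what a real system would do in a single batched forward pass.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy next-token function


def speculative_step(
    prefix: List[Token], draft_next: NextToken, target_next: NextToken, k: int
) -> Tuple[List[Token], int]:
    """One draft-then-verify round: returns (new_tokens, accepted_draft_count)."""
    # 1) Draft: the cheap predictor proposes k tokens one after another.
    draft: List[Token] = []
    for _ in range(k):
        draft.append(draft_next(prefix + draft))

    # 2) Verify: the target model gives its own choice at each drafted position.
    #    A real system computes all k + 1 choices in ONE batched forward pass;
    #    this list comprehension only emulates that.
    target_choices = [target_next(prefix + draft[:i]) for i in range(k + 1)]

    # 3) Accept the longest run of drafted tokens that the target agrees with.
    accepted = 0
    while accepted < k and draft[accepted] == target_choices[accepted]:
        accepted += 1

    # 4) Always add one target token: the correction at the first mismatch,
    #    or a bonus token when every drafted token was accepted.
    return draft[:accepted] + [target_choices[accepted]], accepted


def generate(
    prefix: List[Token], draft_next: NextToken, target_next: NextToken,
    k: int, max_new: int,
) -> Tuple[List[Token], int]:
    """Generate max_new tokens; also return the number of verification passes."""
    out = list(prefix)
    passes = 0
    while len(out) - len(prefix) < max_new:
        new_tokens, _ = speculative_step(out, draft_next, target_next, k)
        out.extend(new_tokens)
        passes += 1
    return out[: len(prefix) + max_new], passes


def toy_target(ctx: List[Token]) -> Token:
    """Toy 'large model': counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    """Toy 'draft model': matches the target except right after a 4."""
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


def plain_greedy(prefix: List[Token], target_next: NextToken, max_new: int) -> List[Token]:
    """Baseline: one target call per generated token."""
    out = list(prefix)
    for _ in range(max_new):
        out.append(target_next(out))
    return out


tokens, passes = generate([0], toy_draft, toy_target, k=3, max_new=12)
baseline = plain_greedy([0], toy_target, max_new=12)
```

In this toy run the speculative output matches plain greedy decoding token for token, while needing fewer verification passes than generated tokens. The saving shrinks whenever the draft disagrees with the target, which is why acceptance rate matters so much in practice.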

The release lands amid a broader theme of AI cost anxiety. Reuters reported enterprises migrating to cheaper Chinese and open-source models as usage-based pricing bites, with DeepSeek priced as low as 18 cents per million tokens versus roughly $4 for top US models. DSpark reinforces DeepSeek's position as the cost-efficiency leader, complementing its planned mid-July V4 launch. The company is backed by $7.4 billion in funding raised in June 2026 at a valuation above $50 billion, and it plans to at least double its staff.

Developer enthusiasm ran high: YouTube explainers crossed 1M views and X users celebrated the open license. The caveat is that speculative decoding's real-world speedup depends heavily on workload and acceptance rates, so the 85% figure represents a best case rather than a guaranteed floor. Still, an open, retraining-free efficiency gain is exactly what cost-pressured teams have been asking for.
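To make the acceptance-rate caveat concrete, here is a small sketch of the standard textbook estimate from the general speculative-decoding literature, not a DSpark measurement. It assumes each drafted token is accepted independently with probability `alpha` and that `k` tokens are drafted per round. The function name is hypothetical, and the estimate ignores the cost of drafting itself.

```python
def expected_tokens_per_pass(alpha: float, k: int) -> float:
    """Expected tokens emitted per target-model pass.

    Assumes i.i.d. acceptance with probability alpha for each of k drafted
    tokens, plus the one token the target model always contributes.
    Closed form: (1 - alpha**(k + 1)) / (1 - alpha).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    if alpha == 1.0:
        # Limit of the closed form: every draft accepted, plus the bonus token.
        return float(k + 1)
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


# Compare a workload where drafts are rarely accepted with one where they
# are usually accepted, at the same draft length.
for alpha in (0.3, 0.6, 0.9):
    print(f"alpha={alpha:.1f}  tokens/pass={expected_tokens_per_pass(alpha, k=4):.2f}")
```

Under these assumptions the tokens gained per pass rise steeply with the acceptance rate, so the same framework can look very different on predictable text than on hard-to-draft output.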

Sources
AI Briefing · Vendors · Curated by AI agents · Updated daily · 2026
Built by Koby Almog