DeepSeek open-sources DSpark inference framework, claiming up to 85% speedups

DeepSeek has released DSpark, an open-source inference optimization framework available on GitHub under an MIT license, claiming up to 85% faster generation without hardware upgrades or model retraining. The framework is built on speculative decoding and is already deployed on DeepSeek's V4-Flash and V4-Pro production models, where the company reports speed improvements of 60–85%.
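Technically, speculative decoding pairs the large model with a smaller 'draft' model. The draft model proposes several tokens ahead, and the large model verifies them in a single parallel pass, keeping the ones it agrees with. When most proposals are accepted, the large model runs far fewer sequential steps per generated token, which cuts latency. DSpark bundles DeepSpec, a full-stack codebase for training custom draft models, and notably supports models from other vendors, including Alibaba's Qwen and Google's Gemma, making it a broadly useful efficiency layer rather than a DeepSeek-only tool.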
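The draft-and-verify loop can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not DSpark's implementation or API: the names `speculative_decode`, `toy_target`, and `toy_draft` are hypothetical, the "models" are plain functions that return a greedy next token, and the verification step is written as a loop even though a real system would score all proposed positions in one batched forward pass.

```python
from typing import Callable, List, Tuple

Token = int
# Assumption: a "model" is just a function returning the greedy next token.
NextToken = Callable[[List[Token]], Token]


def speculative_decode(
    target: NextToken, draft: NextToken, prompt: List[Token], k: int, max_new: int
) -> Tuple[List[Token], int]:
    """Greedy speculative decoding. Returns (tokens, target verification passes)."""
    out = list(prompt)
    target_passes = 0
    while len(out) - len(prompt) < max_new:
        # 1. The cheap draft model proposes k tokens, one after another.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))

        # 2. The target checks the k proposed positions plus one extra.
        #    Counted as a single pass, since real systems batch this step.
        target_passes += 1
        accepted: List[Token] = []
        for i in range(k + 1):
            expected = target(out + proposal[:i])
            accepted.append(expected)  # always the target's own choice
            if i == k or proposal[i] != expected:
                break  # first mismatch (or the bonus token) ends the round
        out.extend(accepted)
    return out[: len(prompt) + max_new], target_passes


def toy_target(ctx: List[Token]) -> Token:
    """Hypothetical large model: counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    """Hypothetical draft model: agrees with the target except after a 4."""
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    tokens, passes = speculative_decode(toy_target, toy_draft, [0], k=3, max_new=12)
    print(tokens, passes)
```

Because every emitted token is the target's own greedy choice, the output matches what the large model would have produced alone; the draft model only changes how many sequential target passes are needed to get there.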
Strategically, open-sourcing an inference accelerator that works across vendors extends DeepSeek's cost-leadership narrative beyond its own models and into the broader ecosystem. It is the kind of efficiency-first engineering Chinese labs are using to close the frontier gap on a budget. For cost-conscious teams trying to escape rising bills for US models, a free framework that boosts throughput on hardware they already own is directly relevant.
The release fits the week's dominant cost-efficiency theme. Skeptics will want to see the 85% figure validated on diverse workloads, since speculative decoding gains vary heavily by task and draft-model quality. What to watch: third-party adoption of DSpark on non-DeepSeek models and whether the claimed speedups hold outside DeepSeek's own benchmarks.
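To see why the gains depend so heavily on draft-model quality, consider the standard simplified model of speculative decoding, in which each drafted token is accepted independently with probability `alpha` and the draft proposes `k` tokens per round. The expected number of tokens produced per large-model pass is then (1 - alpha^(k+1)) / (1 - alpha). The sketch below evaluates that formula; the function name is hypothetical, the independence assumption is a simplification, and the result is tokens per pass rather than wall-clock speedup, since it ignores the cost of running the draft model. It says nothing about DSpark's measured numbers.

```python
def expected_tokens_per_pass(alpha: float, k: int) -> float:
    """Expected tokens per target pass, assuming each drafted token is
    accepted independently with probability alpha (a simplification)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if alpha == 1.0:
        return float(k + 1)  # every draft token accepted, plus the bonus token
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


if __name__ == "__main__":
    # Illustrative acceptance rates only; not measurements of any real model.
    for alpha in (0.5, 0.7, 0.9):
        print(f"alpha={alpha}: {expected_tokens_per_pass(alpha, k=4):.2f} tokens/pass")
```

Under this model, a draft that is accepted half the time yields fewer than two tokens per pass at `k = 4`, while one accepted 90% of the time yields about four, which is why a workload the draft model predicts poorly can see much smaller gains than a headline figure.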