NVIDIA releases quantized DeepSeek-V4-Pro-NVFP4 on Hugging Face
NVIDIA released DeepSeek-V4-Pro-NVFP4 on Hugging Face, a quantized variant of DeepSeek's V4-Pro Mixture-of-Experts model optimized with NVIDIA's Model Optimizer. The model carries 1.6 trillion total parameters with 49 billion activated per forward pass, and the NVFP4 quantization is designed to shrink the memory and compute footprint for efficient inference on NVIDIA hardware while preserving capability.
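To put the footprint claim in rough numbers, the sketch below estimates weight storage from the parameter counts quoted above. It assumes NVFP4 costs about 4.5 bits per weight (4-bit values plus one 8-bit scale per 16-value block) and that every weight is quantized, which real checkpoints may not do. The BF16 and FP8 rows are illustrative baselines, not a statement about how V4-Pro itself is distributed.

```python
# Back-of-envelope weight-memory estimate. Ignores KV cache, activations,
# and any layers a real checkpoint keeps in higher precision.
TOTAL_PARAMS = 1.6e12   # total parameters, as quoted in the article
ACTIVE_PARAMS = 49e9    # parameters activated per forward pass

# Assumption: 4-bit values plus one 8-bit scale shared by each 16-value block.
NVFP4_BITS = 4 + 8 / 16  # 4.5 effective bits per weight


def weight_bytes(params: float, bits_per_param: float) -> float:
    """Bytes needed to store `params` weights at the given bit width."""
    return params * bits_per_param / 8


for name, bits in [("BF16", 16), ("FP8", 8), ("NVFP4", NVFP4_BITS)]:
    total_tb = weight_bytes(TOTAL_PARAMS, bits) / 1e12
    active_gb = weight_bytes(ACTIVE_PARAMS, bits) / 1e9
    print(f"{name:>5}: {total_tb:.2f} TB total, {active_gb:.1f} GB active")
```

Under these assumptions the full weight set drops from about 3.2 TB at 16 bits to about 0.9 TB, which is the kind of reduction that changes how many GPUs a deployment needs.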
The model is pitched for advanced reasoning, agentic AI applications, tool use and complex problem-solving across mathematics, software engineering and enterprise AI assistants. NVFP4, NVIDIA's 4-bit floating-point format, is central to the company's inference-efficiency story, which ties directly to the inflection in inference demand that CEO Jensen Huang flagged this week.
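For readers unfamiliar with block-scaled 4-bit formats, here is a simplified simulation of how such a scheme rounds values. It assumes the E2M1 value grid and 16-element blocks, and it keeps each block scale as an ordinary Python float rather than an 8-bit value, so it is an illustration of the idea and not NVIDIA's implementation. The names `quantize_block` and `fake_quantize` are hypothetical.

```python
import math

# Magnitudes representable by a 4-bit E2M1 float (sign handled separately).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK = 16  # assumed number of values sharing one scale


def quantize_block(block):
    """Round one block to the E2M1 grid, then map back to real values."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return [0.0 for _ in block]
    # Scale so the largest magnitude lands on the top grid value (6.0).
    # Simplification: the scale stays a full-precision float here.
    scale = amax / 6.0
    out = []
    for x in block:
        mag = min(E2M1_GRID, key=lambda g: abs(abs(x) / scale - g))
        out.append(math.copysign(mag * scale, x))
    return out


def fake_quantize(values):
    """Apply block-wise quantize/dequantize over a flat list of floats."""
    out = []
    for i in range(0, len(values), BLOCK):
        out.extend(quantize_block(values[i:i + BLOCK]))
    return out


weights = [i / 10 for i in range(-16, 16)]
restored = fake_quantize(weights)
worst = max(abs(w - r) for w, r in zip(weights, restored))
print(f"worst-case rounding error: {worst:.4f}")
```

Because each block gets its own scale, a single outlier only coarsens the rounding of its 15 neighbors rather than the whole tensor, which is the main reason small blocks help preserve accuracy.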
Strategically, NVIDIA optimizing and redistributing a leading Chinese open model is notable: it makes DeepSeek's frontier MoE more deployable on NVIDIA's stack, reinforcing the GPU moat regardless of which lab trains the weights. It also lands alongside DeepSeek's own V4 family expansion and 75% price cut, signaling intense cost competition at the open-weights frontier. The caveat for adopters is the usual quantization tradeoff: NVFP4 can degrade accuracy on edge cases, so teams will want to validate the quantized model against the full-precision V4-Pro on their own workloads before production use.
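One way to run that validation is a side-by-side check on a fixed prompt set. The sketch below is a minimal harness under stated assumptions: `baseline` and `quantized` are hypothetical callables that wrap whatever serving setup a team uses and return an answer string, and exact string match stands in for a real scoring function.

```python
from typing import Callable, Dict, List, Sequence


def compare_models(
    baseline: Callable[[str], str],   # hypothetical wrapper: full-precision model
    quantized: Callable[[str], str],  # hypothetical wrapper: NVFP4 model
    prompts: Sequence[str],
    references: Sequence[str],
) -> Dict[str, object]:
    """Score both models on the same prompts and collect regressions."""
    if len(prompts) != len(references):
        raise ValueError("prompts and references must have the same length")
    if not prompts:
        raise ValueError("need at least one prompt")

    base_ok = quant_ok = agree = 0
    regressions: List[str] = []
    for prompt, ref in zip(prompts, references):
        b = baseline(prompt).strip()
        q = quantized(prompt).strip()
        base_ok += b == ref
        quant_ok += q == ref
        agree += b == q
        # A regression: the baseline was right and the quantized model was not.
        if b == ref and q != ref:
            regressions.append(prompt)

    n = len(prompts)
    return {
        "baseline_accuracy": base_ok / n,
        "quantized_accuracy": quant_ok / n,
        "agreement": agree / n,
        "regressions": regressions,
    }
```

The regression list tends to be more useful than the headline accuracy gap, since it points at the specific prompts where quantization changed a correct answer.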