Open Source
DeepSeek V4: Trillion-Parameter Open-Source Multimodal Model Challenges Western AI Dominance
A trillion-parameter Mixture-of-Experts architecture with ~32B active parameters, native multimodality, and a 1M-token context window — optimized for Huawei Ascend chips rather than Nvidia GPUs.
DeepSeek is releasing V4, a trillion-parameter Mixture-of-Experts model that activates roughly 32 billion parameters per token. That sparsity lets it run at a fraction of the inference cost of comparable dense models while posting benchmark scores squarely in frontier territory. Leaked results suggest HumanEval accuracy around 90% and SWE-bench Verified scores above 80%, numbers that would place DeepSeek V4 alongside or ahead of the best Western closed-source models on the coding and software engineering tasks that increasingly define competitive standing in AI.
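The arithmetic behind that efficiency ratio is straightforward: only the routed experts contribute parameters per token. The sketch below uses purely illustrative numbers (expert count, parameters per expert, and shared-parameter size are assumptions, not DeepSeek V4's published configuration) to show how a ~1T-parameter MoE can activate only ~32B parameters:

```python
# Hedged sketch: how a Mixture-of-Experts model touches only a slice of its
# total parameters per token. All configuration numbers are illustrative
# assumptions chosen to land near 1T total / ~32B active; DeepSeek has not
# published V4's actual layout.

def moe_active_params(total_experts: int, active_experts: int,
                      expert_params: float, shared_params: float):
    """Return (total, active) parameter counts for a simple MoE layout."""
    total = shared_params + total_experts * expert_params
    active = shared_params + active_experts * expert_params
    return total, active

total, active = moe_active_params(
    total_experts=256,    # assumed expert count
    active_experts=8,     # assumed experts routed per token
    expert_params=3.8e9,  # assumed parameters per expert
    shared_params=1.6e9,  # assumed shared attention/embedding parameters
)
print(f"total ~ {total/1e12:.2f}T, active ~ {active/1e9:.1f}B "
      f"({100*active/total:.1f}% of parameters per token)")
```

Under these assumed numbers, roughly 3% of the network's weights participate in any given forward pass, which is the mechanism behind the dense-model cost gap the article describes.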
The architecture introduces three novel components that distinguish it from incremental scaling of existing designs. Manifold-Constrained Hyper-Connections allow expert layers to share structured representations across the MoE routing boundary, reducing the information loss that typically degrades quality when most parameters are inactive. Engram Conditional Memory provides a persistent memory mechanism that allows the model to maintain coherent reasoning across its full 1M-token context window without the catastrophic degradation that plagues standard attention at extreme sequence lengths. And a Lightning Indexer implements sparse attention patterns that skip irrelevant context blocks entirely, cutting latency on long-context inference by up to 60% compared to standard full-attention approaches.
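The Lightning Indexer's claimed latency win comes from skipping attention over most of the context. A minimal block-sparse attention sketch captures the general pattern: score fixed-size context blocks with a cheap pass, then run exact attention only over the top-scoring blocks. The scoring heuristic here (mean-pooled key similarity to the query) and all dimensions are assumptions for illustration; DeepSeek has not published V4's indexing method.

```python
# Hedged sketch of block-sparse attention in the spirit of a "lightning
# indexer": a cheap scoring pass selects a few context blocks, and full
# attention runs only within them. Illustrative only.
import numpy as np

def block_sparse_attention(q, K, V, block_size=4, keep_blocks=2):
    """Attend a single query vector over only the highest-scoring key blocks."""
    n_blocks = len(K) // block_size
    Kb = K[:n_blocks * block_size].reshape(n_blocks, block_size, -1)
    Vb = V[:n_blocks * block_size].reshape(n_blocks, block_size, -1)

    # Cheap index pass: score each block by its mean key's similarity to q
    # (an assumed heuristic standing in for the real indexer).
    block_scores = Kb.mean(axis=1) @ q
    top = np.argsort(block_scores)[-keep_blocks:]

    # Exact softmax attention, restricted to the selected blocks.
    K_sel = Kb[top].reshape(-1, K.shape[-1])
    V_sel = Vb[top].reshape(-1, V.shape[-1])
    logits = K_sel @ q / np.sqrt(K.shape[-1])
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return weights @ V_sel

rng = np.random.default_rng(0)
K, V = rng.normal(size=(16, 8)), rng.normal(size=(16, 8))
q = rng.normal(size=8)
out = block_sparse_attention(q, K, V)  # attends 2 of 4 blocks, output shape (8,)
```

The exact attention cost here scales with the kept blocks rather than the full sequence, which is where a "skip irrelevant context entirely" design would recover its latency savings at million-token lengths.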
Perhaps most significant is the hardware story: DeepSeek V4 was trained on and optimized for Huawei’s Ascend chip ecosystem rather than Nvidia GPUs. This represents a concrete demonstration that frontier AI training can proceed without access to American semiconductor technology — the very scenario that U.S. export controls were designed to prevent. If DeepSeek V4’s benchmarks hold up under independent evaluation, the strategic case for chip export restrictions becomes considerably harder to make, since the restrictions would be imposing economic costs on American companies without achieving their stated national security objective.
As an open-source release, V4 will be available for download and fine-tuning, continuing DeepSeek’s strategy of using open weights to build ecosystem lock-in and attract talent. For Western AI labs, the competitive pressure is now arriving from two directions simultaneously: DeepSeek is matching their capability while undercutting their business model by giving the technology away.