DeepSeek Releases V4-Pro and V4-Flash — Open-Source 1.6T MoE With 1M Context on Huawei Chips (April 2026)
DeepSeek on April 24, 2026 released open-source V4-Pro (1.6T MoE) and V4-Flash (284B), both with a 1M-token context and trained entirely on Huawei Ascend chips. V4-Pro hits 80.6% on SWE-bench — within 0.2 points of Claude Opus 4.6 — at roughly a tenth of the price of frontier US models.
Chinese AI startup DeepSeek released preview versions of two new open-source flagship models — DeepSeek-V4-Pro and DeepSeek-V4-Flash — both supporting a 1-million-token context window, trained entirely on Huawei Ascend chips with zero CUDA dependency, and priced at roughly a tenth of the closest US frontier models.
What Happened
One year after the R1 "Sputnik moment" that rattled Silicon Valley, DeepSeek posted V4-Pro and V4-Flash on Hugging Face under the MIT license and simultaneously shipped updated API pricing. V4-Pro is a Mixture-of-Experts model with 1.6 trillion total parameters and 49 billion activated per token; V4-Flash is a smaller sibling at 284B total / 13B activated. Both replace V3.2's attention stack with what DeepSeek calls a Hybrid Attention Architecture — alternating Compressed Sparse Attention and Heavily Compressed Attention across 61 transformer layers — which the company says cuts single-token inference FLOPs to 27% and KV cache to 10% of V3.2 at the 1M-token context setting.
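The efficiency story behind those parameter counts is easy to see with back-of-the-envelope arithmetic: in an MoE model, only a small fraction of the total weights participate in each forward pass. A minimal sketch, using only the totals reported above (the ratios are derived here, not quoted from DeepSeek):

```python
# Activation ratios implied by the reported parameter counts:
# V4-Pro: 1.6T total / 49B active; V4-Flash: 284B total / 13B active.
def active_fraction(total_params: float, active_params: float) -> float:
    """Fraction of weights that fire on each token's forward pass."""
    return active_params / total_params

pro = active_fraction(1.6e12, 49e9)
flash = active_fraction(284e9, 13e9)

print(f"V4-Pro:   {pro:.1%} of weights active per token")    # ~3.1%
print(f"V4-Flash: {flash:.1%} of weights active per token")  # ~4.6%
```

Roughly 3% of V4-Pro's weights are active per token, which is why a 1.6T-parameter model can be served at the prices listed below.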
On public benchmarks, V4-Pro scores 80.6% on SWE-bench Verified, within 0.2 points of Claude Opus 4.6, and a leaked "Max" variant reportedly hits 93.5% Pass@1 on LiveCodeBench, ahead of Gemini 3.1 Pro (91.7) and Claude Opus 4.6 Max (88.8). V4-Flash trails slightly at 79.0% SWE-bench and 91.6% LiveCodeBench. DeepSeek itself says V4-Pro falls "marginally short" of GPT-5.4 and Gemini 3.1 Pro, estimating the gap at "approximately 3 to 6 months."
Key Details
- Architecture — 1.6T-parameter MoE with 49B activated per token; hybrid CSA/HCA attention stack over 61 layers; 1M-token context on both Pro and Flash.
- Training hardware — fully trained and served on Huawei Ascend silicon, with Huawei pledging "full support"; no CUDA dependency at inference.
- Licensing — MIT license, weights released on Hugging Face at deepseek-ai/DeepSeek-V4-Pro and deepseek-ai/DeepSeek-V4-Flash.
- API pricing — Flash: $0.14 / $0.28 per million input/output tokens. Pro: $1.74 / $3.48 per million input/output tokens — roughly 10x cheaper than GPT-5.4 and Claude Opus 4.6 for comparable tasks.
- Funding backdrop — two days earlier, The Information and Bloomberg reported that Tencent and Alibaba are in talks to invest in DeepSeek at a valuation above $20 billion, with Tencent offering up to a 20% stake.
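To make the pricing concrete, here is a small cost calculator using only the per-token rates listed above. The example job (a full 1M-token context plus a 4K-token answer) is illustrative, not from the announcement:

```python
# Listed DeepSeek V4 prices, USD per million input/output tokens.
PRICES = {
    "v4-flash": {"input": 0.14, "output": 0.28},
    "v4-pro":   {"input": 1.74, "output": 3.48},
}

def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request at the listed per-million-token rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: feed a full 1M-token context and get a 4K-token answer back.
print(f"V4-Pro:   ${job_cost('v4-pro', 1_000_000, 4_000):.2f}")    # $1.75
print(f"V4-Flash: ${job_cost('v4-flash', 1_000_000, 4_000):.2f}")  # $0.14
```

At these rates, even maxing out the 1M-token context costs under $2 per request on Pro and pennies on Flash.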
What Developers and Users Are Saying
Reaction on Hacker News has been intense, with at least four separate V4 threads hitting the front page on launch day — the technical report thread (47884933), the efficiency paper thread (47885014), and two coding-benchmark breakdowns (47884971, 47885230). Independent AI researcher Simon Willison published his "pelican on a bicycle" post within hours, calling V4 "almost on the frontier, a fraction of the price". On Reddit, r/LocalLLaMA has been in a state of continuous eruption since the weights dropped, with the top threads focused on quantized variants running on consumer hardware and on the fact that the Chinese ecosystem — DeepSeek for the model, Huawei for the silicon — now ships an end-to-end training and inference stack with no US components in the critical path.
Critical voices note that DeepSeek's own report quietly concedes a 3–6 month gap to GPT-5.4 and Gemini 3.1 Pro on world knowledge, and that the 1M-context claim still needs independent needle-in-a-haystack verification. Others flag the political sensitivity of funding talks with Tencent and Alibaba proceeding while DeepSeek ships models trained on export-controlled hardware.
What This Means for Developers
For builders, V4-Flash is the most immediately useful release: at $0.14/$0.28 per million tokens it undercuts GPT-4.1 mini and Gemini 2.5 Flash while matching them on most coding benchmarks, making it a strong default for agentic coding and bulk document processing. V4-Pro is currently the best open-weight coding model on public benchmarks, so teams running self-hosted inference on H200 or Ascend 910C clusters have a new ceiling to target. Tool-calling, structured outputs, and the 1M-token context are all supported from day one through the DeepSeek API, which is OpenAI-compatible — swapping the base URL is usually enough to switch providers via LiteLLM, OpenRouter, or Helicone.
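Because the API is OpenAI-compatible, switching providers is mostly a matter of pointing the same chat-completions request shape at a different base URL. A minimal stdlib-only sketch — the `deepseek-v4-flash` model identifier is an assumption; check DeepSeek's API docs for the exact name:

```python
import json
import os
import urllib.request

BASE_URL = "https://api.deepseek.com"  # OpenAI-compatible endpoint
MODEL = "deepseek-v4-flash"            # assumed model id; verify against the docs

def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style /chat/completions request against DeepSeek's API."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ.get('DEEPSEEK_API_KEY', '')}",
        },
        method="POST",
    )

req = build_chat_request("Summarize this diff in one sentence.")
print(req.full_url)  # https://api.deepseek.com/chat/completions
```

The same request body works unchanged against any OpenAI-compatible gateway; with the official `openai` client or LiteLLM, only the `base_url` (and API key) needs to change.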
What's Next
DeepSeek's roadmap, per its technical report, targets a stable V4 release and a reasoning-tuned R2 model "in the coming weeks," with quantized distilled variants expected on Hugging Face ahead of that. The $20B-plus funding round, if it closes, would be DeepSeek's first outside capital and would likely fund a further push into agentic and multimodal systems. Developers should watch the deepseek-ai Hugging Face page for the stable V4 weights and the GitHub repo for the inference kernels.
Sources
- DeepSeek V4 Preview Release — official API docs announcement
- DeepSeek-V4-Pro model card on Hugging Face — weights, config and architecture details
- Bloomberg: DeepSeek Unveils Newest Flagship AI Model
- The Next Web: DeepSeek returns with V4-Pro and V4-Flash a year after its Sputnik moment
- Simon Willison — DeepSeek V4: almost on the frontier, a fraction of the price
- Hacker News discussion — DeepSeek-V4 technical report thread
- South China Morning Post: DeepSeek unveils next-gen AI model as Huawei vows full support