DeepSeek V4 Preview Released: 1.6T-Param Pro Model, 1M Context, and Frontier Performance at $1.74/M Tokens (April 2026)
On April 24, 2026, Hangzhou-based DeepSeek open-sourced V4-Pro (1.6T parameters / 49B active) and V4-Flash (284B / 13B), both with a 1M-token context window and an MIT license. V4-Pro undercuts Gemini 3.1 Pro and GPT-5.4 on price by 2-7x while landing within striking distance of them on benchmarks.
Chinese AI lab DeepSeek on April 24 open-sourced the preview of its long-anticipated V4 series — two Mixture-of-Experts models that ship with a native 1-million-token context window, MIT-licensed open weights on Hugging Face, and API pricing that undercuts every Western frontier lab by a factor of 2 to 7. The release is the first major DeepSeek model since V3.2 in December 2025 and arrives just over a year after the Hangzhou-based startup's R1 launch triggered the so-called "Sputnik moment" that wiped roughly $1 trillion off Western AI stocks.
What Happened
DeepSeek published the V4 Preview on its official API docs at roughly 06:00 UTC on April 24, with the open-weight checkpoints landing simultaneously on the deepseek-ai Hugging Face collection. The series ships in two variants:
- DeepSeek-V4-Pro — 1.6 trillion total parameters with 49 billion active per token. At 865 GB on Hugging Face, it is now the largest open-weight model in the world, surpassing Kimi K2.6 (1.1T) and GLM-5.1 (754B), and is more than twice the size of V3.2.
- DeepSeek-V4-Flash — 284B total parameters with 13B active. The 160 GB Flash model is the cheaper, faster sibling, intended to anchor the bottom of DeepSeek's API price card.
Both models are released under the standard MIT license, allowing free commercial use, and both are exposed via the existing DeepSeek API by simply switching the model parameter to deepseek-v4-pro or deepseek-v4-flash. The legacy deepseek-chat and deepseek-reasoner endpoints will be retired on July 24, 2026, and are currently routed to V4-Flash. The API also accepts both OpenAI ChatCompletions and Anthropic-compatible payload formats, and exposes a dual Thinking / Non-Thinking mode toggle on every request.
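For OpenAI-compatible clients, the switch really is a single field. A minimal sketch, assuming DeepSeek's existing base URL and a DEEPSEEK_API_KEY environment variable (the v4 model strings are taken from the release; the per-request Thinking / Non-Thinking toggle is omitted here, since the announcement doesn't name its request field):

```python
# Minimal migration sketch: point the OpenAI SDK at DeepSeek's
# OpenAI-compatible endpoint and swap the model string.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com",        # existing DeepSeek API endpoint
    api_key=os.environ["DEEPSEEK_API_KEY"],     # assumed env var name
)

# The only change from a legacy deepseek-chat integration: the model string.
response = client.chat.completions.create(
    model="deepseek-v4-flash",  # or "deepseek-v4-pro"
    messages=[{"role": "user", "content": "Summarize this repo's README."}],
)
print(response.choices[0].message.content)
```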
Key Details
- Novel attention architecture — DeepSeek calls it "token-wise compression + DSA (DeepSeek Sparse Attention)". In the 1M-token regime, V4-Pro uses only 27% of the single-token FLOPs and 10% of the KV-cache size of V3.2; V4-Flash drops those numbers to 10% and 7%, respectively.
- API pricing — $0.14 / million input tokens and $0.28 / million output tokens for Flash, and $1.74 / $3.48 for Pro. By Simon Willison's published comparison, that makes Flash the cheapest small model on the public market — beating GPT-5.4 Nano ($0.20 / $1.25) — and Pro the cheapest of the larger frontier-class models, undercutting Gemini 3.1 Pro ($2 / $12) and GPT-5.4 ($2.50 / $15) by 2-7x on output tokens alone. A worked cost comparison follows this list.
- Self-reported benchmarks — DeepSeek claims V4-Pro is the open-source state of the art on agentic-coding benchmarks and trails only Gemini 3.1 Pro on world-knowledge evaluations among open models. The technical report concedes that with extended reasoning tokens, V4-Pro-Max sits roughly 3-6 months behind GPT-5.4 and Gemini 3.1 Pro on the absolute frontier.
- Agent integration on day one — V4 is pre-integrated with Claude Code, OpenClaw, and OpenCode, and DeepSeek says the model already drives its own internal agentic-coding workflows.
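To make the price gap concrete, here is a back-of-envelope comparison using the list prices quoted above. The 200k-in / 50k-out workload is illustrative, not from the announcement:

```python
# List prices quoted in this article, as (input $/Mtok, output $/Mtok).
PRICES = {
    "deepseek-v4-flash": (0.14, 0.28),
    "deepseek-v4-pro":   (1.74, 3.48),
    "gpt-5.4-nano":      (0.20, 1.25),
    "gemini-3.1-pro":    (2.00, 12.00),
    "gpt-5.4":           (2.50, 15.00),
}

def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one call at the quoted list prices."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1e6

# Hypothetical output-heavy job: 200k tokens in, 50k tokens out.
for model in PRICES:
    print(f"{model:18s} ${job_cost(model, 200_000, 50_000):.4f}")

# V4-Pro lands at ~$0.52 vs $1.25 for GPT-5.4 and $1.00 for Gemini 3.1 Pro,
# a 48-58% saving on this mix, rising on more output-heavy workloads.
```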
What Developers and Researchers Are Saying
On Hacker News, the top comment on the launch thread (item 47884971) called V4-Pro "frontier level (better than Opus 4.6) at a fraction of the cost," and the thread is dominated by debate over whether the self-reported benchmarks will hold up against independent evaluations like Artificial Analysis or LiveCodeBench. Independent ML practitioner Simon Willison ran both models through his standard "pelican on a bicycle" SVG test via OpenRouter and concluded the pricing is the real story: "DeepSeek-V4-Flash is the cheapest of the small models, beating even OpenAI's GPT-5.4 Nano. DeepSeek-V4-Pro is the cheapest of the larger frontier models."
Reaction in the broader press was mixed but attentive. CNBC framed V4 as proof that the US-China AI race is "not over"; MIT Technology Review argued the bigger story is the efficiency math, not the leaderboard score; and TechCrunch noted DeepSeek's continued silence on training compute and data sources. ML researcher reactions on r/LocalLLaMA were more cautious — multiple users pointed out that a 1.6T-parameter model is unrunnable on consumer hardware even with aggressive quantization, and some questioned whether the "open" V4-Pro is practically open in any meaningful sense without a hosting partner.
What This Means for Developers
The immediate impact is on cost. Any team currently routing high-volume traffic through GPT-5.4, Claude Sonnet 4.6, or Gemini 3.1 Pro can now plausibly A/B test against V4-Pro and capture a 60-90% reduction in per-token cost, particularly on output-heavy workloads such as code generation and long-document summarization. The 1M-token context window — now the default rather than an opt-in — also makes V4 immediately attractive for codebase-aware agents and long-document RAG pipelines. Migration is a one-line change for OpenAI-compatible clients: keep the base URL, swap the model name, and remember to disable any application-level chunking that assumes a smaller window. Teams using LiteLLM, OpenRouter, or Cline can route to V4 today without touching client code.
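One easy-to-miss migration step is the chunking logic itself. A minimal guard sketch, assuming a rough 4-characters-per-token estimate (swap in the real tokenizer for production) and V4's 1M-token window; the helper names are hypothetical:

```python
# Only chunk when a document actually exceeds the model's context budget,
# rather than assuming the smaller windows of earlier models.
V4_CONTEXT_TOKENS = 1_000_000  # per the V4 announcement

def estimate_tokens(text: str) -> int:
    """Crude ~4-chars-per-token estimate; an approximation, not tokenizer math."""
    return len(text) // 4

def maybe_chunk(document: str, reserve_for_output: int = 32_000) -> list[str]:
    """Return the document whole if it fits in V4's window, else split it."""
    budget = V4_CONTEXT_TOKENS - reserve_for_output
    if estimate_tokens(document) <= budget:
        return [document]  # no chunking needed at a 1M-token context
    chunk_chars = budget * 4
    return [document[i:i + chunk_chars]
            for i in range(0, len(document), chunk_chars)]
```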
The catch: legacy deepseek-chat and deepseek-reasoner endpoints disappear on July 24, 2026 (15:59 UTC), so any production integration on those model strings has roughly 90 days to migrate. Teams with strict data-residency or geopolitical constraints around Chinese-hosted inference should plan around either self-hosting the open weights, routing via OpenRouter or a US-based inference provider, or staying on a Western frontier model.
What's Next
The V4 release is officially labeled a Preview; DeepSeek's tech report flags V4-Pro-Max — a higher-reasoning-token variant — as the model intended to close the remaining gap to GPT-5.4 and Gemini 3.1 Pro. Quantized Flash builds from the Unsloth team are expected to land within days, which should make local single-GPU inference of Flash viable on consumer hardware. The full V4 technical report is published as a PDF inside the DeepSeek-V4-Pro Hugging Face repo, and a stable (post-Preview) release is expected ahead of the July 24 deprecation date for the legacy endpoints.
Sources
- DeepSeek API Docs — V4 Preview Release — primary source: official announcement and pricing
- Hugging Face — DeepSeek-V4 collection — open weights and technical report PDF
- Simon Willison — DeepSeek V4 — almost on the frontier, a fraction of the price — independent technical analysis with full pricing table
- TechCrunch — DeepSeek previews new AI model that closes the gap with frontier models
- CNBC — China's DeepSeek releases preview of long-awaited V4 model
- MIT Technology Review — Three reasons why DeepSeek's new model matters
- Hacker News discussion thread (item 47884971) — developer reactions and benchmark debate