DeepSeek published its V4 Preview Release on April 24, 2026: open weights, a 1M-token default context on official services, and two new API model IDs, deepseek-v4-pro and deepseek-v4-flash. This article is a field note for engineers: what changed, how to migrate, and where to read primary materials. It is not a substitute for the DeepSeek API docs.
Benchmark and “SOTA” claims below are as stated by DeepSeek in that post; treat them as marketing-facing positioning until you run your own evals on real workloads.
TL;DR
| Topic | Takeaway |
|---|---|
| New models | deepseek-v4-pro (larger, flagship) and deepseek-v4-flash (smaller, economical). |
| Context | 1M tokens is the default across official DeepSeek services per the announcement. |
| API shape | Same base_url; swap model string. OpenAI Chat Completions + Anthropic APIs supported. |
| Modes | Thinking and Non-Thinking—see Thinking Mode. |
| Weights | Hugging Face collection + tech report PDF. |
| Legacy IDs | deepseek-chat / deepseek-reasoner retire after 2026-07-24 15:59 UTC (currently mapped to V4-Flash). |
| Try in UI | chat.deepseek.com — Expert Mode / Instant Mode per the post. |

V4-Pro vs V4-Flash (vendor-reported)
| Dimension | DeepSeek-V4-Pro | DeepSeek-V4-Flash |
|---|---|---|
| Reported scale | 1.6T total params, 49B active | 284B total, 13B active |
| Positioning | Flagship reasoning + agentic coding | Fast, cost-effective, strong on simple agent work |
| Reasoning | DeepSeek claims open-model SOTA on agentic coding benchmarks and strong Math/STEM/Coding vs other open models | DeepSeek states reasoning is close to Pro, and on par with Pro for simple agent tasks |
GEO (generative engine optimization) note: When you summarize leaderboard tables, link the PDF report or Hugging Face cards instead of copying every number; citation-friendly pages get cited more often in generative answers.
Architecture: long context and sparse attention
The post highlights token-wise compression plus DSA (DeepSeek Sparse Attention) as structural contributions, and frames them as improving long-context efficiency (compute and memory). For engineering detail, start with the tech report and model cards on Hugging Face rather than second-hand summaries.
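To make the efficiency framing concrete, here is a toy top-k sparse-attention sketch in NumPy. It illustrates the generic idea behind sparse attention (each query mixes values from only k selected keys, so the softmax and value mix shrink from O(Tq·Tk) to O(Tq·k)); it is not DeepSeek's DSA, whose actual selection mechanism is specified in the tech report. The function name, shapes, and top-k selection rule are all illustrative assumptions.

```python
# Toy top-k sparse attention in NumPy. Generic illustration only,
# NOT DeepSeek's DSA algorithm (see the tech report for that).
import numpy as np

def topk_sparse_attention(q, k, v, top_k=4):
    """Each query attends to only its top_k highest-scoring keys.

    q: (Tq, d); k, v: (Tk, d). A real kernel would avoid materializing
    the full (Tq, Tk) score matrix (e.g., via a cheap indexer); we
    compute it here purely for clarity.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                                # (Tq, Tk)
    idx = np.argpartition(scores, -top_k, axis=-1)[:, -top_k:]   # top_k key ids per query
    picked = np.take_along_axis(scores, idx, axis=-1)            # (Tq, top_k)
    w = np.exp(picked - picked.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                           # softmax over selected keys only
    return np.einsum("qk,qkd->qd", w, v[idx])                    # mix only selected values

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(8, 16)) for _ in range(3))
print(topk_sparse_attention(q, k, v, top_k=4).shape)  # (8, 16)
```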
If you are new to why context length matters for agents, our LLM context window guide walks through attention cost, KV cache, and product trade-offs—useful background when a vendor moves the default to 1M tokens.
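A back-of-envelope KV-cache calculation shows why a 1M-token default pushes vendors toward compression and sparsity. Every hyperparameter below (layer count, KV heads, head dim) is an assumed placeholder, not a DeepSeek-V4 figure:

```python
# KV-cache sizing at 1M context. All hyperparameters are ILLUSTRATIVE
# placeholders, not DeepSeek-V4 values.
layers, kv_heads, head_dim = 60, 8, 128   # assumed GQA-style config
seq_len, bytes_per = 1_000_000, 2         # 1M tokens, fp16/bf16

# 2x for keys and values, per token, per layer
kv_bytes = 2 * layers * kv_heads * head_dim * seq_len * bytes_per
print(f"{kv_bytes / 2**30:.0f} GiB per sequence")  # ~229 GiB
```

Even under these modest placeholder numbers, a single 1M-token sequence needs hundreds of GiB of KV cache in bf16, which is exactly the pressure that compression and sparse-attention techniques target.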
Agent integrations and “agentic coding”
DeepSeek states V4 is integrated with Claude Code, OpenClaw, and OpenCode, and that it already powers in-house agentic coding at DeepSeek. For portable agent instructions (skills, MCP, and progressive disclosure), see what are agent skills? on ExplainX—skills are complementary to whichever base model you route through your host.
API migration checklist
- Inventory hard-coded `model` strings (`deepseek-chat`, `deepseek-reasoner`, older aliases).
- Map each to `deepseek-v4-pro` or `deepseek-v4-flash` per latency and budget.
- Confirm Thinking / Non-Thinking behavior against the Thinking Mode docs.
- Set a calendar reminder for the 2026-07-24 15:59 UTC legacy retirement; DeepSeek is explicit that `deepseek-chat` and `deepseek-reasoner` will become inaccessible after that moment.
- Re-run integration tests: tool calling, JSON modes, and streaming paths differ across providers even when the HTTP surface looks "compatible."
Minimal pattern (illustrative only) — replace with your real client and base URL from DeepSeek’s Quick Start:
```json
{
  "model": "deepseek-v4-flash",
  "messages": [{ "role": "user", "content": "Ping: confirm V4 routing." }]
}
```
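The same call, sketched with the OpenAI Python SDK. The base URL and key handling are assumptions drawn from DeepSeek's current Quick Start conventions (an OpenAI-compatible endpoint at api.deepseek.com); verify both against the live docs before relying on them:

```python
# Minimal V4 smoke test via the OpenAI-compatible surface.
# ASSUMPTIONS: base_url reflects DeepSeek's documented endpoint today;
# confirm it (and your auth setup) against the live Quick Start.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",      # placeholder; load from env in real code
    base_url="https://api.deepseek.com",  # per the post, the base URL is unchanged
)

resp = client.chat.completions.create(
    model="deepseek-v4-flash",  # was: "deepseek-chat"
    messages=[{"role": "user", "content": "Ping: confirm V4 routing."}],
)
print(resp.choices[0].message.content)
```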
Official sources (bookmark these)
- Announcement: DeepSeek V4 Preview Release
- Weights hub: deepseek-v4 collection on Hugging Face
- Tech report: DeepSeek_V4.pdf
- Thinking modes: Thinking Mode guide
- Product UI: chat.deepseek.com
DeepSeek closes the post with a reminder to trust official channels for news—reasonable advice when frontier releases generate noisy third-party commentary.
Related ExplainX reading
- What are agent skills? A complete guide — how SKILL.md packages interact with coding agents.
- What is MCP? Model Context Protocol guide — tools and resources alongside model swaps.
- LLM context window explained (2026) — what 1M context implies in practice.
Parameter counts, benchmark rankings, and retirement dates are quoted from DeepSeek’s April 24, 2026 API news page; verify against live docs before production cutovers.