News & Updates, Industry Analysis, API Tools

DeepSeek Cuts V4-Pro Prices 75%, Slashes Cache Costs Tenfold

DeepSeek announced a 75% limited-time discount on V4-Pro plus a permanent tenfold reduction to cache hit costs across its entire API — reigniting the pricing war that China's AI labs keep winning.

4 min read
DeepSeek Cuts V4-Pro Prices 75%, Slashes Cache Costs Tenfold

Image source: Current featured image

DeepSeek Cuts V4-Pro Prices 75%, Slashes Cache Costs Tenfold

DeepSeek announced this morning via X that it is cutting its newly-released V4-Pro model price by 75% for developers through May 5, while also permanently reducing input cache hit costs across its entire API suite to one-tenth of launch pricing. The move comes three days after DeepSeek launched its V4 preview models, and escalates a pricing war that has repeatedly undercut Western AI providers on cost.

China's DeepSeek is offering developers a 75% discount on its newly unveiled AI model, DeepSeek-V4-Pro, until May 5, 2026 at 15:59 UTC, and is cutting prices for input cache hits across its entire API lineup to one-tenth of the original price — announced via a post on X and confirmed in its official API docs.

What Changed

The price cuts hit two separate levers. First, V4-Pro is on a 75% limited-time discount through May 5 at 15:59 UTC. Second, input cache hit prices have been permanently reduced to 1/10th of launch pricing across every model on DeepSeek's API — not just V4. Any workload that sends repeated or similar requests will see lower costs immediately.

For context: V4-Pro's regular pricing sits at $0.145 per million input tokens and $3.48 per million output tokens, while V4-Flash — the smaller variant — costs $0.14 per million input and $0.28 per million output. The 75% V4-Pro discount brings the input rate to roughly $0.036 per million tokens during the promotional period — undercutting almost every major frontier model at full price.

The price gap against Western alternatives is significant. GPT-5.5 Pro charges $180 per million output tokens. At V4-Pro's regular $3.48 rate, that is already a 51x gap on output — and V4-Pro scores within a few percentage points of Claude Opus 4.6 on SWE-bench Verified and beats it on LiveCodeBench. The promotional rate pushes the gap further still.

What V4-Pro Is

DeepSeek V4-Pro is a Mixture-of-Experts model with 1.6 trillion total parameters and 49 billion activated parameters per inference pass, supporting a 1 million-token context window. It is designed for advanced reasoning, coding, and long-horizon agent workflows.

The 1M context window is more affordable to use than it sounds. In the 1-million-token context setting, V4-Pro requires only 27% of the computing power of the previous V3.2 model and cuts memory use to 10%, thanks to a Hybrid Attention Architecture combining Compressed Sparse Attention and Heavily Compressed Attention. That makes it a realistic option for workloads that need to process entire codebases in a single context — a task where per-token costs add up fast.

V4-Pro also integrates directly with popular coding agent stacks. DeepSeek explicitly states the model has been optimized for Claude Code, OpenCode, OpenClaw (version 2026.4.24 or higher), and CodeBuddy. To enable the 1M context window in compatible tools, set the model to deepseek-v4-pro[1m].

V4 also runs on Huawei Ascend NPU infrastructure — an engineering milestone for China's push to build domestic AI capability independent of Nvidia.

The Pricing War Context

DeepSeek is aggressively pitching low-priced plans for its just-released flagship model, intensifying competition across a Chinese AI industry trying to take on Silicon Valley's best. This is not DeepSeek's first time forcing a pricing reset. The lab's V3 and R1 releases last year triggered visible price cuts from other providers. Today's cache price reduction — permanent and across all models — suggests DeepSeek is less interested in temporary promotions and more interested in permanently anchoring developer expectations around lower costs.

The move is expected to further pressure rivals to reduce pricing, particularly in China, where companies are rapidly building domestic alternatives amid U.S. technology export restrictions. Western providers that rely on higher margins to fund compute costs will need to respond.

What Developers Should Know

The 75% V4-Pro discount runs through May 5, 2026 at 15:59 UTC — a short window, but sufficient to benchmark the model against production workloads at near-zero cost.

Limitations worth knowing:

  • V4 is text-only for now. No image, audio, or video input is supported, in contrast to GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Pro. Multimodal support is on the roadmap but without a committed timeline.
  • DeepSeek's hosted API routes through infrastructure in China — relevant for teams with data residency requirements, regulated industries, or government procurement restrictions. Several Western government agencies have already restricted official use of DeepSeek.
  • V4 is a preview release. The final version has not shipped.

For cost-sensitive agentic coding workflows where latency is acceptable and data residency is not a blocker, V4-Pro during this promotional window represents the most aggressive price-to-performance ratio available from any frontier-adjacent model today.

Share:

Other Latest News

Windsurf Becomes Devin Desktop With New Agent Command Center
AI Agents, News & Updates, Code Editors

Windsurf Becomes Devin Desktop With New Agent Command Center

Cognition has relaunched Windsurf as Devin Desktop, a full IDE with an Agent Command Center and open ACP protocol support that lets Codex, Claude Agent, OpenCode, and custom in-house agents run side by side in a shared Kanban workspace.

Jun 3, 2026
OpenAI Codex Lands on Amazon Bedrock for 5 Million Weekly Users
AI Agents, News & Updates, Code Editors

OpenAI Codex Lands on Amazon Bedrock for 5 Million Weekly Users

OpenAI's AI coding agent Codex is now generally available on Amazon Bedrock, giving developers at AWS-native organizations direct access to Codex through existing cloud infrastructure, procurement, and compliance workflows.

Jun 3, 2026
Anthropic Ships Claude Opus 4.8 With Dynamic Workflows and 3x Cheaper Fast Mode
AI Agents, News & Updates, Code Editors

Anthropic Ships Claude Opus 4.8 With Dynamic Workflows and 3x Cheaper Fast Mode

Anthropic releases Claude Opus 4.8 with improved agentic coding benchmarks, a 3x cheaper Fast Mode, and Dynamic Workflows that run hundreds of parallel subagents inside Claude Code.

May 29, 2026
MiniMax Revenue Doubles as M3 Promises 15x Long-Context Speed Boost
News & Updates, AI Assistants, Industry Analysis

MiniMax Revenue Doubles as M3 Promises 15x Long-Context Speed Boost

MiniMax annualized revenue doubled in two months, crossing 1M enterprise users. The company just teased M3's sparse attention architecture claiming 15.6x faster decoding at 1M-token contexts.

May 28, 2026
Anthropic Ships Free Claude Code Security Plugin to Catch Vulnerabilities in Real Time
AI Agents, News & Updates, Code Editors

Anthropic Ships Free Claude Code Security Plugin to Catch Vulnerabilities in Real Time

Anthropic's new Security Guidance plugin for Claude Code automatically scans code changes for injection flaws, unsafe deserialization, and 25+ dangerous patterns before they reach pull requests — free for all users on every plan.

May 27, 2026
Anthropic's Mythos Model Surfaces in Claude Code Ahead of Wider Release
AI Agents, News & Updates, Code Editors

Anthropic's Mythos Model Surfaces in Claude Code Ahead of Wider Release

Anthropic's most powerful restricted model, Claude Mythos 1, briefly appeared in Claude Code and Claude Security interfaces on May 25 — signaling the first commercial rollout of the model that found over 10,000 zero-day vulnerabilities under Project Glasswing.

May 26, 2026
← Scroll for more →