DeepSeek Cuts V4-Pro Prices 75%, Slashes Cache Costs Tenfold

DeepSeek announced this morning via X that it is cutting its newly-released V4-Pro model price by 75% for developers through May 5, while also permanently reducing input cache hit costs across its entire API suite to one-tenth of launch pricing. The move comes three days after DeepSeek launched its V4 preview models, and escalates a pricing war that has repeatedly undercut Western AI providers on cost.

China's DeepSeek is offering developers a 75% discount on its newly unveiled AI model, DeepSeek-V4-Pro, until May 5, 2026 at 15:59 UTC, and is cutting prices for input cache hits across its entire API lineup to one-tenth of the original price — announced via a post on X and confirmed in its official API docs.

What Changed

The price cuts hit two separate levers. First, V4-Pro is on a 75% limited-time discount through May 5 at 15:59 UTC. Second, input cache hit prices have been permanently reduced to 1/10th of launch pricing across every model on DeepSeek's API — not just V4. Any workload that sends repeated or similar requests will see lower costs immediately.

For context: V4-Pro's regular pricing sits at $0.145 per million input tokens and $3.48 per million output tokens, while V4-Flash — the smaller variant — costs $0.14 per million input and $0.28 per million output. The 75% V4-Pro discount brings the input rate to roughly $0.036 per million tokens during the promotional period — undercutting almost every major frontier model at full price.

The price gap against Western alternatives is significant. GPT-5.5 Pro charges $180 per million output tokens. At V4-Pro's regular $3.48 rate, that is already a 51x gap on output — and V4-Pro scores within a few percentage points of Claude Opus 4.6 on SWE-bench Verified and beats it on LiveCodeBench. The promotional rate pushes the gap further still.

What V4-Pro Is

DeepSeek V4-Pro is a Mixture-of-Experts model with 1.6 trillion total parameters and 49 billion activated parameters per inference pass, supporting a 1 million-token context window. It is designed for advanced reasoning, coding, and long-horizon agent workflows.

The 1M context window is more affordable to use than it sounds. In the 1-million-token context setting, V4-Pro requires only 27% of the computing power of the previous V3.2 model and cuts memory use to 10%, thanks to a Hybrid Attention Architecture combining Compressed Sparse Attention and Heavily Compressed Attention. That makes it a realistic option for workloads that need to process entire codebases in a single context — a task where per-token costs add up fast.

V4-Pro also integrates directly with popular coding agent stacks. DeepSeek explicitly states the model has been optimized for Claude Code, OpenCode, OpenClaw (version 2026.4.24 or higher), and CodeBuddy. To enable the 1M context window in compatible tools, set the model to deepseek-v4-pro[1m].

V4 also runs on Huawei Ascend NPU infrastructure — an engineering milestone for China's push to build domestic AI capability independent of Nvidia.

The Pricing War Context

DeepSeek is aggressively pitching low-priced plans for its just-released flagship model, intensifying competition across a Chinese AI industry trying to take on Silicon Valley's best. This is not DeepSeek's first time forcing a pricing reset. The lab's V3 and R1 releases last year triggered visible price cuts from other providers. Today's cache price reduction — permanent and across all models — suggests DeepSeek is less interested in temporary promotions and more interested in permanently anchoring developer expectations around lower costs.

The move is expected to further pressure rivals to reduce pricing, particularly in China, where companies are rapidly building domestic alternatives amid U.S. technology export restrictions. Western providers that rely on higher margins to fund compute costs will need to respond.

What Developers Should Know

The 75% V4-Pro discount runs through May 5, 2026 at 15:59 UTC — a short window, but sufficient to benchmark the model against production workloads at near-zero cost.

Limitations worth knowing:

V4 is text-only for now. No image, audio, or video input is supported, in contrast to GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Pro. Multimodal support is on the roadmap but without a committed timeline.
DeepSeek's hosted API routes through infrastructure in China — relevant for teams with data residency requirements, regulated industries, or government procurement restrictions. Several Western government agencies have already restricted official use of DeepSeek.
V4 is a preview release. The final version has not shipped.

For cost-sensitive agentic coding workflows where latency is acceptable and data residency is not a blocker, V4-Pro during this promotional window represents the most aggressive price-to-performance ratio available from any frontier-adjacent model today.