
DeepSeek Cuts V4-Pro Prices 75%, Slashes Cache Costs Tenfold

DeepSeek announced a 75% limited-time discount on V4-Pro plus a permanent tenfold reduction to cache hit costs across its entire API — reigniting the pricing war that China's AI labs keep winning.


DeepSeek announced this morning, via a post on X confirmed in its official API docs, that it is cutting the price of its newly released V4-Pro model by 75% for developers through May 5, 2026 at 15:59 UTC, while permanently reducing input cache hit costs across its entire API lineup to one-tenth of launch pricing. The move comes three days after DeepSeek launched its V4 preview models, and escalates a pricing war in which China's AI labs have repeatedly undercut Western providers on cost.

What Changed

The price cuts hit two separate levers. First, V4-Pro is on a 75% limited-time discount through May 5 at 15:59 UTC. Second, input cache hit prices have been permanently reduced to 1/10th of launch pricing across every model on DeepSeek's API — not just V4. Any workload that sends repeated or similar requests will see lower costs immediately.

For context: V4-Pro's regular pricing sits at $0.145 per million input tokens and $3.48 per million output tokens, while V4-Flash — the smaller variant — costs $0.14 per million input and $0.28 per million output. The 75% V4-Pro discount brings the input rate to roughly $0.036 per million tokens during the promotional period — undercutting almost every major frontier model at full price.

The price gap against Western alternatives is significant. GPT-5.5 Pro charges $180 per million output tokens. At V4-Pro's regular $3.48 rate, that is already a 51x gap on output — and V4-Pro scores within a few percentage points of Claude Opus 4.6 on SWE-bench Verified and beats it on LiveCodeBench. The promotional rate pushes the gap further still.
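The arithmetic behind those figures is simple to check. A minimal sketch, with rates in USD per million tokens taken from the paragraphs above:

```python
# Pricing quoted above, in USD per million tokens.
V4_PRO_INPUT = 0.145        # V4-Pro regular input rate
V4_PRO_OUTPUT = 3.48        # V4-Pro regular output rate
GPT_55_PRO_OUTPUT = 180.0   # GPT-5.5 Pro output rate

# 75% limited-time discount on V4-Pro pricing.
promo_input = V4_PRO_INPUT * 0.25
print(f"Promotional input rate: ${promo_input:.4f}/M tokens")  # ~$0.0363

# Output-price gap at V4-Pro's regular, non-promotional rate.
gap = GPT_55_PRO_OUTPUT / V4_PRO_OUTPUT
print(f"Output price gap vs GPT-5.5 Pro: {gap:.1f}x")  # ~51.7x
```

The gap calculation deliberately uses the regular rate, since the promotional discount expires on May 5.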

What V4-Pro Is

DeepSeek V4-Pro is a Mixture-of-Experts model with 1.6 trillion total parameters and 49 billion activated parameters per inference pass, supporting a 1 million-token context window. It is designed for advanced reasoning, coding, and long-horizon agent workflows.

The 1M context window is cheaper to use than it sounds. At the 1-million-token context setting, V4-Pro requires only 27% of the compute of the previous V3.2 model and cuts memory use to 10%, thanks to a Hybrid Attention Architecture combining Compressed Sparse Attention and Heavily Compressed Attention. That makes it a realistic option for workloads that load entire codebases into a single context, where per-token costs add up fast.

V4-Pro also integrates directly with popular coding agent stacks. DeepSeek explicitly states the model has been optimized for Claude Code, OpenCode, OpenClaw (version 2026.4.24 or higher), and CodeBuddy. To enable the 1M context window in compatible tools, set the model to deepseek-v4-pro[1m].
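As a sketch of what that configuration looks like in practice, the snippet below builds a request body in the standard OpenAI-compatible chat completions format. The endpoint URL and payload shape are assumptions based on that convention, not confirmed details; only the model string deepseek-v4-pro[1m] comes from DeepSeek's announcement.

```python
import json

# Assumed OpenAI-compatible chat completions endpoint (hypothetical URL).
API_URL = "https://api.deepseek.com/chat/completions"

payload = {
    # Model string from DeepSeek's announcement; the [1m] suffix
    # enables the 1M-token context window in compatible tools.
    "model": "deepseek-v4-pro[1m]",
    "messages": [
        {"role": "user", "content": "Summarize this repository's build system."}
    ],
    "stream": False,
}

# The serialized request body that a client would POST to API_URL.
print(json.dumps(payload, indent=2))
```

Sending it would additionally require an Authorization header carrying a DeepSeek API key.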

V4 also runs on Huawei Ascend NPU infrastructure — an engineering milestone for China's push to build domestic AI capability independent of Nvidia.

The Pricing War Context

DeepSeek is aggressively pitching low-priced plans for its just-released flagship model, intensifying competition across a Chinese AI industry trying to take on Silicon Valley's best. This is not DeepSeek's first time forcing a pricing reset. The lab's V3 and R1 releases last year triggered visible price cuts from other providers. Today's cache price reduction — permanent and across all models — suggests DeepSeek is less interested in temporary promotions and more interested in permanently anchoring developer expectations around lower costs.

The move is expected to further pressure rivals to reduce pricing, particularly in China, where companies are rapidly building domestic alternatives amid U.S. technology export restrictions. Western providers that rely on higher margins to fund compute costs will need to respond.

What Developers Should Know

The 75% V4-Pro discount runs through May 5, 2026 at 15:59 UTC. That is a short window, but long enough to benchmark the model against production workloads at a quarter of its regular price.

Limitations worth knowing:

  • V4 is text-only for now. No image, audio, or video input is supported, in contrast to GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Pro. Multimodal support is on the roadmap but without a committed timeline.
  • DeepSeek's hosted API routes through infrastructure in China — relevant for teams with data residency requirements, regulated industries, or government procurement restrictions. Several Western government agencies have already restricted official use of DeepSeek.
  • V4 is a preview release. The final version has not shipped.

For cost-sensitive agentic coding workflows where latency is acceptable and data residency is not a blocker, V4-Pro during this promotional window represents the most aggressive price-to-performance ratio available from any frontier-adjacent model today.
