OpenAI and Broadcom Unveil Jalapeño, a Custom Chip Built for Codex

OpenAI's first custom silicon — an inference-only ASIC called Jalapeño, co-designed with Broadcom in nine months — promises substantially better performance-per-watt for ChatGPT and Codex workloads, with initial deployment targeted before year's end.

OpenAI on Wednesday unveiled Jalapeño, its first custom-built inference accelerator, co-developed with Broadcom and produced by Celestica. The chip is purpose-built for LLM inference — the process of generating responses for live users — rather than model training, and represents OpenAI's most concrete step yet toward owning its own silicon stack.

Engineering samples of Jalapeño are already running ML workloads in the lab, including GPT-5.3-Codex-Spark. OpenAI says early testing shows substantially better performance-per-watt than current state-of-the-art alternatives, though no specific numbers have been disclosed. The company is targeting initial deployment by the end of 2026.

"Jalapeño was designed from the ground up for LLM inference using detailed insights from our close collaboration with OpenAI researchers," said Richard Ho, who leads OpenAI's hardware program. "We optimized the architecture around the kernels, memory movement, networking, and serving patterns that matter most for frontier AI models."

Why Inference Hardware Matters for Developers

Training gets the press, but inference is where AI costs developers money and where latency determines user experience. Every ChatGPT response, every Codex agent task, and every API call is inference. General-purpose GPUs, designed for the full range of AI workloads, are expensive and not always well matched to the specific patterns of transformer inference at scale.

Jalapeño is an ASIC (application-specific integrated circuit): a fixed-function chip optimized for a narrow workload. That makes it less flexible than an Nvidia GPU but allows the architecture to be tuned precisely for the memory movement, attention computation, and networking patterns that dominate OpenAI's inference systems. The practical implication, if early performance holds, is lower cost-per-token for Codex and ChatGPT API calls, which could eventually flow through to pricing and rate limits.
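
The link between performance-per-watt and cost-per-token can be made concrete with a little arithmetic. The sketch below is illustrative only: OpenAI has disclosed no figures, so every input (tokens per joule, electricity price, the doubling in efficiency) is a made-up placeholder, and the function name is hypothetical. It also covers only the electricity share of serving cost and ignores hardware, cooling, and networking.

```python
JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 million joules


def energy_cost_per_million_tokens(tokens_per_joule: float, usd_per_kwh: float) -> float:
    """Electricity cost in USD to generate one million tokens.

    tokens_per_joule is a performance-per-watt figure:
    (tokens / second) / watts == tokens / joule.
    """
    if tokens_per_joule <= 0:
        raise ValueError("tokens_per_joule must be positive")
    joules = 1_000_000 / tokens_per_joule
    kwh = joules / JOULES_PER_KWH
    return kwh * usd_per_kwh


# Placeholder inputs for illustration, NOT disclosed Jalapeño or GPU figures.
baseline = energy_cost_per_million_tokens(tokens_per_joule=2.0, usd_per_kwh=0.08)
improved = energy_cost_per_million_tokens(tokens_per_joule=4.0, usd_per_kwh=0.08)
print(f"baseline: ${baseline:.4f}, improved: ${improved:.4f} per million tokens")
```

Under these assumptions, doubling tokens per joule halves the energy cost per token. Whether any such saving reaches API pricing is a separate business decision.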

OpenAI's own AI models assisted in the chip's development, a detail that underscores how closely the software and hardware teams work together: the same models served to users are now helping improve the infrastructure used to run future models. The chip reached tape-out in just nine months, which Broadcom CEO Hock Tan described as potentially the fastest ASIC development cycle ever achieved in high-performance semiconductors.

The Infrastructure Play Behind the Chip

Jalapeño is the first chip in a planned multi-generation compute platform. OpenAI and Broadcom are targeting gigawatt-scale data center deployments with Microsoft and other partners, beginning before the end of 2026 and expanding in subsequent years. Broadcom's Tomahawk networking silicon will interconnect Jalapeño-powered racks, and Celestica handles board, rack, and system production.

The strategic rationale is clear. Google has TPUs, Amazon has Trainium, and Microsoft has Azure Maia. OpenAI, by contrast, has been one of Nvidia's largest GPU customers, and that reliance on purchased GPUs creates a structural cost disadvantage against rivals with in-house silicon as inference volume grows. Every inference task that Jalapeño handles more efficiently than a GPU improves OpenAI's unit economics, which is critical for a company still burning capital ahead of a planned IPO.

"By designing more of the stack ourselves, we can serve more intelligence with greater efficiency and keep pushing advanced AI toward broader access," said OpenAI President Greg Brockman.

What Developers Should Watch

For developers building on the OpenAI API today, Jalapeño changes nothing immediately. The chip is still in the engineering-sample phase, and initial deployment is expected to begin with controlled workloads rather than a broad rollout. OpenAI has not announced API pricing changes tied to the chip launch.

The longer-term signal matters more. If Jalapeño delivers on its performance-per-watt claims, the economics of running Codex agent tasks and GPT API calls improve structurally — potentially translating to lower API costs or higher rate limits over time. OpenAI also described Jalapeño as designed with flexibility to run "all LLMs guided by OpenAI's insights," which leaves open the possibility of third-party access to Jalapeño infrastructure in future generations, though that remains speculative.

What's Unconfirmed

OpenAI has shared no independent benchmark data for Jalapeño. Performance claims come from internal testing only; specific numbers on throughput, latency, and cost per token have not been disclosed. Tom's Hardware notes the die image does not reveal enough architectural detail to independently verify design claims. The end-of-2026 deployment target refers to initial deployment, not full production rollout, and whether first deployments will serve internal workloads only or customer-facing API traffic has not been specified.
