OpenAI and Broadcom Unveil Jalapeño Inference Chip for Codex

OpenAI on Wednesday unveiled Jalapeño, its first custom-built inference accelerator, co-developed with Broadcom and produced by Celestica. The chip is purpose-built for LLM inference — the process of generating responses for live users — rather than model training, and represents OpenAI's most concrete step yet toward owning its own silicon stack.

Engineering samples of Jalapeño are already running ML workloads in the lab, including GPT-5.3-Codex-Spark. OpenAI says early testing shows substantially better performance-per-watt than current state-of-the-art alternatives, though no specific numbers have been disclosed. The company is targeting initial deployment by the end of 2026.

"Jalapeño was designed from the ground up for LLM inference using detailed insights from our close collaboration with OpenAI researchers," said Richard Ho, who leads OpenAI's hardware program. "We optimized the architecture around the kernels, memory movement, networking, and serving patterns that matter most for frontier AI models."

Why Inference Hardware Matters for Developers

Training gets the press, but inference is where AI costs developers money and where latency determines user experience. Every ChatGPT response, every Codex agent task, every API call is inference. General-purpose GPUs, designed for the full range of AI workloads, are expensive and not always well matched to the specific patterns of transformer inference at scale.

Jalapeño is an application-specific integrated circuit (ASIC): a chip optimized for a narrow workload rather than general-purpose computation. That makes it less flexible than an Nvidia GPU but allows the architecture to be tuned precisely for the memory movement, attention computation, and networking patterns that dominate OpenAI's inference systems. The practical implication, if early performance holds, is lower cost per token for Codex and ChatGPT API calls, which could eventually flow through to pricing and rate limits.

OpenAI's own AI models assisted in the chip's development — a detail that underscores how deeply software and hardware teams have integrated. The chip reached tape-out in just nine months, which Broadcom CEO Hock Tan described as potentially the fastest ASIC development cycle ever achieved in high-performance semiconductors. The same models served to users are now helping improve the infrastructure used to run future models.

The Infrastructure Play Behind the Chip

Jalapeño is the first chip in a planned multi-generation compute platform. OpenAI and Broadcom are targeting gigawatt-scale data center deployments with Microsoft and other partners, beginning before the end of 2026 and expanding in subsequent years. Broadcom's Tomahawk networking silicon will interconnect Jalapeño-powered racks, and Celestica handles board, rack, and system production.

The strategic rationale is clear. Google has TPUs, Amazon has Trainium, and Microsoft has Azure Maia. OpenAI has been one of Nvidia's largest GPU customers, which creates a structural cost disadvantage as inference volume grows. Every inference task that Jalapeño handles more efficiently than a GPU improves OpenAI's unit economics — critical for a company still burning capital ahead of a planned IPO.

"By designing more of the stack ourselves, we can serve more intelligence with greater efficiency and keep pushing advanced AI toward broader access," said OpenAI President Greg Brockman.

What Developers Should Watch

For developers building on the OpenAI API today, Jalapeño changes nothing immediately. The chip is in the engineering-sample phase, and initial deployment is expected to focus on controlled internal workloads. OpenAI has not announced API pricing changes tied to the chip launch.

The longer-term signal matters more. If Jalapeño delivers on its performance-per-watt claims, the economics of running Codex agent tasks and GPT API calls improve structurally — potentially translating to lower API costs or higher rate limits over time. OpenAI also described Jalapeño as designed with flexibility to run "all LLMs guided by OpenAI's insights," which leaves open the possibility of third-party access to Jalapeño infrastructure in future generations, though that remains speculative.
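
To make the performance-per-watt argument concrete, here is a rough back-of-the-envelope sketch of how throughput per watt maps to the electricity cost of generating tokens. Every number in it is hypothetical and chosen only for illustration; OpenAI has disclosed no throughput, power, or cost figures for Jalapeño, and the function name is our own.

```python
def energy_cost_per_million_tokens(tokens_per_second: float,
                                   watts: float,
                                   usd_per_kwh: float) -> float:
    """Electricity cost in USD to generate one million tokens."""
    # Power divided by throughput gives energy per token (joules).
    joules_per_token = watts / tokens_per_second
    # 1 kWh = 3.6 million joules.
    kwh_per_million = joules_per_token * 1_000_000 / 3_600_000
    return kwh_per_million * usd_per_kwh


# Hypothetical figures, NOT disclosed by OpenAI or Broadcom:
# same power draw and electricity price, 1.5x the throughput.
baseline = energy_cost_per_million_tokens(
    tokens_per_second=2000, watts=1000, usd_per_kwh=0.08)
improved = energy_cost_per_million_tokens(
    tokens_per_second=3000, watts=1000, usd_per_kwh=0.08)

print(f"baseline: ${baseline:.4f} per million tokens")
print(f"improved: ${improved:.4f} per million tokens")
print(f"energy cost ratio: {improved / baseline:.2f}")
```

Under these assumptions, energy cost per token falls in inverse proportion to the performance-per-watt gain. Electricity is only one component of serving cost; chip, networking, and data center capital costs are not modeled here, so any effect on API pricing would be less direct.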

What's Unconfirmed

OpenAI has shared no independent benchmark data for Jalapeño. Performance claims come from internal testing only; specific numbers on throughput, latency, and cost per token have not been disclosed. Tom's Hardware notes the die image does not reveal enough architectural detail to independently verify design claims. The end-of-2026 deployment target refers to initial deployment, not full production rollout, and whether first deployments will serve internal workloads only or customer-facing API traffic has not been specified.