OpenAI Ships GPT-5.5: Agentic Coding Model Now Live in Codex

OpenAI launched GPT-5.5 on Thursday, releasing its most capable model to date across ChatGPT and Codex for paid subscribers. The release, coming just six weeks after GPT-5.4, is the first fully retrained base model since GPT-4.5 — and unlike its recent predecessors, GPT-5.5 is explicitly built for agentic work: the kind of multi-step, multi-tool workflows where an AI needs to plan, execute, check its own output, and keep going without being micromanaged.

"GPT-5.5 understands what you're trying to do faster and can carry more of the work itself," OpenAI said in the launch announcement. "Instead of carefully managing every step, you can give GPT-5.5 a messy, multi-part task and trust it to plan, use tools, check its work, navigate through ambiguity, and keep going."

The gains are concentrated in four areas OpenAI identified as requiring long-horizon reasoning: agentic coding, computer use, knowledge work, and early scientific research. For developers, that means fewer reprompts, fewer mid-task corrections, and more end-to-end task completion in a single pass.

Benchmark Numbers That Matter for Developers

The headline figure for developers is Terminal-Bench 2.0, which tests a model's ability to handle complex command-line workflows involving planning and iterative tool use. GPT-5.5 scores 82.7% — ahead of Claude Opus 4.7 at 69.4% and Gemini 3.1 Pro at 68.5%.

On SWE-Bench Pro, which measures real-world GitHub issue resolution across four programming languages, GPT-5.5 resolves 58.6% of tasks end-to-end in a single pass. Anthropic's Claude Opus 4.7 scores higher at 64.3%, though OpenAI has flagged that "labs reported signs of memorization on a subset of those problems" — a caveat worth tracking as independent evaluations surface.

OpenAI also reports results on Expert-SWE, an internal benchmark measuring tasks with a median estimated human completion time of 20 hours. GPT-5.5 outperforms GPT-5.4 on that benchmark, which is the more relevant signal for developers running large refactors or multi-session feature builds through Codex agents.

For harder mathematical reasoning, GPT-5.5 Pro scored 39.6% on FrontierMath Tier 4, a set of postdoctoral-level math problems. That is about 1.7 times Claude Opus 4.7's 22.9%. OpenAI says a customized version of the model also assisted researchers in discovering a new mathematical proof related to Ramsey numbers.

What Changes Inside Codex

The most immediately relevant change for developers building with Codex is efficiency. OpenAI says GPT-5.5 delivers better results with fewer tokens than GPT-5.4 for most users — and despite being a more capable model, it matches GPT-5.4's per-token latency in real-world serving. Bigger, more capable models are usually slower, so this is a notable engineering result.

OpenAI said 4 million developers are now actively using Codex every week, up from 3 million just two weeks before the announcement. The company also disclosed that 9 million businesses are paying for ChatGPT. GPT-5.5 has already been put to internal use, OpenAI said: one team used it in Codex to analyze six months of data, build a scoring framework, and validate an automated Slack agent, while another used it to review 24,771 K-1 tax forms spanning 71,637 pages.

The model can also automatically figure out how to use an MCP server without the user providing explicit instructions — a concrete improvement over GPT-5.4 for developers who have built tool-use integrations and want agents to navigate them without hand-holding.

Availability and Pricing

As of April 23, GPT-5.5 is live for Plus, Pro, Business, and Enterprise users in ChatGPT and Codex. GPT-5.5 Pro — a more capable, higher-accuracy version — is rolling out to Pro, Business, and Enterprise users in ChatGPT.

API access was not live at launch. OpenAI says it is coming "very soon" and that API deployments require additional safeguards that are still being finalized with partners. When the API launches, pricing will be $5 per million input tokens and $30 per million output tokens, double the cost of GPT-5.4. GPT-5.5 Pro is priced at $30 per million input tokens and $180 per million output tokens.

OpenAI says the higher price is offset by token efficiency gains, framing it as a net wash or better for most Codex workloads. That claim is worth testing against your own use patterns before assuming it holds at scale.
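One way to run that test is a back-of-the-envelope cost comparison. The sketch below is illustrative only: the GPT-5.5 prices are the ones quoted above, the GPT-5.4 prices are inferred from the "double the cost" statement rather than taken from a price sheet, and the workload token counts are hypothetical placeholders to be replaced with figures from your own usage logs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Prices in USD per million tokens."""
    input_per_million: float
    output_per_million: float

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        """Dollar cost of a workload with the given token counts."""
        total = (input_tokens * self.input_per_million
                 + output_tokens * self.output_per_million)
        return total / 1_000_000


# Quoted in the article.
GPT_5_5 = Pricing(input_per_million=5.00, output_per_million=30.00)
# ASSUMPTION: inferred from "double the cost of GPT-5.4", not a published figure.
GPT_5_4_ASSUMED = Pricing(input_per_million=2.50, output_per_million=15.00)


def break_even_token_ratio(old: Pricing, new: Pricing,
                           input_tokens: int, output_tokens: int) -> float:
    """Fraction of the old token volume the new model can use before it
    costs more, assuming input and output tokens shrink by the same factor."""
    return (old.cost(input_tokens, output_tokens)
            / new.cost(input_tokens, output_tokens))


if __name__ == "__main__":
    # Hypothetical monthly workload; substitute your own measurements.
    monthly_input, monthly_output = 1_000_000, 200_000

    old_cost = GPT_5_4_ASSUMED.cost(monthly_input, monthly_output)
    new_cost = GPT_5_5.cost(monthly_input, monthly_output)
    ratio = break_even_token_ratio(GPT_5_4_ASSUMED, GPT_5_5,
                                   monthly_input, monthly_output)

    print(f"GPT-5.4 (assumed prices), same tokens: ${old_cost:.2f}")
    print(f"GPT-5.5, same tokens:                  ${new_cost:.2f}")
    print(f"Break-even token ratio:                {ratio:.2f}")
```

Under these assumptions the break-even ratio is 0.5 regardless of the input/output mix, because both prices exactly double. In other words, GPT-5.5 would need to finish the same work in half the tokens for the bill to stay flat. Fewer retries and reprompts count toward that reduction, so measure tokens per completed task rather than tokens per request.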

Safety Posture

OpenAI classified GPT-5.5 as meeting a "High" cybersecurity risk threshold — meaning it could amplify existing threats if misused — but said it does not cross the "Critical" threshold associated with severe harm. The company conducted extensive third-party red-teaming and safeguard testing before release, and worked with nearly 200 trusted early-access partners across real use cases. The safety posture comes as AI cybersecurity capabilities have been under scrutiny across the industry following Anthropic's limited rollout of its more powerful Claude Mythos Preview model.

What's Unconfirmed

OpenAI has not disclosed a specific timeline for API availability beyond "very soon," nor published detailed system cards covering the full scope of GPT-5.5's capabilities and failure modes. Independent third-party evaluation of the SWE-Bench Pro and Terminal-Bench 2.0 scores has not yet been completed, and that includes the memorization caveat raised about Anthropic's competing results. Developers should treat the benchmark comparisons as directional signals rather than settled fact until external evaluations replicate them.