OpenAI Ships GPT-5.5: Agentic Coding Model Now Live in Codex

OpenAI launched GPT-5.5 on April 23, its first fully retrained base model since GPT-4.5, now live for paid ChatGPT and Codex users. With 82.7% on Terminal-Bench 2.0 and what OpenAI describes as state-of-the-art performance on long-horizon coding tasks, it sets a new bar for agentic developer tooling.

OpenAI Ships GPT-5.5: The Agentic Model Designed to Run Your Dev Workflows

OpenAI launched GPT-5.5 on Thursday, releasing its most capable model to date across ChatGPT and Codex for paid subscribers. The release, coming just six weeks after GPT-5.4, is the first fully retrained base model since GPT-4.5 — and unlike its recent predecessors, GPT-5.5 is explicitly built for agentic work: the kind of multi-step, multi-tool workflows where an AI needs to plan, execute, check its own output, and keep going without being micromanaged.

"GPT-5.5 understands what you're trying to do faster and can carry more of the work itself," OpenAI said in the launch announcement. "Instead of carefully managing every step, you can give GPT-5.5 a messy, multi-part task and trust it to plan, use tools, check its work, navigate through ambiguity, and keep going."

The gains are concentrated in four areas OpenAI identified as requiring long-horizon reasoning: agentic coding, computer use, knowledge work, and early scientific research. For developers, that means fewer reprompts, fewer mid-task corrections, and more end-to-end task completion in a single pass.

Benchmark Numbers That Matter for Developers

The headline figure for developers is Terminal-Bench 2.0, which tests a model's ability to handle complex command-line workflows involving planning and iterative tool use. GPT-5.5 scores 82.7% — ahead of Claude Opus 4.7 at 69.4% and Gemini 3.1 Pro at 68.5%.

On SWE-Bench Pro, which measures real-world GitHub issue resolution across four programming languages, GPT-5.5 resolves 58.6% of tasks end-to-end in a single pass. Anthropic's Claude Opus 4.7 scores higher at 64.3%, though OpenAI has flagged that "labs reported signs of memorization on a subset of those problems" — a caveat worth tracking as independent evaluations surface.

OpenAI also reports results on Expert-SWE, an internal benchmark measuring tasks with a median estimated human completion time of 20 hours. GPT-5.5 outperforms GPT-5.4 on that benchmark, which is the more relevant signal for developers running large refactors or multi-session feature builds through Codex agents.

For harder mathematical reasoning, GPT-5.5 Pro scored 39.6% on FrontierMath Tier 4 (postdoctoral-level math problems), roughly 1.7 times Claude Opus 4.7's 22.9%. OpenAI says a customized version of the model also assisted researchers in discovering a new mathematical proof related to Ramsey numbers.

What Changes Inside Codex

The most immediately relevant change for developers building with Codex is efficiency. OpenAI says GPT-5.5 delivers better results with fewer tokens than GPT-5.4 for most users — and despite being a more capable model, it matches GPT-5.4's per-token latency in real-world serving. Bigger, more capable models are usually slower, so this is a notable engineering result.

OpenAI said 4 million developers are now actively using Codex every week, up from 3 million just two weeks before the announcement. The company also disclosed that 9 million businesses are paying for ChatGPT and that GPT-5.5 has already been put to internal use: one team used it in Codex to analyze six months of data, build a scoring framework, and validate an automated Slack agent, while another used it to review 24,771 K-1 tax forms spanning 71,637 pages.

The model can also automatically figure out how to use an MCP server without the user providing explicit instructions — a concrete improvement over GPT-5.4 for developers who have built tool-use integrations and want agents to navigate them without hand-holding.

Availability and Pricing

As of April 23, GPT-5.5 is live for Plus, Pro, Business, and Enterprise users in ChatGPT and Codex. GPT-5.5 Pro — a more capable, higher-accuracy version — is rolling out to Pro, Business, and Enterprise users in ChatGPT.

API access was not available at launch. OpenAI says it is coming "very soon" and that API deployments require additional safety safeguards that are still being finalized with partners. When the API launches, GPT-5.5 will cost $5 per million input tokens and $30 per million output tokens, double the cost of GPT-5.4. GPT-5.5 Pro is priced at $30 per million input tokens and $180 per million output tokens.

OpenAI says the higher price is offset by token efficiency gains, framing it as a net wash or better for most Codex workloads. That claim is worth testing against your own use patterns before assuming it holds at scale.
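One way to run that test is a quick cost comparison. The Python sketch below is illustrative only: the GPT-5.5 and GPT-5.5 Pro prices are the ones quoted above, the GPT-5.4 figure is an assumption inferred from the "double the cost" statement rather than a published price, and the names `Pricing`, `cost_usd`, and `break_even_token_fraction` are hypothetical helpers, not part of any OpenAI SDK. It also assumes input and output token counts shrink by the same proportion, which may not match real workloads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_million: float   # USD per 1M input tokens
    output_per_million: float  # USD per 1M output tokens


# Prices quoted in the article for the upcoming API.
GPT_5_5 = Pricing(input_per_million=5.0, output_per_million=30.0)
GPT_5_5_PRO = Pricing(input_per_million=30.0, output_per_million=180.0)

# ASSUMPTION: inferred from "double the cost of GPT-5.4"; not a confirmed price.
GPT_5_4_ASSUMED = Pricing(input_per_million=2.5, output_per_million=15.0)


def cost_usd(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of one workload at the given per-million prices."""
    total = (input_tokens * pricing.input_per_million
             + output_tokens * pricing.output_per_million)
    return total / 1_000_000


def break_even_token_fraction(old: Pricing, new: Pricing,
                              input_tokens: int, output_tokens: int) -> float:
    """Fraction of the old token volume the new model may use at equal cost.

    Assumes input and output tokens scale down by the same factor.
    """
    return (cost_usd(old, input_tokens, output_tokens)
            / cost_usd(new, input_tokens, output_tokens))


if __name__ == "__main__":
    # Hypothetical workload: 200k input tokens and 50k output tokens per task.
    inp, out = 200_000, 50_000
    print(f"GPT-5.4 (assumed): ${cost_usd(GPT_5_4_ASSUMED, inp, out):.2f}")
    print(f"GPT-5.5:           ${cost_usd(GPT_5_5, inp, out):.2f}")
    frac = break_even_token_fraction(GPT_5_4_ASSUMED, GPT_5_5, inp, out)
    print(f"Break-even token fraction: {frac:.2f}")
```

Under these assumptions the break-even fraction is 0.5: GPT-5.5 would need to finish the same task with half the tokens GPT-5.4 used just to match its cost. Substituting token counts from your own Codex logs gives a more honest picture than the hypothetical workload here.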

Safety Posture

OpenAI classified GPT-5.5 as meeting a "High" cybersecurity risk threshold — meaning it could amplify existing threats if misused — but said it does not cross the "Critical" threshold associated with severe harm. The company conducted extensive third-party red-teaming and safeguard testing before release, and worked with nearly 200 trusted early-access partners across real use cases. The safety posture comes as AI cybersecurity capabilities have been under scrutiny across the industry following Anthropic's limited rollout of its more powerful Claude Mythos Preview model.

What's Unconfirmed

OpenAI has not disclosed a specific timeline for API availability beyond "very soon," nor published detailed system cards covering the full scope of GPT-5.5's capabilities and failure modes. Independent third-party evaluation of the SWE-Bench Pro and Terminal-Bench 2.0 scores — particularly the memorization caveat raised about Anthropic's competing results — has not yet been completed. Developers should treat the benchmark comparisons as directional signals rather than settled fact until external evaluations replicate them.
