AI Agents, API Tools

OpenAI Launches WebSocket Mode for Responses API, Promising Up to 40% Faster Agentic Workflows

The new persistent-connection option targets developers building tool-heavy AI agents, with early adopters like Cursor and Cline reporting significant latency improvements.

By CWA Team
February 24, 2026

OpenAI has released a WebSocket mode for its Responses API, giving developers a persistent-connection alternative to standard HTTP streaming for AI agent workflows that involve heavy tool use.

The new mode, documented on OpenAI's developer platform, lets applications maintain a single open connection to /v1/responses and pass only incremental updates — new messages or tool outputs — between turns. The server holds conversation state in memory, eliminating the need to retransmit full context on every round trip.

For developers building agentic systems — coding assistants, orchestration loops, or multi-step automation — the difference is material. According to OpenAI's documentation, "For rollouts with 20+ tool calls, we have seen up to roughly 40% faster end-to-end execution."

How It Works

Developers open a WebSocket connection to wss://api.openai.com/v1/responses and send response.create events whose payloads mirror the existing Responses API body. Continuation between turns uses previous_response_id chaining, with only new input items sent on each subsequent call.
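
Based on that description, a turn might be constructed as in the sketch below. Note that this is an illustration of the flow, not confirmed API usage: the `response.create` event name and `previous_response_id` field come from the article, but the exact envelope shape around the Responses API body is an assumption.

```python
def make_turn_event(model, new_input, previous_response_id=None):
    """Build a response.create event whose payload mirrors the
    Responses API request body. Only new input items are sent
    per turn; earlier context lives in the server-side cache."""
    payload = {"model": model, "input": new_input}
    if previous_response_id is not None:
        # Chain onto the cached state instead of retransmitting
        # the full conversation on every round trip.
        payload["previous_response_id"] = previous_response_id
    return {"type": "response.create", "response": payload}

# First turn carries the full context:
first = make_turn_event(
    "gpt-5.2", [{"role": "user", "content": "List the repo files"}]
)
# Later turns send only the new items, chained by ID, e.g.:
# follow_up = make_turn_event("gpt-5.2", [tool_output_item],
#                             previous_response_id="resp_abc123")
# await ws.send(json.dumps(follow_up))  # over wss://api.openai.com/v1/responses
```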

The server maintains one previous-response state per connection in an in-memory cache. Because this state is never written to disk, the mode is compatible with both store=false and Zero Data Retention (ZDR) configurations — a practical consideration for enterprise developers subject to data-handling restrictions.

A warmup feature allows developers to pre-load tools, instructions, and messages by sending response.create with generate: false, which prepares request state without producing model output. This can shave additional milliseconds off the first generated turn.
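
Under the same assumptions as above (the envelope shape is not documented in the article), the warmup step would be an ordinary `response.create` event with `generate` set to false:

```python
def make_warmup_event(model, tools, instructions):
    """Pre-load tools and instructions without producing output:
    generate=False prepares request state on the server so the
    first real turn starts from a warm cache."""
    return {
        "type": "response.create",
        "response": {
            "model": model,
            "tools": tools,
            "instructions": instructions,
            "generate": False,  # prepare state only; no model output
        },
    }
```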

Constraints Developers Should Know

The mode comes with notable limitations. Connections are capped at 60 minutes, after which developers must reconnect. Only one response can be in-flight at a time per connection — there is no multiplexing. Developers needing parallel runs must open multiple connections.

If a turn fails with a 4xx or 5xx error, the server evicts the cached previous_response_id, preventing stale state reuse. For store=false sessions, losing the in-memory cache means the chain cannot be resumed — the client receives a previous_response_not_found error and must start fresh with full context.

Early Adoption Signals

According to reports on X (formerly Twitter), several prominent developer tools have already integrated the feature. "Tools like Cursor reported 30% speed gains for all users, Cline saw up to 50% on complex work, and Vercel's AI SDK integrated it seamlessly for quicker responses," the trending topic summary noted.

These numbers, if sustained at scale, represent a meaningful reduction in the wall-clock time users spend waiting during multi-step AI-assisted coding sessions — the exact use case where latency compounds across dozens of sequential tool calls.

What This Means for Developers

The WebSocket mode doesn't change what the Responses API can do — it changes how fast it can do it in specific scenarios. Developers whose applications make fewer than a handful of tool calls per session are unlikely to see dramatic improvements. But for teams building complex agents that chain 20 or more tool invocations, the reduction in per-turn overhead could meaningfully improve user experience.

The mode also introduces new operational complexity. Developers must handle reconnection logic, manage the 60-minute connection lifetime, and decide between server-side compaction (context_management) and the standalone /responses/compact endpoint for managing long context windows.
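
The 60-minute cap in particular argues for reconnecting proactively rather than waiting for the server to drop the socket mid-turn. One simple approach (the margin value is an arbitrary choice, not from the documentation):

```python
import time

SESSION_CAP_S = 60 * 60  # connections are capped at 60 minutes

def should_reconnect(connected_at, now=None, margin_s=120):
    """Return True when the connection is close enough to the
    60-minute cap that the client should reconnect between turns,
    leaving a safety margin so no turn is cut off in flight."""
    now = time.time() if now is None else now
    return now - connected_at >= SESSION_CAP_S - margin_s
```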

Code samples in OpenAI's documentation reference gpt-5.2 as the model, suggesting the feature is designed for current and next-generation model deployments.
