AI Agents, API Tools

OpenAI Launches WebSocket Mode for Responses API, Promising Up to 40% Faster Agentic Workflows

The new persistent-connection option targets developers building tool-heavy AI agents, with early adopters like Cursor and Cline reporting significant latency improvements.

By CWA Team
February 24, 2026

OpenAI has released a WebSocket mode for its Responses API, giving developers a persistent-connection alternative to standard HTTP streaming for AI agent workflows that involve heavy tool use.

The new mode, documented on OpenAI's developer platform, lets applications maintain a single open connection to /v1/responses and pass only incremental updates — new messages or tool outputs — between turns. The server holds conversation state in memory, eliminating the need to retransmit full context on every round trip.

For developers building agentic systems — coding assistants, orchestration loops, or multi-step automation — the difference is material. According to OpenAI's documentation, "For rollouts with 20+ tool calls, we have seen up to roughly 40% faster end-to-end execution."

How It Works

Developers open a WebSocket connection to wss://api.openai.com/v1/responses and send response.create events whose payloads mirror the existing Responses API body. Continuation between turns uses previous_response_id chaining, with only new input items sent on each subsequent call.
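As a rough illustration, the turn structure described above might look like the following Python sketch. The payload field names beyond `type`, `model`, `input`, and `previous_response_id` mirror the existing Responses API body as the documentation describes, but the exact event shape here is an assumption, not a confirmed schema:

```python
import json

def create_event(input_items, previous_response_id=None, model="gpt-5.2"):
    """Build a response.create event whose payload mirrors the HTTP request body."""
    body = {"model": model, "input": input_items}
    if previous_response_id is not None:
        # Continuation turn: chain to the cached state and send only new items.
        body["previous_response_id"] = previous_response_id
    return {"type": "response.create", "response": body}

# First turn carries the full input; later turns send only the increment.
first = create_event([{"role": "user", "content": "List the repo files."}])
followup = create_event(
    [{"type": "function_call_output", "call_id": "call_123",
      "output": '["main.py", "README.md"]'}],
    previous_response_id="resp_abc",
)
print(json.dumps(followup))
```

The point of the chaining is visible in the second payload: the tool output is the only input item on the wire, because the server already holds the rest of the conversation in its in-memory cache.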

The server maintains one previous-response state per connection in an in-memory cache. Because this state is never written to disk, the mode is compatible with both store=false and Zero Data Retention (ZDR) configurations — a practical consideration for enterprise developers subject to data-handling restrictions.

A warmup feature allows developers to pre-load tools, instructions, and messages by sending response.create with generate: false, which prepares request state without producing model output. This can shave additional milliseconds off the first generated turn.
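A warmup call could then be sketched as below. Whether `generate: false` sits inside the response body, as assumed here, or at the top level of the event is not confirmed by the article:

```python
def warmup_event(tools, instructions, model="gpt-5.2"):
    """Pre-load tools and instructions without producing model output.

    Placement of the generate flag inside the response body is an assumption.
    """
    return {
        "type": "response.create",
        "response": {
            "model": model,
            "instructions": instructions,
            "tools": tools,
            "generate": False,  # prepare request state only; no generation
        },
    }
```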

Constraints Developers Should Know

The mode comes with notable limitations. Connections are capped at 60 minutes, after which developers must reconnect. Only one response can be in-flight at a time per connection — there is no multiplexing. Developers needing parallel runs must open multiple connections.

If a turn fails with a 4xx or 5xx error, the server evicts the cached previous_response_id, preventing stale state reuse. For store=false sessions, losing the in-memory cache means the chain cannot be resumed — the client receives a previous_response_not_found error and must start fresh with full context.
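Client code therefore needs a fallback path for evicted chains. A minimal sketch, assuming the `previous_response_not_found` error surfaces as an exception from whatever WebSocket wrapper handles the round trip (`send_fn` here is a hypothetical stand-in for that wrapper):

```python
class PreviousResponseNotFound(Exception):
    """Raised when the server has evicted the cached previous_response_id."""

def send_turn(send_fn, new_items, state):
    """Send one turn; on an evicted chain, resend the full accumulated context.

    send_fn(items, previous_response_id) performs the actual round trip and
    returns the new response id. The client keeps its own copy of the history
    so a store=false session can be restarted from scratch after eviction.
    """
    state.setdefault("history", []).extend(new_items)
    try:
        resp_id = send_fn(new_items, state.get("previous_response_id"))
    except PreviousResponseNotFound:
        # Chain cannot be resumed: start fresh with the full context.
        resp_id = send_fn(state["history"], None)
    state["previous_response_id"] = resp_id
    return resp_id
```

Keeping a client-side transcript costs memory but is the only way to recover when the server-side cache is the sole copy of the conversation.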

Early Adoption Signals

According to reports on X (formerly Twitter), several prominent developer tools have already integrated the feature. "Tools like Cursor reported 30% speed gains for all users, Cline saw up to 50% on complex work, and Vercel's AI SDK integrated it seamlessly for quicker responses," one trending-topic summary on the platform noted.

These numbers, if sustained at scale, represent a meaningful reduction in the wall-clock time users spend waiting during multi-step AI-assisted coding sessions — the exact use case where latency compounds across dozens of sequential tool calls.

What This Means for Developers

The WebSocket mode doesn't change what the Responses API can do — it changes how fast it can do it in specific scenarios. Developers whose applications make fewer than a handful of tool calls per session are unlikely to see dramatic improvements. But for teams building complex agents that chain 20 or more tool invocations, the reduction in per-turn overhead could meaningfully improve user experience.

The mode also introduces new operational complexity. Developers must handle reconnection logic, manage the 60-minute connection lifetime, and decide between server-side compaction (context_management) and the standalone /responses/compact endpoint for managing long context windows.
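The 60-minute cap in particular suggests reconnecting proactively rather than waiting for the server to drop the socket mid-turn. A hypothetical helper, with the 55-minute threshold chosen arbitrarily as a safety margin:

```python
import time

MAX_CONNECTION_AGE = 55 * 60  # seconds; reconnect before the 60-minute cap

class ConnectionManager:
    """Reopen the connection before it hits the server-side lifetime limit.

    connect_fn is a stand-in for whatever opens the actual WebSocket; the
    injectable clock makes the age check testable without real time passing.
    """
    def __init__(self, connect_fn, now=time.monotonic):
        self._connect_fn = connect_fn
        self._now = now
        self._conn = None
        self._opened_at = None

    def get(self):
        if self._conn is None or self._now() - self._opened_at >= MAX_CONNECTION_AGE:
            self._conn = self._connect_fn()
            self._opened_at = self._now()
        return self._conn
```

Note that a reconnect also discards the server's in-memory state, so in practice this would pair with the full-context fallback required for evicted chains.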

Code samples in OpenAI's documentation reference gpt-5.2 as the model, suggesting the feature is designed for current and next-generation model deployments.
