AI Agents, API Tools

OpenAI Launches WebSocket Mode for Responses API, Promising Up to 40% Faster Agentic Workflows

The new persistent-connection option targets developers building tool-heavy AI agents, with early adopters like Cursor and Cline reporting significant latency improvements.

2 min read
OpenAI Launches WebSocket Mode for Responses API, Promising Up to 40% Faster Agentic Workflows

Image by OpenAI

OpenAI has released a WebSocket mode for its Responses API, giving developers a persistent-connection alternative to standard HTTP streaming for AI agent workflows that involve heavy tool use.

The new mode, documented on OpenAI's developer platform, lets applications maintain a single open connection to /v1/responses and pass only incremental updates — new messages or tool outputs — between turns. The server holds conversation state in memory, eliminating the need to retransmit full context on every round trip.

For developers building agentic systems — coding assistants, orchestration loops, or multi-step automation — the difference is material. According to OpenAI's documentation, "For rollouts with 20+ tool calls, we have seen up to roughly 40% faster end-to-end execution."

How It Works

Developers open a WebSocket connection to wss://api.openai.com/v1/responses and send response.create events whose payloads mirror the existing Responses API body. Continuation between turns uses previous_response_id chaining, with only new input items sent on each subsequent call.

The server maintains one previous-response state per connection in an in-memory cache. Because this state is never written to disk, the mode is compatible with both store=false and Zero Data Retention (ZDR) configurations — a practical consideration for enterprise developers subject to data-handling restrictions.

A warmup feature allows developers to pre-load tools, instructions, and messages by sending response.create with generate: false, which prepares request state without producing model output. This can shave additional milliseconds off the first generated turn.

Constraints Developers Should Know

The mode comes with notable limitations. Connections are capped at 60 minutes, after which developers must reconnect. Only one response can be in-flight at a time per connection — there is no multiplexing. Developers needing parallel runs must open multiple connections.

If a turn fails with a 4xx or 5xx error, the server evicts the cached previous_response_id, preventing stale state reuse. For store=false sessions, losing the in-memory cache means the chain cannot be resumed — the client receives a previous_response_not_found error and must start fresh with full context.

Early Adoption Signals

According to reports on X (formerly Twitter), several prominent developer tools have already integrated the feature. "Tools like Cursor reported 30% speed gains for all users, Cline saw up to 50% on complex work, and Vercel's AI SDK integrated it seamlessly for quicker responses," the trending topic summary noted.

These numbers, if sustained at scale, represent a meaningful reduction in the wall-clock time users spend waiting during multi-step AI-assisted coding sessions — the exact use case where latency compounds across dozens of sequential tool calls.

What This Means for Developers

The WebSocket mode doesn't change what the Responses API can do — it changes how fast it can do it in specific scenarios. Developers whose applications make fewer than a handful of tool calls per session are unlikely to see dramatic improvements. But for teams building complex agents that chain 20 or more tool invocations, the reduction in per-turn overhead could meaningfully improve user experience.

The mode also introduces new operational complexity. Developers must handle reconnection logic, manage the 60-minute connection lifetime, and decide between server-side compaction (context_management) and the standalone /responses/compact endpoint for managing long context windows.

Code samples in OpenAI's documentation reference gpt-5.2 as the model, suggesting the feature is designed for current and next-generation model deployments.

Share:

Other Latest News

Cursor Hits $3B ARR as Composer 2.5 Trains on SpaceX Hardware
News & Updates, Code Editors

Cursor Hits $3B ARR as Composer 2.5 Trains on SpaceX Hardware

Bloomberg reveals Cursor crossed $3B in annualized revenue in late April, with 3,000+ enterprise customers paying $100K+ annually — and Composer 2.5 already drawing on SpaceX's Colossus data centers.

May 22, 2026
SpaceX Files S-1, Triggering the $60B Cursor Acquisition Clock
News & Updates, Industry Analysis, Code Editors

SpaceX Files S-1, Triggering the $60B Cursor Acquisition Clock

SpaceX's IPO prospectus, filed May 20, formally discloses the $60B Cursor acquisition terms and sets a ~July close timeline — raising hard questions about model neutrality, compute access, and developer data privacy.

May 22, 2026
OpenAI Reasoning Model Cracks 80-Year Math Problem, Signaling Codex Leap
AI Agents, News & Updates

OpenAI Reasoning Model Cracks 80-Year Math Problem, Signaling Codex Leap

An internal OpenAI general-purpose reasoning model disproved a famous Erdős conjecture open since 1946 — a first for autonomous AI in frontier mathematics, with direct implications for what is coming to Codex and agentic coding tools.

May 21, 2026
SpaceX IPO S-1 Locks In $60B Cursor Acquisition in Stock
News & Updates, Code Editors

SpaceX IPO S-1 Locks In $60B Cursor Acquisition in Stock

SpaceX's IPO prospectus reveals for the first time that the $60B Cursor acquisition will be paid in SPCX Class A stock — not cash — and that SpaceX has no formal obligation to close the deal.

May 21, 2026
Google Launches Gemini 3.5 Flash and Antigravity 2.0 at I/O
AI Agents, News & Updates, Code Editors

Google Launches Gemini 3.5 Flash and Antigravity 2.0 at I/O

Google unveiled Gemini 3.5 Flash and Antigravity 2.0 at I/O 2026 — a 4x-faster agentic model and a new agent-first coding IDE that puts Google in direct competition with Claude Code and OpenAI Codex.

May 21, 2026
Cursor Brings Cloud Agents to Jira With Native Work Item Integration
AI Agents, News & Updates, Code Editors

Cursor Brings Cloud Agents to Jira With Native Work Item Integration

Cursor now lets teams assign Jira tickets directly to a cloud agent or mention @Cursor in any comment to trigger a task — completing the loop between where work is tracked and where it gets done.

May 20, 2026
← Scroll for more →