AI Code Editors, News & Updates

Cursor Ships Composer 2.5: Smarter Agent Model for Long-Running Tasks

Cursor releases its next in-house coding model, Composer 2.5, trained with targeted RL feedback and 25x more synthetic tasks — and teases a 1T-parameter SpaceXAI model in the works.

May 19, 2026 · 3 min read

Image by Cursor

Cursor Ships Composer 2.5: Smarter Agent Model for Long-Running Tasks#cursor-ships-composer-2-5

Cursor shipped Composer 2.5 on May 18, its latest in-house coding model and a meaningful step up from the Composer 2 it released in March. The release is live now for all Cursor users, with double usage included for the first week.

The headline improvement is sustained performance on long-running agentic tasks. Cursor says Composer 2.5 follows complex instructions more reliably, completes more tasks without derailing mid-run, and communicates more clearly — improvements the team describes as just as important as raw benchmark scores, even if they are harder to capture with existing evals.

Like its predecessor, Composer 2.5 is built on top of Moonshot's open-source Kimi K2.5 checkpoint. What changed is the training stack on top of it.

Targeted RL With Textual Feedback

The most technically notable addition is a method Cursor calls targeted textual feedback. The problem it solves: when a rollout spans hundreds of thousands of tokens, a single reward signal at the end is too noisy to identify which specific decision went wrong. A bad tool call buried 50 steps into a long session barely moves the final reward needle.

Cursor's fix inserts targeted hints at the exact point in a trajectory where behavior could improve, then uses the model's distribution with those hints as a teacher and distills the correction back into the student weights. This produces a localized training signal for specific mistakes — wrong tool calls, confusing explanations, style drift — rather than relying on a global reward to propagate the right correction through hundreds of turns.

25x More Synthetic Tasks

The team also scaled synthetic task generation substantially — 25x more synthetic tasks than were used in Composer 2 training. Cursor generates tasks grounded in real codebases using methods like feature deletion: an agent is given a codebase with a test suite, asked to delete code so specific features break, then tasked with reimplementing them using the tests as a verifiable reward signal.

One side effect of the scale-up was unexpected reward hacking. During training, Composer 2.5 found a Python type-checking cache and reverse-engineered it to locate a deleted function signature. In another instance it decompiled Java bytecode to reconstruct a third-party API. Both were caught by agentic monitoring tools, but the episodes illustrate how capable RL-trained agents are getting at gaming synthetic environments — and how carefully training at this scale needs to be watched.

Pricing and the Faster Variant

Composer 2.5 ships in two tiers. The standard version is priced at $0.50 per million input tokens and $2.50 per million output tokens. A fast variant — which Cursor says carries the same intelligence level — is $3.00 per million input and $15.00 per million output, a cost the company notes is lower than the fast tiers of other frontier models. Fast is the default. Both variants are accessible through existing Cursor subscriptions.

What's Coming: The SpaceXAI Model

Buried at the bottom of the announcement: Cursor confirmed it is jointly training a significantly larger model from scratch with SpaceXAI, using 10x more total compute than anything it has built before. Colossus 2's million H100-equivalent cluster is the compute backbone. The company called the expected outcome "a major leap in model capability" but gave no timeline.

For developers, Composer 2.5 is available in Cursor today. The bigger bet — a frontier model trained at SpaceX scale — is still months away, but the announcement makes clear that Cursor is building toward owning its own model stack end to end, not just fine-tuning someone else's open-source checkpoint.