Claude Sonnet 4.6 Released: Anthropic's Most Capable Sonnet Model Yet

Anthropic released Claude Sonnet 4.6 on Tuesday, calling it the company's "most capable Sonnet model yet" and citing significant upgrades in coding, computer use, and long-context reasoning.

The model is now the default for free and Pro plan users on claude.ai and Claude Cowork. Pricing is unchanged from its predecessor: $3 per million input tokens and $15 per million output tokens.
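To make the per-token pricing concrete, here is a minimal Python sketch that estimates the cost of a single request from the two list prices quoted above. The function name and the sample workload are illustrative assumptions, and the estimate ignores any discounts, caching, or tiered pricing that may apply in practice.

```python
# Prices quoted in the article, in US dollars per million tokens.
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request at the article's list prices.

    Hypothetical helper: it multiplies token counts by the quoted rates
    and does not model discounts, caching, or tiered pricing.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Hypothetical workload: 200,000 input tokens and 4,000 output tokens.
print(f"${estimate_cost_usd(200_000, 4_000):.2f}")  # prints $0.66
```

At these rates, output tokens cost five times as much as input tokens, so a request that reads a large document but returns a short answer is dominated by the input side of the bill.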

Developer Reception

Internal testing showed that developers with early access preferred Sonnet 4.6 over Sonnet 4.5 roughly 70% of the time. More notably, they preferred the new model over Claude Opus 4.5, Anthropic's frontier model from November, 59% of the time.

Users reported that Sonnet 4.6 "more effectively read the context before modifying code and consolidated shared logic rather than duplicating it," according to Anthropic. Testing also showed the model was "significantly less prone to overengineering and 'laziness,' and meaningfully better at instruction following."

Computer Use Advances

The model shows marked improvement on OSWorld, the standard benchmark for AI computer use that tests tasks across real software like Chrome, LibreOffice, and VS Code. Anthropic noted that early users are "seeing human-level capability in tasks like navigating a complex spreadsheet or filling out a multi-step web form."

Pace, an AI-powered insurance company, reported that "Claude Sonnet 4.6 hit 94% on our insurance benchmark, making it the highest-performing model we've tested for computer use."

Benchmark Performance

The model scored 79.6% on SWE-bench Verified, 89.9% on GPQA Diamond, and 58.3% on ARC-AGI-2. On Anthropic's benchmarks for agentic financial analysis and office tasks, Sonnet 4.6 outperformed competitors including Google's Gemini 3 Pro and OpenAI's GPT-5.2.

Replit noted that "the performance-to-cost ratio of Claude Sonnet 4.6 is extraordinary—it's hard to overstate how fast Claude models have been evolving in recent months."

Technical Features

Sonnet 4.6 includes a 1-million-token context window in beta, enough to hold entire codebases or dozens of research papers in a single request. The model supports adaptive thinking, extended thinking, and context compaction, which automatically summarizes older context as a conversation approaches its context limit.
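The general idea behind context compaction can be illustrated with a short Python sketch. This is not Anthropic's implementation, which the article does not describe. The function names, the four-characters-per-token heuristic, and the placeholder summarizer are all assumptions made for illustration.

```python
from typing import Callable, List


def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about four characters per token.
    return max(1, len(text) // 4)


def compact(
    messages: List[str],
    limit: int,
    keep_recent: int,
    summarize: Callable[[List[str]], str],
) -> List[str]:
    """Replace older messages with one summary once the total exceeds `limit`.

    Hypothetical sketch: the most recent `keep_recent` messages are kept
    verbatim, and everything before them is collapsed into a single summary.
    """
    total = sum(estimate_tokens(m) for m in messages)
    if total <= limit or len(messages) <= keep_recent:
        return list(messages)  # still under budget, or nothing old enough to fold
    split = len(messages) - keep_recent
    older, recent = messages[:split], messages[split:]
    return [summarize(older)] + recent


def naive_summary(older: List[str]) -> str:
    # Placeholder summarizer; a real system would ask a model to write this.
    return f"[summary of {len(older)} earlier messages]"


# Five messages of roughly 100 tokens each, against a 300-token budget.
history = ["x" * 400 for _ in range(5)]
compacted = compact(history, limit=300, keep_recent=2, summarize=naive_summary)
print(len(compacted), compacted[0])
```

The trade-off in any scheme like this is that the summary loses detail from the older turns, so the number of recent messages kept verbatim is a tuning choice rather than a fixed rule.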

Anthropic's safety evaluations concluded that Sonnet 4.6 shows "a broadly warm, honest, prosocial, and at times funny character, very strong safety behaviors, and no signs of major concerns around high-stakes forms of misalignment."

The company acknowledged the model "still lags behind the most skilled humans at using computers" but emphasized that the rate of progress suggests "substantially more capable models are within reach."