Anthropic Releases Claude Sonnet 4.6 with Major Coding and Computer Use Improvements

The upgraded Sonnet model approaches Opus-level performance at a lower price point, with developers preferring it over previous flagship models in early testing.

Anthropic released Claude Sonnet 4.6 on Tuesday, calling it the company's "most capable Sonnet model yet" with significant upgrades to coding, computer use, and long-context reasoning.

The model is now the default for free and Pro plan users on claude.ai and Claude Cowork, maintaining the same pricing as its predecessor at $3 per million input tokens and $15 per million output tokens.
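To make the quoted rates concrete, here is a minimal sketch of the arithmetic. The function name and the example token counts are hypothetical, and the estimate uses only the two list prices above, ignoring any discounts or surcharges that may apply in practice.

```python
# Prices quoted in the article, in US dollars per million tokens.
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request at the quoted list prices (hypothetical helper)."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Hypothetical workload: 200,000 input tokens and 10,000 output tokens.
print(f"${estimate_cost_usd(200_000, 10_000):.2f}")  # $0.75
```

At these rates, output tokens cost five times as much as input tokens, so a long prompt with a short answer is cheaper than a short prompt with a long answer.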

Developer Reception

Internal testing showed developers with early access preferred Sonnet 4.6 over Sonnet 4.5 roughly 70% of the time. More notably, users preferred the new model over Claude Opus 4.5—Anthropic's frontier model from November—59% of the time.

Users reported that Sonnet 4.6 "more effectively read the context before modifying code and consolidated shared logic rather than duplicating it," according to Anthropic. Testing also showed the model was "significantly less prone to overengineering and 'laziness,' and meaningfully better at instruction following."

Computer Use Advances

The model shows marked improvement on OSWorld, the standard benchmark for AI computer use that tests tasks across real software like Chrome, LibreOffice, and VS Code. Anthropic noted that early users are "seeing human-level capability in tasks like navigating a complex spreadsheet or filling out a multi-step web form."

Pace, an AI-powered insurance company, reported that "Claude Sonnet 4.6 hit 94% on our insurance benchmark, making it the highest-performing model we've tested for computer use."

Benchmark Performance

The model achieved 79.6% on SWE-bench Verified, 89.9% on GPQA Diamond, and 58.3% on ARC-AGI-2. On Anthropic's benchmarks for agentic financial analysis and office tasks, Sonnet 4.6 outperformed competitors including Google's Gemini 3 Pro and OpenAI's GPT-5.2.

Replit noted that "the performance-to-cost ratio of Claude Sonnet 4.6 is extraordinary—it's hard to overstate how fast Claude models have been evolving in recent months."

Technical Features

Sonnet 4.6 includes a 1-million-token context window in beta, enough to hold entire codebases or dozens of research papers in a single request. The model supports adaptive thinking, extended thinking, and context compaction, which automatically summarizes older context as a conversation approaches its context limit.
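The general idea behind context compaction can be sketched in a few lines. This is a toy illustration, not Anthropic's implementation: the `compact` function, its parameters, and the word-count "tokenizer" and placeholder summarizer are all assumptions made for the example.

```python
from typing import Callable, List


def compact(
    messages: List[str],
    limit: int,
    keep_recent: int,
    summarize: Callable[[List[str]], str],
    count_tokens: Callable[[str], int],
) -> List[str]:
    """Replace older messages with one summary once the history exceeds `limit` tokens."""
    total = sum(count_tokens(m) for m in messages)
    if total <= limit or len(messages) <= keep_recent:
        return messages  # still under budget, or nothing old enough to fold away
    split = len(messages) - keep_recent
    older, recent = messages[:split], messages[split:]
    return [summarize(older)] + recent


# Hypothetical stand-ins: a real system would use a model tokenizer and a model-written summary.
def word_count(text: str) -> int:
    return len(text.split())


def placeholder_summary(msgs: List[str]) -> str:
    return f"[summary of {len(msgs)} earlier messages]"


history = ["a b c", "d e f", "g h i", "j k l"]
print(compact(history, limit=10, keep_recent=2,
              summarize=placeholder_summary, count_tokens=word_count))
# ['[summary of 2 earlier messages]', 'g h i', 'j k l']
```

Keeping the most recent messages verbatim while summarizing the rest is one common design choice; how the production feature decides what to keep is not described in the announcement.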

Anthropic's safety evaluations concluded that Sonnet 4.6 shows "a broadly warm, honest, prosocial, and at times funny character, very strong safety behaviors, and no signs of major concerns around high-stakes forms of misalignment."

The company acknowledged the model "still lags behind the most skilled humans at using computers" but emphasized that the rate of progress suggests "substantially more capable models are within reach."
