AnthropicJune 1, 20261 sources

Anthropic ships Claude Opus 4.8, claims 4x better honesty over GPT-5.5

AI Analysis

Claude Opus 4.8 arrived May 28, an unusually fast 41-day cadence after Opus 4.7, and Anthropic is positioning it on two axes: stronger performance on dynamic, multi-step agentic workflows and a claimed 4x improvement in honesty relative to OpenAI's GPT-5.5. The model is already its most capable public release and the engine behind the Claude Code revenue surge fueling the company's $965B valuation story.

On independent evaluation, François Chollet's ARC Prize team noted Opus 4.8 set a new state-of-the-art on ARC-AGI-3 — but the headline number is humbling: a 1.5% score at roughly $10K of compute. The result captures both the progress (it tops all prior models) and the ceiling (the benchmark is designed to resist memorization and remains brutally hard). The ARC team observed the model 'read the environment' before acting, a signal of more deliberate planning.

Competitively, the honesty framing is a direct jab at GPT-5.5, the rival OpenAI model that just went GA on AWS Bedrock the same week. Anthropic is increasingly selling Claude on reliability and reduced sycophancy for enterprise workloads rather than raw capability alone.

The community reaction was mixed. An r/ClaudeAI thread titled 'Opus "let me push back on that" 4.8' (172 upvotes) captured both appreciation and irritation at the model's more assertive disagreement style. Skeptics tie the 41-day cadence and the chilly reception to Opus 4.7 into a broader question about whether rapid point-releases are substantive upgrades or valuation-supporting marketing. Watch how 4.8 fares on real-world coding leaderboards versus its honesty claims.

Sources

opentools.ai

https://opentools.ai/news/claude-opus-4-8-dynamic-workflows-benchmarks-2026