Claude Opus 4.7 hits 80.5% on SWE-bench Multilingual and leads GPT-5.4 on GDPVal but uses up to 1.35x as many tokens as Opus 4.6


Anthropic shipped Claude Opus 4.7 with solid benchmark gains: 80.5% on SWE-bench Multilingual (up from 77.8%), 80.6% on OfficeQA Pro versus GPT-5.4's 51.1% and Gemini 3.1 Pro's 42.9%, and a GDPVal Elo of 1,753 that clears GPT-5.4's 1,674. The catch: a new tokenizer inflates token usage by 1.0x to 1.35x, and the model rewrites code autonomously, so Decrypt's reviewer burned through a full session quota on a single coding task. Pricing holds at $5 per million input tokens and $25 per million output tokens across Claude.ai, the API, Bedrock, Vertex AI, and Microsoft Foundry.
TLDR by @Benthic
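
Because list prices are unchanged while the tokenizer counts more tokens for the same text, the effective cost per request can still rise. The sketch below works through that arithmetic using the $5/$25 prices and the 1.0x to 1.35x range quoted above. The workload size (200,000 input and 50,000 output tokens) is hypothetical, and applying the same inflation factor to input and output is a simplifying assumption, not something the summary states.

```python
# Prices quoted in the summary, in USD per million tokens.
INPUT_PRICE_PER_MTOK = 5.00
OUTPUT_PRICE_PER_MTOK = 25.00


def request_cost(input_tokens, output_tokens, inflation=1.0):
    """Estimate USD cost for one request.

    `inflation` scales both token counts to model the new tokenizer.
    Assumption: the same factor applies to input and output tokens.
    """
    scaled_in = input_tokens * inflation
    scaled_out = output_tokens * inflation
    total = scaled_in * INPUT_PRICE_PER_MTOK + scaled_out * OUTPUT_PRICE_PER_MTOK
    return total / 1_000_000


# Hypothetical workload, measured in old-tokenizer tokens.
baseline = request_cost(200_000, 50_000, inflation=1.0)
worst_case = request_cost(200_000, 50_000, inflation=1.35)

print(f"baseline:   ${baseline:.2f}")
print(f"worst case: ${worst_case:.2f}")
```

For this hypothetical workload the estimate moves from $2.25 to roughly $3.04 at the top of the quoted range, which is the full 35% increase even though the per-token price did not change.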

