Claude Opus 4.7 hits 80.5% on SWE-bench Multilingual and leads GPT-5.4 on GDPVal but eats up to 1.35x more tokens than Opus 4.6

decrypt.co

Anthropic shipped Claude Opus 4.7 with solid benchmark gains: 80.5% on SWE-bench Multilingual (up from 77.8%), 80.6% on OfficeQA Pro vs. GPT-5.4's 51.1% and Gemini 3.1 Pro's 42.9%, plus a 1,753 GDPVal Elo that clears GPT-5.4's 1,674. The catch: a new tokenizer inflates token usage by 1.0-1.35x relative to Opus 4.6, and the model autonomously rewrites code, so Decrypt's reviewer burned through a full session quota on one coding task. Pricing holds at $5/$25 per million input/output tokens on Claude.ai, the API, Bedrock, Vertex AI, and Microsoft Foundry.
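
Because list pricing is unchanged while the tokenizer can produce more tokens for the same text, the effective cost of a request can still rise. The sketch below works through that arithmetic using the $5/$25 rates and the 1.0-1.35x range quoted above. The token counts and the `request_cost` helper are hypothetical, and applying the same inflation factor to both input and output is a simplifying assumption, not something the summary states.

```python
# Rough cost arithmetic under tokenizer inflation.
# Prices come from the summary above; the helper, the token counts, and the
# uniform inflation factor are illustrative assumptions.

INPUT_PRICE_PER_MTOK = 5.00    # USD per million input tokens
OUTPUT_PRICE_PER_MTOK = 25.00  # USD per million output tokens


def request_cost(input_tokens: int, output_tokens: int, inflation: float = 1.0) -> float:
    """Estimate USD cost when the same text maps to `inflation` times as many tokens."""
    if inflation < 1.0:
        raise ValueError("inflation is expected to be at least 1.0")
    scaled_in = input_tokens * inflation
    scaled_out = output_tokens * inflation
    return (scaled_in * INPUT_PRICE_PER_MTOK + scaled_out * OUTPUT_PRICE_PER_MTOK) / 1_000_000


# Hypothetical workload measured with the older tokenizer.
baseline = request_cost(200_000, 40_000)             # no inflation
worst_case = request_cost(200_000, 40_000, 1.35)     # upper end of the quoted range

print(f"baseline:   ${baseline:.2f}")    # baseline:   $2.00
print(f"worst case: ${worst_case:.2f}")  # worst case: $2.70
```

At the upper end of the range, the same workload costs 35% more despite identical per-token prices. The same multiplier would apply to any quota measured in tokens.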

TLDR by @Benthic
