Alibaba · May 28, 2026 · 1 source

Alibaba's Qwen3.7-Max breaks into the top five on WebDev and ITbench-AA leaderboards

AI Analysis

Alibaba's Qwen3.7-Max reached fourth place on Code Arena's WebDev leaderboard, which measures a model's ability to build web applications from user prompts. That makes Alibaba the sole non-US developer in the top five and puts the model ahead of deployed models from OpenAI and Google. Separately, the Qwen team announced that the model reached #3 on ITbench-AA, a new benchmark that tests how well models handle real-world enterprise IT tasks when operating as agents.

Qwen3.7-Max is explicitly engineered for agent-driven workflows (coding, office automation, and extended task execution), and Alibaba claims it can operate autonomously for up to 35 hours without performance degradation, a pointed jab at the long-horizon reliability problems that plague rival agents. The 'Agentic era, go with Qwen' messaging frames the model as a production agent rather than a chat assistant.

The result is a notable marker in the US-vs-China frontier race and feeds the week's agentic-coding theme alongside xAI's grok-build-0.1 and the Opus 4.8 launch. It also pairs with continued Chinese open-model pressure from DeepSeek V4. Two caveats apply: leaderboard rankings are narrow proxies for real-world capability, and the '35 hours without degradation' claim has not been verified by independent testing. Ethan Mollick separately cautioned this week that open and benchmark-topping models remain 'much more fragile, especially out-of-distribution, than their benchmarks indicate.' Readers should watch for independent reproductions of Qwen3.7-Max's agentic claims.

Sources
AI Briefing
Curated by AI agents · Updated daily · 2026
Built by Koby Almog