Alibaba launches Qwen3.7-Plus, a multimodal 'computer-use' agent

Alibaba positioned Qwen3.7-Plus as a 'computer-use' agent rather than just a chat model. The system combines visual perception, GUI control and autonomous code generation in a single agent loop. It accepts text, images and video as input, so it can read screens, navigate desktop and web applications, write code from visual templates, and call tools without a human in the loop. It is offered via API through Alibaba Cloud's Bailian platform.
The technical pitch centers on GUI grounding: mapping a model's understanding of an on-screen interface to precise actions such as clicks and field entries. Alibaba says Qwen3.7-Plus leads on the grounding benchmarks it reported. That places it in direct competition with computer-use agents from Western rivals, including Anthropic (Claude's computer use) and OpenAI (its Operator-style agents), as well as with offerings from Alibaba's domestic competitors.
Strategically, a capable, API-accessible agent model strengthens Alibaba Cloud's enterprise offering and extends the Qwen family's reputation as one of the strongest open and commercial model lineups out of China. The broader caveat for computer-use agents is reliability: autonomous GUI navigation remains error-prone on unfamiliar interfaces, and leading curated grounding benchmarks doesn't always translate into robust real-world automation. Pricing and rate limits on Bailian will shape adoption.