Skip to main content

GLM-5.1

Overview

GLM-5.1 is an open-source flagship AI model developed by Z.ai, formerly Zhipu AI, a Tsinghua University spinoff and the first publicly traded foundation model company. Released on April 7, 2026, it is a post-training upgrade to GLM-5, built on a 754-billion-parameter Mixture-of-Experts architecture with 40 billion active parameters per token. GLM-5.1 is designed for agentic engineering and long-horizon autonomous software development.

Key Features

  • Agentic Coding: Achieves 58.4% on SWE-Bench Pro, ranking #1 as of April 2026, ahead of GPT-5.4 and Claude Opus 4.6.
  • 8-Hour Autonomous Execution: Can work autonomously on a single task for up to 8 hours, completing full plan-execute-test-fix-optimize loops across hundreds of iterations and thousands of tool calls.
  • MIT Licensed Open Weights: Released on Hugging Face under the MIT license, allowing unrestricted commercial use, modification, and fine-tuning.
  • Ascend-Native Training: Trained entirely on Huawei Ascend 910B chips using the MindSpore framework, enabling full independence from U.S.-manufactured hardware.

Best Use Cases

  • Long-Horizon Software Engineering: Well suited for complex, multi-step coding tasks that require sustained autonomous execution, such as building or maintaining large systems over long task cycles.
  • Agentic Tool Orchestration: Strong function calling, MCP integration, and structured output support make it effective for AI agents that need to interact with external tools and APIs.
  • Cost-Effective Frontier Performance: Delivers strong coding performance at a significantly lower cost than many proprietary frontier models.

Capabilities and Limitations

CapabilityDescription
ReasoningAIME 2026: 95.3%, GPQA-Diamond: 86.2%, with strong system-level reasoning across planning and iterative debugging
CodingSWE-Bench Pro 58.4%, CyberGym 68.7%, BrowseComp 68.0%, MCP-Atlas 71.8%
MultimodalText only. No image, audio, or video input. A separate GLM-5V-Turbo variant is available for vision tasks
Response SpeedNot independently benchmarked yet; expected to be comparable to similar-scale MoE models
Context Window200K tokens
Max Output128K tokens
Tool UseFunction calling, structured output, context caching, MCP integration, and thinking mode
MultilingualStrong multilingual support, particularly in Chinese and English

Known Limitations

  • Text-only input with no native multimodal support. Vision use cases rely on the separate GLM-5V-Turbo model.
  • Math and science benchmark scores trail some top proprietary models, making it less suitable for purely quantitative research tasks.
  • On broader coding composites such as Terminal-Bench 2.0 plus NL2Repo, Claude Opus 4.6 still leads.
  • Self-hosting requires substantial compute resources because of the 754B parameter count.

Credits Usage

ModelInput (Credits/Token)Cache Write (Credits/Token)Cache Read (Credits/Token)Output (Credits/Token)Web Search (Credits/Use)Billing Notes
GLM-5.11.401.400.264.40--