GLM-5

GLM-5 is Z.AI's fifth-generation flagship: a 744B-parameter open-source MoE model engineered for complex systems design, long-horizon agent workflows, and production-grade coding.

Overview

GLM-5 is Z.AI's fifth-generation flagship foundation model, released on February 11, 2026 under an MIT license. It scales to 744 billion parameters, with 40 billion active per token in a Mixture-of-Experts (MoE) architecture, and was trained on 28.5 trillion tokens, a significant leap from GLM-4.7's 358B parameters and 23T-token training corpus. GLM-5 integrates DeepSeek Sparse Attention for higher token efficiency while preserving long-context quality across its 200K-token context window and 128K-token output capacity.

Unlike its predecessor, GLM-5 shifts focus from coding alone to full systems engineering: complex multi-file project construction, autonomous agent execution, and long-horizon task planning. It delivers production-grade performance on large-scale programming tasks with advanced agentic planning, deep backend reasoning, and iterative self-correction. On SWE-bench Verified, GLM-5 scores 77.8%, approaching Claude Opus 4.5's 80.9% and surpassing Gemini 3 Pro's 76.2%.

GLM-5 is fully open-weight, with weights available on Hugging Face, making it one of the strongest open-source models available at launch. It supports thinking modes, real-time streaming, function calling, context caching, and structured output through an OpenAI-compatible API.
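
As a sketch of that OpenAI-compatible surface, the snippet below builds a chat-completions payload with streaming enabled. The endpoint URL and model identifier are assumptions for illustration, not confirmed values; check Z.AI's API documentation for the real ones.

```python
import json

BASE_URL = "https://api.z.ai/v1"  # assumed endpoint, for illustration only
MODEL = "glm-5"                   # assumed model identifier

def build_chat_request(prompt: str, stream: bool = True) -> dict:
    """Build an OpenAI-style chat-completions payload for GLM-5."""
    return {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": "You are a senior systems engineer."},
            {"role": "user", "content": prompt},
        ],
        "stream": stream,    # request server-sent events, token by token
        "max_tokens": 4096,  # well under the 128K output ceiling
    }

payload = build_chat_request("Design a job queue with retry semantics.")
print(json.dumps(payload, indent=2))
```

Because the API is OpenAI-compatible, this payload should map directly onto `client.chat.completions.create(**payload)` with the official `openai` Python client once `base_url` points at the provider.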

Capabilities

  • 744B MoE architecture with 40B active parameters and efficient expert routing
  • 200K token context with 128K output capacity for end-to-end system generation
  • DeepSeek Sparse Attention for improved token efficiency in long-context tasks
  • Thinking modes including interleaved reasoning and real-time streaming
  • Function calling and structured output for agentic automation workflows
  • Context caching for cost reduction on repeated prompt patterns
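
To make the function-calling and structured-output bullets concrete, here is a minimal tool definition in the OpenAI-compatible `tools` shape the card describes. The tool name and its fields are invented for illustration; only the schema layout reflects the documented convention.

```python
import json

# Hypothetical tool for illustration: an agent asks GLM-5 to check a
# backend service, and the model replies with a structured tool call.
get_service_status = {
    "type": "function",
    "function": {
        "name": "get_service_status",
        "description": "Return the health of a named backend service.",
        "parameters": {
            "type": "object",
            "properties": {
                "service": {"type": "string", "description": "Service name"},
                "window_minutes": {"type": "integer", "minimum": 1},
            },
            "required": ["service"],
        },
    },
}

# Structured output: constrain the final reply to valid JSON.
request_extras = {
    "tools": [get_service_status],
    "response_format": {"type": "json_object"},
}
print(json.dumps(request_extras, indent=2))
```

These extras merge into the same chat-completions request body as the messages; the model then either answers directly or emits a `tool_calls` entry matching the declared schema.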

Strengths

  • 77.8% on SWE-bench Verified, approaching Claude Opus 4.5 and surpassing Gemini 3 Pro
  • 92.7% on AIME 2026 I and 86.0% on GPQA-Diamond for math and science reasoning
  • 62.0 on BrowseComp and 56.2 on Terminal-Bench 2.0 for agentic task execution
  • 50.4 on Humanity's Last Exam (with tools), competitive with frontier closed-source models
  • Open-source MIT license with weights on Hugging Face for self-hosting
  • Aggressive pricing at $0.80/M input and $2.56/M output tokens via OpenRouter
  • Drop-in OpenAI-compatible API with streaming and function calling
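
At the listed OpenRouter rates, per-request cost is simple arithmetic; the token counts below are an example workload, not a quoted benchmark.

```python
# OpenRouter list prices from this card: $0.80 per million input
# tokens, $2.56 per million output tokens.
INPUT_PER_M = 0.80
OUTPUT_PER_M = 2.56

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a single request at list prices."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000

# Example: a 150K-token codebase prompt with a 10K-token patch reply.
cost = request_cost(150_000, 10_000)
print(f"${cost:.4f}")  # 0.1200 input + 0.0256 output = $0.1456
```

Even a near-full-context request stays well under a dollar at these rates, which is the basis of the Claude Opus 4.5 cost comparison above.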

Limitations and Considerations

  • Text-only: no native vision or audio input support
  • MoE architecture (744B total) requires specialized infrastructure for self-hosting
  • Higher cost and latency than GLM-4.7 Flash for simple or lightweight tasks
  • Some benchmark claims are still being independently verified by the community

Best Use Cases

GLM-5 is ideal for:

  • Complex systems engineering and full-stack project construction
  • Agentic coding workflows with Claude Code, Cline, or Roo Code integrations
  • Long-horizon autonomous agent tasks requiring multi-step planning
  • Deep research synthesis and multi-document analysis at 200K context
  • Enterprise applications needing open-source frontier-level performance
  • Cost-conscious teams seeking a Claude Opus 4.5 alternative at lower pricing

Technical Details

Supported Features

  • Chat
  • Function calling
  • Reasoning