GLM-4.7

GLM-4.7 is Z.AI's flagship 358B MoE model with 200K context, 128K output, and state-of-the-art performance on coding and reasoning benchmarks.

Overview

GLM-4.7 is Z.AI's flagship foundation model, released on December 22, 2025. Built on a 358-billion parameter Mixture-of-Experts (MoE) architecture, it represents a major leap in coding, reasoning, and agentic capabilities. The model supports a 200,000-token context window paired with an exceptional 128,000-token output capacity—allowing it to generate entire software frameworks, comprehensive reports, or multi-file modules in a single pass.

As an MoE model, GLM-4.7 routes each token through a small subset of its experts, so only a fraction of the 358 billion parameters is active per forward pass, delivering high performance per unit of compute. GLM-4.7 introduces "Interleaved Thinking", in which the model reasons before every response and tool call across multi-step turns, combined with "Preserved Thinking", which retains reasoning blocks across turns to reduce information loss during complex agentic workflows.
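
The snippet below is a minimal sketch of how Preserved Thinking might be wired into a multi-turn loop over an OpenAI-compatible endpoint. The base URL, the `glm-4.7` model id, and the `reasoning_content` field are assumptions for illustration, not confirmed API details; consult the Z.AI documentation for the actual shape.

```python
# Minimal sketch of "Preserved Thinking" in a multi-turn loop.
# Assumptions (not confirmed by this page): the endpoint is OpenAI-compatible
# at BASE_URL, the model id is "glm-4.7", and reasoning is surfaced in a
# hypothetical `reasoning_content` field on the assistant message.
from openai import OpenAI

BASE_URL = "https://api.z.ai/api/paas/v4"  # placeholder; check Z.AI docs
client = OpenAI(base_url=BASE_URL, api_key="YOUR_API_KEY")

messages = [{"role": "user", "content": "Plan a refactor of our auth module."}]

for _ in range(3):  # a few agentic turns
    resp = client.chat.completions.create(model="glm-4.7", messages=messages)
    msg = resp.choices[0].message
    # Preserve the reasoning block alongside the visible answer so later
    # turns keep the model's chain of thought instead of re-deriving it.
    assistant_turn = {"role": "assistant", "content": msg.content}
    reasoning = getattr(msg, "reasoning_content", None)  # hypothetical field
    if reasoning is not None:
        assistant_turn["reasoning_content"] = reasoning
    messages.append(assistant_turn)
    messages.append({"role": "user", "content": "Continue with the next step."})
```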

A standout feature is "Vibe Coding": enhanced aesthetic intelligence that produces cleaner, more modern user interfaces by default. When generating webpages, slides, or UI components, GLM-4.7 understands visual hierarchy, color harmony, and layout structure better than previous models, significantly reducing frontend polish time.

Capabilities

  • 358B MoE architecture with efficient expert routing for cost-effective inference
  • 200K token context with massive 128K output capacity for end-to-end generation
  • Interleaved Thinking that reasons before each response and tool invocation
  • Preserved Thinking to maintain reasoning context across multi-turn sessions
  • Vibe Coding for aesthetically superior frontend and UI generation
  • Tool streaming that delivers tool call parameters in real time as they are generated (see the sketch after this list)
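
As a concrete illustration of tool streaming, the sketch below consumes streamed tool-call fragments using the standard OpenAI-compatible chunk format. The base URL, model id, and the `search_docs` tool are placeholders for illustration, not confirmed values.

```python
# Minimal sketch of consuming streamed tool-call parameters, assuming the
# OpenAI-compatible streaming format; BASE_URL and the model id are
# placeholders, not confirmed values.
import json
from openai import OpenAI

client = OpenAI(base_url="https://api.z.ai/api/paas/v4", api_key="YOUR_API_KEY")

tools = [{
    "type": "function",
    "function": {
        "name": "search_docs",  # hypothetical tool for illustration
        "parameters": {"type": "object",
                       "properties": {"query": {"type": "string"}}},
    },
}]

stream = client.chat.completions.create(
    model="glm-4.7",
    messages=[{"role": "user", "content": "Find the context-caching docs."}],
    tools=tools,
    stream=True,
)

# Tool-call arguments arrive as JSON fragments; accumulate them by index
# so downstream code can start validating or executing as soon as a call
# completes, rather than waiting for the full response.
calls = {}
for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    for tc in delta.tool_calls or []:
        entry = calls.setdefault(tc.index, {"name": "", "args": ""})
        if tc.function.name:
            entry["name"] = tc.function.name
        if tc.function.arguments:
            entry["args"] += tc.function.arguments

for i, call in calls.items():
    print(call["name"], json.loads(call["args"]))
```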

Strengths

  • 73.8% on SWE-bench Verified (+5.8% over GLM-4.6), competitive with Claude Sonnet 4.5
  • 42.8% on Humanity's Last Exam (+12.4% improvement), surpassing GPT-5.1
  • Open-source SOTA on τ²-Bench for multi-step tool sequencing
  • 66.7% on SWE-bench Multilingual for international codebases
  • Competitive pricing at $0.60/1M input and $2.20/1M output tokens
  • Context caching reduces costs by 20-40% for repeated prompts (worked cost sketch after this list)
  • Drop-in OpenAI-compatible API with streaming and function calling
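
To make the pricing concrete, here is a worked cost sketch using the rates above. It assumes the 20-40% caching saving applies to the input side of a request whose prefix is cached; actual billing mechanics may differ.

```python
# Worked cost sketch from the rates above ($0.60/M input, $2.20/M output).
# Assumption: the 20-40% caching saving applies to the input tokens of a
# request whose prefix is cached; exact billing mechanics may differ.
INPUT_PER_M, OUTPUT_PER_M = 0.60, 2.20

def request_cost(input_tokens, output_tokens, cache_discount=0.0):
    cost_in = input_tokens / 1e6 * INPUT_PER_M * (1 - cache_discount)
    cost_out = output_tokens / 1e6 * OUTPUT_PER_M
    return cost_in + cost_out

# A 150K-token prompt with a 4K-token answer:
print(f"cold:   ${request_cost(150_000, 4_000):.4f}")        # $0.0988
print(f"cached: ${request_cost(150_000, 4_000, 0.30):.4f}")  # 30% off input
```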

Limitations and Considerations

  • Text-only: no native vision or audio (use GLM-4V variants for multimodal)
  • MoE architecture requires specialized infrastructure for self-hosting
  • Weights not yet released as open source (an open release is expected, following the GLM-4.5 precedent)
  • Higher latency than Flash variant for simple tasks

Best Use Cases

GLM-4.7 is ideal for:

  • Agentic coding with Claude Code, Cline, or Roo Code integrations
  • Generating complete software frameworks and multi-file modules
  • Frontend development with "Vibe Coding" for polished UI output
  • Deep research and multi-step tool orchestration workflows
  • Enterprise applications requiring 200K context and 128K output
  • Complex reasoning tasks in academic or analytical domains

Technical Details

Supported Features

  • chat
  • functions
  • reasoning
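
As a sketch of the functions feature, the following shows a full function-calling round trip under the OpenAI-compatible message format. The base URL, model id, and the `get_weather` tool are illustrative placeholders, not confirmed API details.

```python
# Minimal function-calling round trip, assuming the OpenAI-compatible API;
# the base URL, model id, and get_weather tool are illustrative placeholders.
import json
from openai import OpenAI

client = OpenAI(base_url="https://api.z.ai/api/paas/v4", api_key="YOUR_API_KEY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "parameters": {"type": "object",
                       "properties": {"city": {"type": "string"}},
                       "required": ["city"]},
    },
}]

messages = [{"role": "user", "content": "What's the weather in Beijing?"}]
resp = client.chat.completions.create(model="glm-4.7", messages=messages, tools=tools)
call = resp.choices[0].message.tool_calls[0]

# Execute the tool locally, then feed the result back for the final answer.
args = json.loads(call.function.arguments)
result = {"city": args["city"], "temp_c": 21}  # stubbed tool output
messages.append(resp.choices[0].message)
messages.append({"role": "tool", "tool_call_id": call.id,
                 "content": json.dumps(result)})

final = client.chat.completions.create(model="glm-4.7", messages=messages, tools=tools)
print(final.choices[0].message.content)
```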