GLM-4.7
GLM-4.7 is Z.AI's flagship 358B MoE model with 200K context, 128K output, and state-of-the-art performance on coding and reasoning benchmarks.
Overview
GLM-4.7 is Z.AI's flagship foundation model, released on December 22, 2025. Built on a 358-billion-parameter Mixture-of-Experts (MoE) architecture, it marks a major step forward in coding, reasoning, and agentic capability. The model pairs a 200,000-token context window with an unusually large 128,000-token output capacity, allowing it to generate entire software frameworks, comprehensive reports, or multi-file modules in a single pass.
As an MoE model, GLM-4.7 activates only a subset of its experts for each input rather than the full parameter count, delivering high performance per unit of compute. It introduces "Interleaved Thinking", in which the model reasons before every response and tool call across multi-step turns, combined with "Preserved Thinking", which retains reasoning blocks across turns to reduce information loss in complex agentic workflows.
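As a concrete sketch: assuming the OpenAI-compatible endpoint at https://api.z.ai/api/paas/v4 and a `thinking` toggle passed through `extra_body` (the flag name, base URL, and model id are all assumptions to verify against Z.AI's API reference), enabling reasoning looks roughly like this:

```python
# Hypothetical sketch: enabling GLM-4.7's thinking mode over the
# OpenAI-compatible API. Base URL, model id, and the "thinking" flag
# are assumptions; confirm against the provider's documentation.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_ZAI_API_KEY",               # placeholder key
    base_url="https://api.z.ai/api/paas/v4",  # assumed endpoint
)

response = client.chat.completions.create(
    model="glm-4.7",  # assumed model id
    messages=[{"role": "user", "content": "Plan and implement a rate limiter."}],
    extra_body={"thinking": {"type": "enabled"}},  # assumed toggle for Interleaved Thinking
)
print(response.choices[0].message.content)
```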
A standout feature is "Vibe Coding": enhanced aesthetic intelligence that produces cleaner, more modern user interfaces by default. When generating webpages, slides, or UI components, GLM-4.7 understands visual hierarchy, color harmony, and layout structure better than previous models, significantly reducing frontend polish time.
Capabilities
- 358B MoE architecture with efficient expert routing for cost-effective inference
- 200K token context with massive 128K output capacity for end-to-end generation
- Interleaved Thinking that reasons before each response and tool invocation
- Preserved Thinking to maintain reasoning context across multi-turn sessions
- Vibe Coding for aesthetically superior frontend and UI generation
- Tool streaming that delivers tool-call parameters incrementally in real time (see the sketch after this list)
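The tool-streaming behavior can be consumed with the standard OpenAI SDK streaming interface; the delta structure below is the usual OpenAI streaming format the document says GLM-4.7 is compatible with, while the endpoint, model id, and `get_weather` tool are illustrative assumptions:

```python
# Sketch: reading streamed tool-call parameters as they are generated.
# Endpoint, model id, and the tool schema are assumptions.
from openai import OpenAI

client = OpenAI(api_key="YOUR_ZAI_API_KEY", base_url="https://api.z.ai/api/paas/v4")

stream = client.chat.completions.create(
    model="glm-4.7",  # assumed model id
    messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    stream=True,
)

args = ""
for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    if delta.tool_calls:  # parameters arrive as incremental JSON fragments
        args += delta.tool_calls[0].function.arguments or ""
print(args)  # accumulated arguments, e.g. {"city": "Berlin"}
```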
Strengths
- 73.8% on SWE-bench Verified (+5.8% over GLM-4.6), competitive with Claude Sonnet 4.5
- 42.8% on Humanity's Last Exam (+12.4% over GLM-4.6), surpassing GPT-5.1
- Open-source SOTA on τ²-Bench for multi-step tool sequencing
- 66.7% on SWE-bench Multilingual for international codebases
- Competitive pricing at $0.60/1M input and $2.20/1M output tokens
- Context caching reduces costs by 20-40% for repeated prompts (see the worked example after this list)
- Drop-in OpenAI-compatible API with streaming and function calling
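To make the pricing concrete, here is a small cost estimator built from the quoted rates. How the cache discount is applied per token is an assumption; the 30% used below is simply the midpoint of the quoted 20-40% range:

```python
# Worked cost estimate from the quoted rates:
# $0.60 per 1M input tokens, $2.20 per 1M output tokens.
INPUT_PER_M = 0.60
OUTPUT_PER_M = 2.20

def request_cost(input_tokens: int, output_tokens: int,
                 cached_fraction: float = 0.0,
                 cache_discount: float = 0.30) -> float:
    """Estimate one request's cost in USD.

    cached_fraction: share of input tokens served from the prompt cache.
    cache_discount: assumed savings on cached tokens (midpoint of 20-40%).
    """
    effective_input = input_tokens * (1 - cached_fraction * cache_discount)
    return effective_input / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

# A 150K-token prompt with 20K tokens of output, 80% of the prompt cached:
print(f"${request_cost(150_000, 20_000, cached_fraction=0.8):.4f}")  # $0.1124
```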
Limitations and Considerations
- Text-only: no native vision or audio (use GLM-4V variants for multimodal)
- MoE architecture requires specialized infrastructure for self-hosting
- Weights not yet fully open-source (an open release is expected, based on the GLM-4.5 precedent)
- Higher latency than the Flash variant on simple tasks
Best Use Cases
GLM-4.7 is ideal for:
- Agentic coding with Claude Code, Cline, or Roo Code integrations
- Generating complete software frameworks and multi-file modules
- Frontend development with "Vibe Coding" for polished UI output
- Deep research and multi-step tool orchestration workflows (a minimal tool-loop sketch follows this list)
- Enterprise applications requiring 200K context and 128K output
- Complex reasoning tasks in academic or analytical domains
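For the tool-orchestration use cases above, a minimal agent loop over the OpenAI-compatible API might look like the following; the endpoint, model id, and `list_files` tool are illustrative assumptions, not a prescribed integration:

```python
# Minimal agentic tool loop sketch. Endpoint, model id, and the
# list_files tool are assumptions for illustration only.
import json
import os
from openai import OpenAI

client = OpenAI(api_key="YOUR_ZAI_API_KEY", base_url="https://api.z.ai/api/paas/v4")

def list_files(path: str) -> str:
    """Example local tool: list directory contents as JSON."""
    return json.dumps(os.listdir(path))

TOOLS = [{
    "type": "function",
    "function": {
        "name": "list_files",
        "description": "List files in a directory.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

messages = [{"role": "user", "content": "Which Python files are in the current directory?"}]
while True:
    resp = client.chat.completions.create(
        model="glm-4.7",  # assumed model id
        messages=messages,
        tools=TOOLS,
    )
    msg = resp.choices[0].message
    if not msg.tool_calls:  # no more tool requests: final answer
        print(msg.content)
        break
    messages.append(msg)    # keep the assistant turn (and its reasoning) in context
    for call in msg.tool_calls:
        result = list_files(**json.loads(call.function.arguments))
        messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
```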