GLM-4.7 Flash

GLM-4.7 Flash is Z.AI's lightweight, high-speed coding model with open-source SOTA performance on SWE-bench among comparable-size models.

Overview

GLM-4.7 Flash is a lightweight variant of Z.AI's GLM-4.7 series, optimized for fast inference and low latency while retaining the coding-centric capabilities of the flagship model. Released in January 2026, it achieves open-source SOTA scores on SWE-bench Verified and τ²-Bench among models of comparable size, making it a strong choice for teams that need production-grade code generation without flagship costs.

Flash inherits the GLM-4.7 family's "think before acting" mechanism, preserved reasoning across turns, and per-request thinking control. This allows developers to trade off speed versus accuracy depending on task complexity. The model excels at both frontend and backend development, with notably improved UI generation quality—producing cleaner HTML, CSS, and component layouts compared to earlier versions.
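Per-request thinking control is normally exposed as a field on the request body, so a caller can disable reasoning for quick edits and enable it for harder refactors. A minimal sketch, assuming a hypothetical `thinking` parameter (the exact field name and values are assumptions; check the provider's API reference):

```python
# Sketch: building a chat-completions payload with per-request thinking
# control. The "thinking" field and its values are illustrative assumptions,
# not the documented parameter.

def build_request(prompt: str, enable_thinking: bool) -> dict:
    """Return a request body with thinking toggled per request."""
    return {
        "model": "glm-4.7-flash",  # model identifier (assumed)
        "messages": [{"role": "user", "content": prompt}],
        "thinking": {"type": "enabled" if enable_thinking else "disabled"},
    }

# Fast path for trivial edits, slow path for complex work:
fast = build_request("Rename this variable across the file.", enable_thinking=False)
careful = build_request("Refactor this module and fix the race condition.", enable_thinking=True)
```

This is the speed-versus-accuracy trade-off in practice: simple tasks skip the reasoning phase, complex ones pay for it.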

Compared to the flagship GLM-4.7 (which uses a 355B MoE architecture), Flash prioritizes throughput and cost efficiency while still delivering strong benchmark results. It integrates with popular coding tools like Claude Code, Cline, Roo Code, and Cursor, and follows standard OpenAI-compatible API conventions for easy adoption.
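Because the API follows OpenAI-compatible conventions, a chat-completions request can be prepared with nothing but the standard library. The base URL below is a placeholder, not the provider's real endpoint:

```python
import json
import urllib.request

# Sketch: preparing (not sending) an OpenAI-compatible chat request.
# BASE_URL and the model id are placeholders for illustration.
BASE_URL = "https://api.example.com/v1"  # substitute the provider's base URL
API_KEY = "YOUR_API_KEY"

payload = {
    "model": "glm-4.7-flash",
    "messages": [
        {"role": "system", "content": "You are a coding assistant."},
        {"role": "user", "content": "Write a function that reverses a linked list."},
    ],
}

req = urllib.request.Request(
    url=f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    },
    method="POST",
)
# urllib.request.urlopen(req) would send the request; omitted here.
```

The same payload shape works through any OpenAI-compatible SDK or toolchain, which is what makes adoption in existing editors and agents straightforward.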

Capabilities

  • 200K token context for large codebases, documentation, and multi-file projects
  • Thinking modes including interleaved, preserved, and round-level reasoning control
  • Strong code generation with open-source SOTA on SWE-bench Verified for its size class
  • Tool and function calling with 84.7% on τ²-Bench interactive tool invocation
  • Frontend aesthetics with improved UI/CSS generation quality
  • Multilingual coding including SWE-bench Multilingual improvements
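Tool calling uses the familiar OpenAI-style function-calling shape: tools are declared as JSON schemas, and the model returns structured call requests whose arguments arrive as a JSON string. A minimal sketch with an invented `run_tests` tool (the tool, its parameters, and the example response are illustrative, not real model output):

```python
import json

# Sketch: declaring a tool in the OpenAI-style function-calling format.
# The tool name and parameters are invented for illustration.
tools = [
    {
        "type": "function",
        "function": {
            "name": "run_tests",
            "description": "Run the project's test suite and report failures.",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string", "description": "Test file or directory."},
                    "verbose": {"type": "boolean"},
                },
                "required": ["path"],
            },
        },
    }
]

def dispatch_tool_call(tool_call: dict) -> dict:
    """Decode one model-issued tool call; a real handler would execute it."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])  # arguments arrive as a JSON string
    return {"name": name, "args": args}

# Illustrative shape of a tool call in a response:
example_call = {
    "id": "call_0",
    "type": "function",
    "function": {"name": "run_tests", "arguments": '{"path": "tests/", "verbose": true}'},
}
decoded = dispatch_tool_call(example_call)
```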

Strengths

  • Open-source SOTA among comparable-size models on coding benchmarks
  • Excellent price-to-performance ratio with aggressive API pricing
  • Preserved thinking mode maintains reasoning context across long sessions
  • Strong frontend generation with cleaner layouts and better accessibility
  • Integrates with Claude Code, Cline, Cursor, and standard API toolchains
  • Supports real-time streaming and structured JSON outputs
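Streaming from OpenAI-compatible endpoints arrives as server-sent events: `data:` lines carrying JSON chunks, terminated by a `data: [DONE]` sentinel. A small parser sketch (the sample chunks below are illustrative, not captured output):

```python
import json

def collect_stream(lines):
    """Accumulate content deltas from SSE-formatted chat-completion chunks."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip keep-alives and blank lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # end-of-stream sentinel
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        parts.append(delta.get("content", ""))
    return "".join(parts)

# Illustrative chunks in the shape an OpenAI-compatible stream uses:
sample = [
    'data: {"choices": [{"delta": {"content": "def add(a, b):"}}]}',
    'data: {"choices": [{"delta": {"content": "\\n    return a + b"}}]}',
    "data: [DONE]",
]
text = collect_stream(sample)
```

Most SDKs handle this framing internally; the sketch only shows what is on the wire.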

Limitations and Considerations

  • Smaller than flagship GLM-4.7, so less capacity for very broad or complex tasks
  • Response times are longer than those of ultra-fast mini models
  • For deep analytical or research tasks, larger reasoning models may be better
  • Some advanced features (e.g., per-request thinking control) may require explicit API configuration

Best Use Cases

GLM-4.7 Flash is ideal for:

  • Agentic coding workflows with Claude Code, Cline, or similar tools
  • Frontend and backend development with quality UI generation
  • Multi-turn coding sessions requiring preserved reasoning context
  • Budget-conscious developer copilots and code assistants
  • CI/CD automation and terminal-based development tasks
  • Teams needing open-source SOTA coding performance at low cost

Technical Details

Supported Features

  • chat
  • functions
  • reasoning