Conversation Compaction
Feature Overview
Conversation compaction helps bots stay coherent in long-running chats without repeatedly sending the full message history to the model. Instead of dropping context abruptly, the platform can summarize older turns into a structured checkpoint and keep newer messages intact.
This approach lowers token pressure while preserving continuity. Teams can keep experiences stable for users even when conversations grow over time, such as support sessions, research workflows, or multi-step assistant tasks.
What You Can Do
- Keep conversations responsive as history grows.
- Trigger compaction based on conversation thresholds.
- Preserve key context by storing a checkpoint message.
- Continue the same conversation flow after compaction.
- Balance context depth and operating cost with model settings.
How It Works
When compaction is enabled, ChatBotKit evaluates the conversation before generating each completion. If the configured thresholds are met, the messages that accumulated after the latest checkpoint (excluding the most recent turns) are summarized into a new checkpoint message. On the next model call, the engine then sends the checkpoint plus the recent messages instead of the full history.
Threshold strategy is controlled through model options. Use the Compact strategy when you want summarization-based history control instead of simple truncation.
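The flow above can be sketched as a small function. This is an illustrative model of the technique, not ChatBotKit's actual implementation: the Message shape, the summarize() stub, and the threshold parameters are all assumptions made for the example.

```typescript
// Illustrative sketch of checkpoint-based compaction (not the real engine).
type Role = 'user' | 'bot' | 'checkpoint'

interface Message {
  role: Role
  text: string
}

// Stand-in for the model call that summarizes older turns. In the real
// platform the model produces the summary; here we just join the texts.
function summarize(messages: Message[]): string {
  return 'Summary of: ' + messages.map((m) => m.text).join('; ')
}

// Once the history exceeds maxMessages, fold everything between the latest
// checkpoint and the most recent keepRecent messages into a new checkpoint.
function compact(
  history: Message[],
  maxMessages: number,
  keepRecent: number
): Message[] {
  if (history.length <= maxMessages) return history

  const lastCheckpoint = history.map((m) => m.role).lastIndexOf('checkpoint')
  const recent = history.slice(history.length - keepRecent)
  const toSummarize = history.slice(
    lastCheckpoint + 1,
    history.length - keepRecent
  )

  const checkpoint: Message = {
    role: 'checkpoint',
    text: summarize(toSummarize),
  }

  // Keep any earlier checkpoint, add the new one, then the recent turns.
  const kept = lastCheckpoint >= 0 ? history.slice(0, lastCheckpoint + 1) : []
  return [...kept, checkpoint, ...recent]
}
```

The key property is that the summarization window always starts after the latest checkpoint, so each compaction pass only processes turns that have not been summarized before.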
Use Cases
- Customer support assistants that handle long issue-resolution threads.
- Internal copilots that help with multi-day planning and task follow-up.
- Research assistants that iterate across many prompts and references.
- Education flows where lessons and prior explanations must remain coherent.
Getting Started
- Open your bot or conversation model settings.
- Select a model and open model options.
- Set Threshold Strategy to Compact.
- Configure Max Tokens and Interaction Max Messages to match your context budget.
- Test with long conversations and adjust thresholds for your balance of memory depth and cost.
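The settings above translate into a model-options object along these lines. The option names here mirror the UI labels but are assumptions for illustration, not the documented ChatBotKit API, and the values are starting points rather than recommendations.

```typescript
// Hypothetical model-options sketch; option names mirror the UI labels
// (Threshold Strategy, Max Tokens, Interaction Max Messages) and are
// assumptions, not the documented API.
const modelOptions = {
  thresholdStrategy: 'compact', // summarize history instead of truncating it
  maxTokens: 4096, // overall context budget per model call
  interactionMaxMessages: 30, // recent messages kept verbatim after a checkpoint
}
```

Lowering interactionMaxMessages triggers compaction sooner and keeps calls cheaper; raising it preserves more verbatim history at higher token cost.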
Best practices:
- Start with conservative thresholds and increase gradually.
- Keep interaction message limits lower for Q&A assistants.
- Validate checkpoint quality with realistic conversation transcripts.
Integration
Compaction works together with conversation management, model options, and streaming responses. You can combine compaction with SDK streaming so users still get responsive output while long history is controlled in the background.
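From the client's point of view, compaction is transparent: the UI consumes a token stream while history control happens server-side before each completion. The sketch below uses a stand-in async generator in place of a real SDK streaming call, since the SDK's streaming interface is not shown here.

```typescript
// Illustrative only: stream() is a stub standing in for an SDK streaming
// call. The client renders tokens as they arrive, unaware of whether the
// server compacted the history before generating the reply.
async function* stream(reply: string): AsyncGenerator<string> {
  for (const word of reply.split(' ')) {
    yield word + ' '
  }
}

async function render(reply: string): Promise<string> {
  let shown = ''
  for await (const token of stream(reply)) {
    shown += token // in a UI, append each token to the visible message
  }
  return shown.trimEnd()
}
```

Because compaction runs before the model call, the first streamed token arrives only after any needed summarization completes, which is worth accounting for when tuning thresholds for latency-sensitive flows.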