Conversation Streaming
Feature Overview
Conversation streaming lets you deliver model output as it is generated instead of waiting for the full response. This improves perceived latency and creates a more natural assistant experience for users.
ChatBotKit streams structured events that can include token and message updates. This makes frontend rendering and backend processing simpler than handling raw stream chunks.
What You Can Do
- Show partial responses in real time as users wait.
- Render typed events in chat UIs without custom parsing logic.
- Build live assistant interactions for support, education, and productivity.
- Combine streaming with conversation state and model controls.
- Handle long responses with better feedback and perceived speed.
How It Works
When you run conversation completion in streaming mode, the SDK returns an async stream of typed events. Your application can read each event and update the interface immediately. Token events improve responsiveness, while message events provide finalized output blocks.
Because events are structured, teams can apply consistent handling across server and frontend layers.
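As a concrete illustration, the loop below consumes a mock async stream of typed events in TypeScript. The event shapes (`type`, `token.value`, `message.text`) and the generator are simplified assumptions for this sketch, not the exact ChatBotKit SDK types.

```typescript
// Sketch: consume a typed event stream and update UI state.
// The event shape and the mock generator below are illustrative
// assumptions, not the actual ChatBotKit SDK types.

type StreamEvent =
  | { type: 'token'; token: { value: string } }
  | { type: 'message'; message: { text: string } }

// Stand-in for the SDK's stream iterator: a few token events followed
// by one finalized message event.
async function* mockCompletionStream(): AsyncGenerator<StreamEvent> {
  for (const value of ['Hel', 'lo', '!']) {
    yield { type: 'token', token: { value } }
  }
  yield { type: 'message', message: { text: 'Hello!' } }
}

async function render(): Promise<string> {
  let partial = ''
  let final = ''
  for await (const event of mockCompletionStream()) {
    if (event.type === 'token') {
      // Token events: append to the partial text shown in the UI.
      partial += event.token.value
    } else {
      // Message events: the finalized block, suitable for persistence.
      final = event.message.text
    }
  }
  return final
}
```

Because the events form a discriminated union on `type`, the compiler narrows each branch automatically, so no custom parsing is needed on the consumer side.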
Use Cases
- Real-time support assistants that show progress as answers are generated.
- Coding copilots that stream suggestions while preserving conversation context.
- Internal knowledge assistants that return long answers incrementally.
- Workflow copilots that emit intermediate activity and final messages.
Getting Started
- Install and configure the ChatBotKit SDK.
- Call conversation completion and consume the stream iterator.
- Render token and message events in your UI.
- Keep conversation IDs and message history for follow-up turns.
- Add retry and error handling for network interruptions.
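The last step above, retry handling, can be sketched as a small backoff wrapper. The helper name, its defaults, and the backoff policy are assumptions for illustration, not part of the SDK.

```typescript
// Sketch: retry a streamed completion on transient network errors with
// exponential backoff. `maxRetries` and `baseDelayMs` defaults are
// assumptions; tune them for your deployment.

async function withRetry<T>(
  attempt: () => Promise<T>,
  maxRetries = 2,
  baseDelayMs = 500
): Promise<T> {
  for (let tries = 0; ; tries++) {
    try {
      return await attempt()
    } catch (err) {
      if (tries >= maxRetries) throw err
      // Back off before retrying: 1x, 2x, 4x the base delay.
      await new Promise((resolve) =>
        setTimeout(resolve, baseDelayMs * 2 ** tries)
      )
    }
  }
}
```

Because a retry restarts the whole stream, clear or reconcile any partial output already rendered before replaying events.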
Best Practices
- Render immediately on token events for better perceived latency.
- Use message events for final persistence or analytics.
- Pair streaming with conversation management for multi-turn continuity.
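One way to follow the first two practices is to route token and message events to separate handlers, keeping UI rendering decoupled from persistence and analytics. The event and handler shapes below are illustrative assumptions, not ChatBotKit APIs.

```typescript
// Sketch: dispatch token events to a UI handler and message events to a
// persistence handler. Shapes are illustrative assumptions.

type ChatEvent =
  | { type: 'token'; token: { value: string } }
  | { type: 'message'; message: { text: string } }

type Handlers = {
  onToken: (value: string) => void   // per token: drive the live UI
  onMessage: (text: string) => void  // once per message: persist, log analytics
}

async function dispatch(
  stream: AsyncIterable<ChatEvent>,
  handlers: Handlers
): Promise<void> {
  for await (const event of stream) {
    if (event.type === 'token') {
      handlers.onToken(event.token.value)
    } else {
      handlers.onMessage(event.message.text)
    }
  }
}
```

Separating the two callbacks means the persistence path never depends on how often the UI repaints, and either side can be swapped out independently.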
Integration
Conversation streaming pairs well with SDK streaming utilities, conversation management, and model tuning options such as max tokens and interaction message limits.