Multi-Stage Data Extraction with ChatBotKit Complete API
This tutorial shows how to perform multi-stage data extraction with ChatBotKit's complete API, a capability that distinguishes it from the native APIs offered by OpenAI and other providers.
Understanding the Difference
Traditional Approach (OpenAI & Others)
Most AI providers, including OpenAI, perform data extraction only at the very end of the conversation flow:
- The conversation runs to completion
- At the end, structured output is extracted using a single schema
- You get one extraction result when everything is done
This approach works for simple scenarios but has limitations:
- No progressive data capture during multi-step processes
- Cannot extract intermediate results from sub-agents or parallel tasks
- Limited visibility into extraction progress
ChatBotKit's Multi-Stage Approach
ChatBotKit enables progressive extraction at multiple stages during conversation processing:
- Define inline functions with extraction schemas
- Functions are called automatically during conversation flow
- Monitor function calls to capture structured data as it is produced
- Extract data from multiple sources (orchestrator, sub-agents, parallel tasks)
- Compile results progressively rather than waiting for completion
This is achieved through:
- Inline functions with JSON schemas that act as extraction points
- Pre-canned function results that allow the conversation to continue seamlessly
- Event streaming that lets you monitor and capture function calls in real time
Real-World Use Case: Priority Gathering System
Let's build a system that gathers priorities from multiple AI agents and extracts them progressively. This example demonstrates multi-stage extraction in action.
Architecture Overview
Step 1: Define the Extraction Schema
First, define the data structure you want to extract. This becomes the schema for your inline function:
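As a minimal sketch, the schema for the priority-gathering example could be written as a JSON Schema object, shown here as a Python dictionary. The field names (`source`, `priorities`, `title`, `level`) are illustrative assumptions, not names prescribed by ChatBotKit.

```python
# Hypothetical extraction schema for the priority-gathering example.
# All field names below are assumptions chosen for illustration.
PRIORITY_SCHEMA = {
    "type": "object",
    "properties": {
        "source": {
            "type": "string",
            "description": "Agent or task that produced these priorities",
        },
        "priorities": {
            "type": "array",
            "description": "Every priority identified by this source",
            "items": {
                "type": "object",
                "properties": {
                    "title": {"type": "string", "description": "Short name of the priority"},
                    "level": {"type": "string", "enum": ["high", "medium", "low"]},
                },
                "required": ["title", "level"],
            },
        },
    },
    "required": ["source", "priorities"],
}
```

Keeping the schema small and giving each field a description tends to make the extracted arguments more predictable.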
Step 2: Set Up the Conversation with Extraction Function
Create a conversation that includes an inline function for data extraction. The key is providing a pre-canned result so the conversation continues without waiting for your application:
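The sketch below assembles a request body containing one inline function with a schema and a pre-canned result. It only builds and prints the payload; it does not call the API. The overall body layout, the message shape, the function name `capture_priorities`, and the value stored under `result` are assumptions made for illustration, so check them against the ChatBotKit API reference before use.

```python
import json

# Hypothetical schema, repeated here so the sketch is self-contained.
PRIORITY_SCHEMA = {
    "type": "object",
    "properties": {
        "source": {"type": "string"},
        "priorities": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["source", "priorities"],
}


def build_completion_request(user_text: str) -> dict:
    """Build a request body with one inline extraction function (assumed shape)."""
    return {
        "messages": [{"type": "user", "text": user_text}],
        "functions": [
            {
                "name": "capture_priorities",  # hypothetical function name
                "description": "Call this once for every set of priorities you identify.",
                "parameters": PRIORITY_SCHEMA,
                # Pre-canned result: handed back to the model right away, so the
                # conversation continues without waiting for the application.
                "result": {"status": "recorded"},
            }
        ],
    }


request_body = build_completion_request("Gather priorities from the sales and support agents.")
print(json.dumps(request_body, indent=2))
```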
Step 3: Complete Example with Multiple Extraction Points
Here's a complete example showing multiple extraction stages:
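The following sketch simulates the monitoring side with a hard-coded list of events instead of a live stream. The event shape (`type`, `name`, `args`) and the function names are hypothetical stand-ins for whatever the streaming API actually emits; the point is the pattern of watching for function calls and grouping their arguments by extraction stage.

```python
from collections import defaultdict
from typing import Iterable

# Simulated event stream. The event shape is an assumption for illustration.
SIMULATED_EVENTS = [
    {"type": "token", "text": "Collecting priorities..."},
    {"type": "functionCall", "name": "capture_priorities",
     "args": {"source": "sales", "priorities": ["Renewals"]}},
    {"type": "functionCall", "name": "capture_priorities",
     "args": {"source": "support", "priorities": ["Response time"]}},
    {"type": "functionCall", "name": "capture_summary",
     "args": {"top_priority": "Renewals"}},
    {"type": "result", "text": "Done."},
]


def collect_extractions(events: Iterable[dict], names: set[str]) -> dict[str, list[dict]]:
    """Group the arguments of every watched function call by function name."""
    collected: dict[str, list[dict]] = defaultdict(list)
    for event in events:
        if event.get("type") == "functionCall" and event.get("name") in names:
            collected[event["name"]].append(event["args"])
    return dict(collected)


stages = collect_extractions(SIMULATED_EVENTS, {"capture_priorities", "capture_summary"})
for name, payloads in stages.items():
    print(name, payloads)
```

Here the two `capture_priorities` calls represent extractions from two sub-agents, and `capture_summary` represents a later stage produced by the orchestrator.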
Key Advantages of Multi-Stage Extraction
1. Progressive Data Capture
Extract data as the conversation unfolds, not just at the end:
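To illustrate, this sketch uses a generator so that each extraction is handed to the caller as soon as its event arrives, before the rest of the stream has been read. The event source is simulated and its event shape is an assumption.

```python
from typing import Iterable, Iterator


def iter_extractions(events: Iterable[dict]) -> Iterator[dict]:
    """Yield each function call's arguments as soon as its event arrives."""
    for event in events:
        if event.get("type") == "functionCall":
            yield event["args"]


def simulated_stream(consumed: list) -> Iterator[dict]:
    """Hypothetical event source that records how many events have been read."""
    events = [
        {"type": "functionCall", "args": {"stage": 1, "name": "Ada"}},
        {"type": "token", "text": "still working"},
        {"type": "functionCall", "args": {"stage": 2, "plan": "pro"}},
    ]
    for event in events:
        consumed.append(event["type"])
        yield event


consumed: list = []
extractions = iter_extractions(simulated_stream(consumed))

first = next(extractions)             # usable after reading only one event
events_read_for_first = len(consumed)
remaining = list(extractions)         # drain the rest of the stream
```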
2. Better User Experience
Show extraction progress to users in real time:
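One way to surface progress, sketched under the assumption that the application knows in advance which stages it expects, is to render a short status line each time an extraction arrives. The stage names are hypothetical.

```python
def progress_line(captured: list[str], expected: list[str]) -> str:
    """Render a one-line progress summary for the stages captured so far."""
    done = [stage for stage in expected if stage in captured]
    pending = [stage for stage in expected if stage not in captured]
    line = f"{len(done)}/{len(expected)} stages captured"
    if pending:
        line += f" (waiting on: {', '.join(pending)})"
    return line


expected = ["sales", "support", "product"]
captured: list[str] = []
for stage in ["sales", "product"]:  # order in which extractions happen to arrive
    captured.append(stage)
    print(progress_line(captured, expected))
```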
3. Flexible Data Structures
Define different schemas for different extraction stages:
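A possible layout, assuming one inline function per stage, is a mapping from function name to schema that is expanded into the function list. The stage names, field names, and the pre-canned result value are hypothetical.

```python
# One schema per extraction stage; all names are illustrative assumptions.
STAGE_SCHEMAS = {
    "capture_contact": {
        "type": "object",
        "properties": {"name": {"type": "string"}, "email": {"type": "string"}},
        "required": ["name", "email"],
    },
    "capture_preferences": {
        "type": "object",
        "properties": {"topics": {"type": "array", "items": {"type": "string"}}},
        "required": ["topics"],
    },
}


def build_functions(stage_schemas: dict[str, dict]) -> list[dict]:
    """Turn each stage schema into an inline function with a pre-canned result."""
    return [
        {
            "name": name,
            "description": f"Record the data for the '{name}' stage as soon as it is known.",
            "parameters": schema,
            "result": {"status": "recorded"},
        }
        for name, schema in stage_schemas.items()
    ]


functions = build_functions(STAGE_SCHEMAS)
```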
4. Error Recovery
Handle failures at specific stages without losing all data:
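As a sketch of stage-level recovery, the helper below processes each stage independently, keeps the results that succeed, and records an error message for those that fail. The `normalize` handler and the stage names are hypothetical.

```python
from typing import Callable


def process_stages(
    stage_payloads: dict[str, dict], handler: Callable[[dict], dict]
) -> tuple[dict, dict]:
    """Run the handler per stage; a failure in one stage does not discard the others."""
    succeeded: dict[str, dict] = {}
    failed: dict[str, str] = {}
    for stage, payload in stage_payloads.items():
        try:
            succeeded[stage] = handler(payload)
        except (KeyError, ValueError) as error:
            failed[stage] = f"{type(error).__name__}: {error}"
    return succeeded, failed


def normalize(payload: dict) -> dict:
    """Hypothetical handler: require an email field and lower-case it."""
    email = payload["email"]
    if "@" not in email:
        raise ValueError(f"invalid email {email!r}")
    return {"email": email.lower()}


payloads = {
    "contact": {"email": "Ada@Example.com"},
    "billing": {"email": "not-an-email"},
    "shipping": {},
}
succeeded, failed = process_stages(payloads, normalize)
```

Only the failed stages need to be retried or clarified; the data from `contact` is already safe.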
Common Use Cases
1. Multi-Agent Systems
Extract results from each agent as it completes its task:
- Sales agent extracts customer info
- Support agent extracts issue details
- Product agent extracts feature requests
2. Form Filling
Extract form fields progressively during a conversation:
- Stage 1: Extract name and email
- Stage 2: Extract preferences
- Stage 3: Extract final submission data
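The three stages above can be sketched as a running merge, where each stage's extraction is folded into the form without letting empty values overwrite earlier data. The field names are hypothetical.

```python
def merge_stage(form: dict, stage_data: dict) -> dict:
    """Return a new form with the non-empty values from this stage merged in."""
    updates = {key: value for key, value in stage_data.items() if value not in (None, "")}
    return {**form, **updates}


form: dict = {}
for stage_data in [
    {"name": "Ada", "email": "ada@example.com"},  # stage 1
    {"newsletter": True, "email": ""},            # stage 2: empty email is ignored
    {"confirmed": True},                          # stage 3
]:
    form = merge_stage(form, stage_data)

print(form)
```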
3. Data Aggregation
Compile data from multiple sources:
- Query multiple databases/APIs via agents
- Extract results from each source
- Aggregate into final report
4. Quality Assurance
Validate extracted data at each stage:
- Extract data
- Validate completeness
- Request clarification if needed
- Final extraction with validated data
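The validation loop above can be sketched as a function that either accepts an extraction as final or produces a clarification question for the missing fields. The field names and the wording of the question are illustrative assumptions.

```python
def review_extraction(data: dict, required: list[str]) -> dict:
    """Accept complete data, or return a clarification question for missing fields."""
    missing = [field for field in required if not data.get(field)]
    if missing:
        return {
            "status": "needs_clarification",
            "question": f"Could you provide: {', '.join(missing)}?",
        }
    return {"status": "final", "data": data}


required_fields = ["issue", "product"]
first_pass = review_extraction({"issue": "login fails"}, required_fields)
second_pass = review_extraction({"issue": "login fails", "product": "mobile app"}, required_fields)
```

In this sketch the question from `first_pass` would be sent back into the conversation, and the extraction repeated until the review returns `final`.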
Troubleshooting
Function Not Being Called
Problem: The AI doesn't call your extraction function.
Solution: Make the function description more explicit:
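For illustration, compare a vague description with one that states when and how often the function should be called. The function name and the wording are assumptions; the principle is to spell out the trigger condition.

```python
VAGUE_DESCRIPTION = "Saves priorities."

# States the trigger, the frequency, and what to pass.
EXPLICIT_DESCRIPTION = (
    "Call this function every time you identify one or more priorities, "
    "before replying to the user. Call it once per agent or source, and "
    "pass every priority you found in the `priorities` array."
)

extraction_function = {
    "name": "capture_priorities",  # hypothetical function name
    "description": EXPLICIT_DESCRIPTION,
    "parameters": {
        "type": "object",
        "properties": {"priorities": {"type": "array", "items": {"type": "string"}}},
        "required": ["priorities"],
    },
    "result": {"status": "recorded"},
}
```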
Missing Data in Extraction
Problem: Some fields are undefined in the extracted data.
Solution: Make fields required and add clear descriptions:
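A minimal sketch: list the fields under the schema's `required` key, give each a description, and check incoming extractions for gaps on the application side as a safety net. The field names are hypothetical.

```python
SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "description": "Customer's full name as they stated it"},
        "email": {"type": "string", "description": "Contact email address, e.g. ada@example.com"},
        "notes": {"type": "string", "description": "Optional free-form remarks"},
    },
    "required": ["name", "email"],
}


def missing_required(schema: dict, data: dict) -> list[str]:
    """Return the required fields that are absent or empty in the extracted data."""
    return [
        field
        for field in schema.get("required", [])
        if data.get(field) in (None, "")
    ]


print(missing_required(SCHEMA, {"name": "Ada"}))
```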
Pre-canned Results Not Working
Problem: The conversation hangs while waiting for a function result.
Solution: Always provide a result field:
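A small pre-flight check, sketched here, can catch function definitions that lack a `result` before the request is sent. The function names and the result value are assumptions for illustration.

```python
def functions_missing_result(functions: list[dict]) -> list[str]:
    """Return the names of inline functions that define no pre-canned result."""
    return [fn.get("name", "<unnamed>") for fn in functions if "result" not in fn]


functions = [
    {"name": "capture_contact", "parameters": {"type": "object"}, "result": {"status": "recorded"}},
    {"name": "capture_preferences", "parameters": {"type": "object"}},  # no result defined
]

problems = functions_missing_result(functions)
if problems:
    print("Add a result to:", ", ".join(problems))
```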
Conclusion
Multi-stage data extraction with ChatBotKit's complete API offers significant advantages over traditional end-of-conversation extraction:
- Real-time extraction as the conversation progresses
- Multiple extraction points for complex workflows
- Better visibility into the extraction process
- Flexible schemas for different stages
By using inline functions with pre-canned results and monitoring function calls through event streaming, you can build data extraction pipelines that capture structured data progressively. The result is more robust and interactive AI applications.