Multi-Stage Data Extraction with ChatBotKit Complete API

This tutorial demonstrates multi-stage data extraction with ChatBotKit's complete API: capturing structured data at several points during a conversation rather than once at the end. This capability distinguishes ChatBotKit from the native APIs provided by OpenAI and other providers.

Understanding the Difference

Traditional Approach (OpenAI & Others)

Most AI providers, including OpenAI, perform data extraction only at the very end of the conversation flow:

  1. The conversation runs to completion
  2. At the end, structured output is extracted using a single schema
  3. You get one extraction result when everything is done

This approach works for simple scenarios but has limitations:

  • No progressive data capture during multi-step processes
  • Cannot extract intermediate results from sub-agents or parallel tasks
  • Limited visibility into extraction progress

ChatBotKit's Multi-Stage Approach

ChatBotKit enables progressive extraction at multiple stages during conversation processing:

  1. Define inline functions with extraction schemas
  2. Let the model call those functions automatically during the conversation flow
  3. Monitor the function calls to capture structured data as each call happens
  4. Extract data from multiple sources (orchestrator, sub-agents, parallel tasks)
  5. Compile results progressively rather than waiting for completion

This is achieved through:

  • Inline functions with JSON schemas that act as extraction points
  • Pre-canned function results that allow the conversation to continue seamlessly
  • Event streaming that lets you monitor and capture function calls in real time

Real-World Use Case: Priority Gathering System

Let's build a system that gathers priorities from multiple AI agents and extracts them progressively. This example demonstrates multi-stage extraction in action.

Step 1: Define the Extraction Schema

First, define the data structure you want to extract. This becomes the schema for your inline function.
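
As a minimal sketch of what such a schema might look like, the example below writes it as a JSON Schema object held in a Python dictionary. The field names (`source`, `priorities`, `title`, `level`) are assumptions chosen for the priority-gathering scenario, not names taken from ChatBotKit.

```python
import json

# Hypothetical schema for the priority-gathering example.
# All field names here are illustrative assumptions.
PRIORITY_SCHEMA = {
    "type": "object",
    "properties": {
        "source": {
            "type": "string",
            "description": "Name of the agent reporting these priorities",
        },
        "priorities": {
            "type": "array",
            "description": "Priorities gathered so far, most important first",
            "items": {
                "type": "object",
                "properties": {
                    "title": {"type": "string", "description": "Short label"},
                    "level": {"type": "string", "enum": ["low", "medium", "high"]},
                },
                "required": ["title", "level"],
            },
        },
    },
    "required": ["source", "priorities"],
}

# JSON Schema is plain JSON, so it serializes directly into a request body.
print(json.dumps(PRIORITY_SCHEMA, indent=2))
```

Keeping the schema as plain data makes it easy to reuse the same object both in the request and in any validation you run on your side.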

Step 2: Set Up the Conversation with Extraction Function

Create a conversation that includes an inline function for data extraction. The key is to provide a pre-canned result so the conversation continues without waiting for your application.
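
The sketch below shows one way the request body for such a conversation could be assembled. Only the `result` field is named in this tutorial. The function name `capture_priorities`, the message shape, the other field names, and the shape of the result value are assumptions for illustration, so check the ChatBotKit API reference for the exact format.

```python
import json

def build_completion_request(user_message: str) -> dict:
    """Build a request body with one inline extraction function (assumed shape)."""
    capture_priorities = {
        "name": "capture_priorities",  # hypothetical function name
        "description": "Record the priorities identified so far.",
        "parameters": {
            "type": "object",
            "properties": {
                "priorities": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["priorities"],
        },
        # Pre-canned result: handed back to the model right away, so the
        # conversation continues without a round trip to the application.
        "result": {"status": "recorded"},
    }
    return {
        "messages": [{"type": "user", "text": user_message}],  # assumed message shape
        "functions": [capture_priorities],
    }

request_body = build_completion_request("What should the team focus on this quarter?")
print(json.dumps(request_body, indent=2))
```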

Step 3: Complete Example with Multiple Extraction Points

A complete example combines multiple extraction stages in a single conversation.
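
The following sketch simulates the monitoring side of such a conversation without calling any real API. The event stream is a stand-in generator, and the event shape (`type`, `data`, `name`, `args`) and both function names are hypothetical. The point is the pattern: filter the stream for function calls and group their arguments by extraction point.

```python
from typing import Iterable, Iterator

def fake_event_stream() -> Iterator[dict]:
    """Stand-in for the streamed events of one completion (hypothetical shape)."""
    yield {"type": "token", "data": {"token": "Gathering priorities..."}}
    yield {"type": "functionCall", "data": {
        "name": "capture_agent_priorities",
        "args": {"agent": "sales", "priorities": ["Close Q3 renewals"]}}}
    yield {"type": "functionCall", "data": {
        "name": "capture_agent_priorities",
        "args": {"agent": "support", "priorities": ["Reduce ticket backlog"]}}}
    yield {"type": "functionCall", "data": {
        "name": "capture_final_summary",
        "args": {"top_priority": "Reduce ticket backlog"}}}
    yield {"type": "result", "data": {"text": "Done."}}

def collect_extractions(events: Iterable[dict]) -> dict:
    """Group function-call arguments by function name, in arrival order."""
    extracted: dict[str, list[dict]] = {}
    for event in events:
        if event["type"] != "functionCall":
            continue  # tokens and the final result are not extraction points
        call = event["data"]
        extracted.setdefault(call["name"], []).append(call["args"])
    return extracted

results = collect_extractions(fake_event_stream())
for name, payloads in results.items():
    print(name, payloads)
```

In a real integration, `fake_event_stream()` would be replaced by the event stream returned from the API, with the event fields adjusted to match.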

Key Advantages of Multi-Stage Extraction

1. Progressive Data Capture

Extract data as the conversation unfolds, not just at the end.
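
To illustrate, the sketch below uses a Python generator so that each extraction is handed to the caller the moment its function call appears, before the rest of the stream has been read. The event shape is an assumption, and the events are hard-coded sample data.

```python
from typing import Iterable, Iterator

def iter_extractions(events: Iterable[dict]) -> Iterator[dict]:
    """Yield each extraction payload as soon as its function call arrives."""
    for event in events:
        if event.get("type") == "functionCall":  # hypothetical event type
            yield event["args"]

def sample_events() -> Iterator[dict]:
    """Hard-coded stand-in for a live event stream."""
    yield {"type": "functionCall", "args": {"stage": 1, "name": "Ada"}}
    yield {"type": "token", "text": "Thanks, and your email?"}
    yield {"type": "functionCall", "args": {"stage": 2, "email": "ada@example.com"}}

stream = iter_extractions(sample_events())
first = next(stream)  # available before the rest of the stream is consumed
print("captured early:", first)
remaining = list(stream)
print("captured later:", remaining)
```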

2. Better User Experience

Show extraction progress to users in real time.
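
One possible way to surface progress is a callback that fires whenever an expected stage reports in. In this sketch the stage names and event fields are hypothetical, and `notify` stands in for whatever updates your UI (here it just appends to a list).

```python
from typing import Callable, Iterable

def track_progress(
    events: Iterable[dict],
    expected_stages: list[str],
    notify: Callable[[str], None],
) -> list[str]:
    """Call notify() with a progress line the first time each stage reports."""
    done: list[str] = []
    for event in events:
        name = event.get("name")
        is_call = event.get("type") == "functionCall"  # hypothetical event type
        if is_call and name in expected_stages and name not in done:
            done.append(name)
            notify(f"[{len(done)}/{len(expected_stages)}] {name} captured")
    return done

stages = ["capture_contact", "capture_preferences", "capture_submission"]
sample_events = [
    {"type": "functionCall", "name": "capture_contact"},
    {"type": "token"},
    {"type": "functionCall", "name": "capture_preferences"},
]
messages: list[str] = []
completed = track_progress(sample_events, stages, messages.append)
print("\n".join(messages))
```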

3. Flexible Data Structures

Define different schemas for different extraction stages.
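
As a sketch, each stage can be given its own inline function with its own schema. The helper `make_function`, the stage names, and the definition fields other than `result` are assumptions for illustration.

```python
def make_function(name: str, description: str, properties: dict) -> dict:
    """Build one inline function whose fields are all required (assumed shape)."""
    return {
        "name": name,
        "description": description,
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": list(properties),
        },
        "result": {"status": "ok"},  # pre-canned result keeps the conversation moving
    }

# One function per stage, each with a different schema (hypothetical names).
STAGE_FUNCTIONS = [
    make_function(
        "capture_contact",
        "Call once the user has given both a name and an email address.",
        {"name": {"type": "string"}, "email": {"type": "string"}},
    ),
    make_function(
        "capture_preferences",
        "Call once the user has stated their preferences.",
        {"topics": {"type": "array", "items": {"type": "string"}}},
    ),
    make_function(
        "capture_submission",
        "Call when the user confirms the form is complete.",
        {"confirmed": {"type": "boolean"}},
    ),
]

for fn in STAGE_FUNCTIONS:
    print(fn["name"], "->", fn["parameters"]["required"])
```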

4. Error Recovery

Handle a failure at one stage without losing the data already captured at the others.
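
The sketch below assumes that each stage's arguments arrive as a raw JSON string (an assumption, since the actual payload format may differ). A stage that fails to parse is recorded as failed, while the stages that parsed cleanly are kept.

```python
import json

def process_stages(raw_by_stage: dict[str, str]) -> tuple[dict, dict]:
    """Parse each stage independently; one bad stage does not discard the rest."""
    captured: dict[str, dict] = {}
    failed: dict[str, str] = {}
    for stage, raw in raw_by_stage.items():
        try:
            captured[stage] = json.loads(raw)
        except ValueError as error:  # json.JSONDecodeError subclasses ValueError
            failed[stage] = str(error)
    return captured, failed

raw_args = {
    "contact": '{"name": "Ada"}',
    "preferences": '{"theme": ',  # truncated on purpose to simulate a failure
    "submission": '{"confirmed": true}',
}
captured, failed = process_stages(raw_args)
print("captured:", captured)
print("failed stages:", list(failed))
```

Because failures are tracked per stage, the application can retry or re-ask for only the stage that went wrong.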

Common Use Cases

1. Multi-Agent Systems

Extract results from each agent as they complete their tasks:

  • Sales agent extracts customer info
  • Support agent extracts issue details
  • Product agent extracts feature requests

2. Form Filling

Extract form fields progressively during a conversation:

  • Stage 1: Extract name and email
  • Stage 2: Extract preferences
  • Stage 3: Extract final submission data

3. Data Aggregation

Compile data from multiple sources:

  • Query multiple databases/APIs via agents
  • Extract results from each source
  • Aggregate into final report

4. Quality Assurance

Validate extracted data at each stage:

  • Extract data
  • Validate completeness
  • Request clarification if needed
  • Final extraction with validated data
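
The validation loop above can be sketched as a completeness check that decides between asking for clarification and finalizing. The function names and the wording of the clarification request are hypothetical.

```python
def missing_fields(data: dict, required: list[str]) -> list[str]:
    """Return the required fields that are absent or empty in the extracted data."""
    return [field for field in required if data.get(field) in (None, "", [])]

def next_action(data: dict, required: list[str]) -> str:
    """Decide whether to request clarification or accept the extraction."""
    missing = missing_fields(data, required)
    if missing:
        return "clarify: please provide " + ", ".join(missing)
    return "finalize"

required_fields = ["name", "email"]
print(next_action({"name": "Ada", "email": ""}, required_fields))
print(next_action({"name": "Ada", "email": "ada@example.com"}, required_fields))
```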

Troubleshooting

Function Not Being Called

Problem: The AI doesn't call your extraction function.

Solution: Make the function description more explicit about when the function should be called.
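
For illustration, compare a vague description with one that states when to call the function and what to pass. Both definitions are hypothetical, and the exact wording that works best will depend on the model and the conversation.

```python
# Vague: gives the model no reason or trigger to call the function.
VAGUE = {
    "name": "capture_priorities",  # hypothetical function name
    "description": "Priorities.",
}

# Explicit: states the trigger, the expected payload, and the timing.
EXPLICIT = {
    "name": "capture_priorities",
    "description": (
        "Call this function every time an agent finishes listing its priorities. "
        "Pass every priority mentioned so far. "
        "Do not wait until the end of the conversation."
    ),
}

print(EXPLICIT["description"])
```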

Missing Data in Extraction

Problem: Some fields are undefined in the extracted data.

Solution: Mark the fields as required in the schema and give each one a clear description.
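
A small check like the one sketched below can flag schema properties that are not marked required or that lack a description. `lint_schema` is a hypothetical helper, and it only inspects the top level of a JSON Schema object.

```python
def lint_schema(schema: dict) -> list[str]:
    """List top-level properties that are optional or undocumented."""
    problems: list[str] = []
    required = set(schema.get("required", []))
    for name, spec in schema.get("properties", {}).items():
        if name not in required:
            problems.append(f"{name}: not listed in 'required'")
        if not spec.get("description"):
            problems.append(f"{name}: missing description")
    return problems

loose_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "description": "Full name of the user"},
        "email": {"type": "string"},  # no description and not required
    },
    "required": ["name"],
}

for problem in lint_schema(loose_schema):
    print(problem)
```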

Pre-canned Results Not Working

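Problem: The conversation hangs waiting for a function result.

Solution: Always provide a result field on the inline function, so the conversation can continue without waiting for your application.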
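
As a safeguard, you could verify before sending a request that every inline function carries a `result`. The helper below is a hypothetical sketch, and the function definitions are sample data in an assumed shape.

```python
def functions_missing_result(functions: list[dict]) -> list[str]:
    """Return the names of function definitions that lack a pre-canned result."""
    return [fn["name"] for fn in functions if "result" not in fn]

functions = [
    {"name": "capture_contact", "parameters": {}, "result": {"status": "ok"}},
    {"name": "capture_preferences", "parameters": {}},  # result forgotten
]

missing = functions_missing_result(functions)
if missing:
    print("Add a result field to:", ", ".join(missing))
```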

Conclusion

Multi-stage data extraction with ChatBotKit's complete API offers significant advantages over traditional end-of-conversation extraction:

  • Real-time extraction as the conversation progresses
  • Multiple extraction points for complex workflows
  • Better visibility into the extraction process
  • Flexible schemas for different stages

By using inline functions with pre-canned results and monitoring function calls through event streaming, you can build sophisticated data extraction pipelines that capture structured data progressively, enabling more robust and interactive AI applications.