Extract
The ChatBotKit platform provides a versatile Data Extraction integration that lets you pull contextually relevant information from conversations based on a predetermined JSON schema. This integration populates the conversation metadata and makes the data easier to use in subsequent steps, such as customer support, transcriptions and data analytics.
This integration empowers AI chatbots to not only interact autonomously with users but also to extract key pieces of information from the conversation. Depending on your Trigger setting, the bot uses the provided JSON schema to extract data after the conversation ends or on every message, consequently enriching the conversation metadata.
How to Use the Data Extraction Integration
- Log in to your ChatBotKit account and navigate to the "Integrations" tab.
- Expand "More Integrations" and select the "Data Extraction" integration.
- Specify a name and optional description for the integration.
- Provide a custom JSON schema that your chatbot will use for data extraction.
Once the integration is set up, your AI chatbot will automatically extract data from conversations according to the specified JSON schema. This data will be used to populate the conversation metadata.
Trigger Setting
The Trigger field controls when extraction runs:
- automatic: Extraction runs automatically after each conversation completes. This is the recommended setting for most use cases.
- never: Extraction does not run automatically. Use this when you want to control extraction manually or trigger it via the API.
Choose automatic for continuous, hands-free data collection and never when you need tighter control over when extraction occurs.
Extracted Items
After extraction runs, all extracted records appear in the Extracted Items section of your integration page. You can review individual records and export the full dataset as a CSV file for further analysis in spreadsheets or other tools.
Example Schema
Consider a scenario where you're running an e-commerce platform that sells various types of electronics. You want your chatbot to extract the customer's name, email, the product they are interested in, and any specific questions or issues they have about the product.
Here is an example of a JSON schema that could be used for this purpose:
This schema instructs the chatbot to extract the customer's name, email, the product they are interested in, and their specific question or issue. Remember, the chatbot's backstory and conversation flow need to be designed in such a way that these pieces of information are naturally collected during the conversation.
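A schema of the kind described above might look like the following sketch. It is written as a Python dictionary and serialised to JSON so that it can be pasted into the schema field. The property names (customerName, customerEmail, productOfInterest, questionOrIssue) and the descriptions are illustrative assumptions, not names required by the platform.

```python
import json

# Illustrative schema for the e-commerce scenario. All property names
# below are assumptions chosen for this example.
customer_inquiry_schema = {
    "type": "object",
    "properties": {
        "customerName": {
            "type": "string",
            "description": "The customer's name",
        },
        "customerEmail": {
            "type": "string",
            "description": "The customer's email address",
        },
        "productOfInterest": {
            "type": "string",
            "description": "The product the customer is interested in",
        },
        "questionOrIssue": {
            "type": "string",
            "description": "The customer's specific question or issue",
        },
    },
}

# Serialise to JSON text for use as the integration's schema.
schema_json = json.dumps(customer_inquiry_schema, indent=2)
print(schema_json)
```

Short, specific descriptions on each property tend to make it clearer what the chatbot should look for in the conversation.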
Advanced Features
The advanced features section offers enhanced functionality for data handling. Here, you can configure request settings, providing flexibility in how extracted data is processed. You have the option to specify either a simple URL or a more detailed request complete with custom headers. This configuration determines the destination for the extracted data. Once the chatbot has successfully extracted the relevant information from the conversation according to your predefined JSON schema, it will automatically transmit this data to the webhook you've specified in your request configuration. This powerful feature enables seamless integration with your existing systems and workflows, allowing for real-time data processing and analysis.
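On the receiving side, a webhook endpoint could be sketched as below using only the Python standard library. This is a minimal sketch under assumptions that the text above does not confirm: that the data arrives as an HTTP POST with a JSON object in the body, and that a 200 response signals success. The handler name, the port, and the helper parse_extraction_payload are all hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_extraction_payload(raw_body: bytes) -> dict:
    """Decode a request body as a JSON object; raise ValueError otherwise."""
    payload = json.loads(raw_body.decode("utf-8"))
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    return payload


class ExtractionWebhookHandler(BaseHTTPRequestHandler):
    """Hypothetical receiver; assumes a POST request with a JSON body."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = parse_extraction_payload(self.rfile.read(length))
        except ValueError:
            # Covers malformed JSON and bad encodings as well, since
            # json.JSONDecodeError and UnicodeDecodeError subclass ValueError.
            self.send_response(400)
            self.end_headers()
            return
        # Replace this with your own processing (CRM update, queue, etc.).
        print("received extracted data:", payload)
        self.send_response(200)
        self.end_headers()


def run_server(port: int = 8000) -> None:
    """Start the receiver and block until interrupted."""
    HTTPServer(("0.0.0.0", port), ExtractionWebhookHandler).serve_forever()
```

Calling run_server() starts the receiver locally. If you configure custom headers in the request settings, the handler can read them from self.headers, for example to check a shared secret.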
Numeric Value Metrics Collection
The Extract integration can automatically track and analyze numeric values from your conversations. This feature provides valuable insights into the quantitative data being extracted from customer interactions.
Enabling Metrics Collection
To enable automatic metrics collection for specific numeric fields, add the collect: true property to those fields in your extraction schema:
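As a sketch, a schema with collected fields might look like this when written as a Python dictionary. Python's True serialises to JSON true. The property names (customerName, orderAmount, quantity, discountPercentage) are assumptions chosen for illustration.

```python
import json

# Illustrative schema: three numeric fields are marked for metrics
# collection, while the customer name is extracted but not collected.
order_schema = {
    "type": "object",
    "properties": {
        "customerName": {"type": "string"},
        "orderAmount": {"type": "number", "collect": True},
        "quantity": {"type": "number", "collect": True},
        "discountPercentage": {"type": "number", "collect": True},
    },
}

# List the fields that are marked for collection.
collected_fields = [
    name
    for name, spec in order_schema["properties"].items()
    if spec.get("collect") is True
]

print(json.dumps(order_schema, indent=2))
print(collected_fields)
```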
What Gets Tracked
The system automatically identifies and tracks numeric values from fields marked with collect: true:
- Monetary values: prices, amounts, costs
- Quantities: item counts, measurements, percentages
- Ratings and scores: customer satisfaction ratings, product scores
- Performance metrics: response times, conversion rates
Formatting Chart Values
By default, collected values are rendered as plain numbers on the metrics chart. Add an optional display property to a collected field to control how its values are formatted:
The supported tokens are:
- number: plain grouped number (the default). Example: 1,234.5.
- percent: percentage. Intl formatting expects a fraction, so a value of 0.45 is shown as 45%.
- currency/<CODE>: currency using an ISO 4217 code after the slash, such as currency/USD, currency/EUR, or currency/GBP. The correct symbol is derived automatically, so currency/USD renders 1234.5 as $1,234.50.
The display property only affects presentation on the chart - it does not change the value that is extracted, tracked, or sent to your webhook. Each series in the chart is formatted independently in the tooltip, while the shared y-axis uses a single format when every collected field agrees and falls back to plain numbers otherwise. Unknown or malformed tokens safely fall back to plain number formatting, so a typo never breaks the chart.
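The formatting rules above can be approximated with a short sketch. It is a plain Python emulation written for illustration, not the platform's implementation: the function names, the small symbol table, and the rounding choices are assumptions, and it does not reproduce every detail of Intl number formatting.

```python
# Symbols for a few common ISO 4217 codes; other codes fall back to
# printing the code itself as a prefix.
CURRENCY_SYMBOLS = {"USD": "$", "EUR": "€", "GBP": "£"}


def format_plain(value: float) -> str:
    """Grouped number with up to three decimals and no trailing zeros."""
    return f"{value:,.3f}".rstrip("0").rstrip(".")


def format_value(value: float, display: str = "number") -> str:
    """Format a collected value according to its display token."""
    if display == "percent":
        # The value is treated as a fraction, so 0.45 becomes 45%.
        return f"{value * 100:,.0f}%"
    if display.startswith("currency/"):
        code = display.split("/", 1)[1]
        if len(code) == 3 and code.isascii() and code.isalpha() and code.isupper():
            symbol = CURRENCY_SYMBOLS.get(code, code + " ")
            return f"{symbol}{value:,.2f}"
    # "number", unknown tokens, and malformed tokens all end up here.
    return format_plain(value)


def axis_display(displays: list) -> str:
    """Shared y-axis format: one token if every field agrees, else number."""
    unique = set(displays)
    return unique.pop() if len(unique) == 1 else "number"


print(format_value(1234.5))
print(format_value(0.45, "percent"))
print(format_value(1234.5, "currency/USD"))
```

A misspelled token such as curency/USD falls through to the plain number branch, which mirrors the fallback behaviour described above.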
Example
For an e-commerce support chatbot with the schema above:
Extracted Data:
Tracked Metrics:
- Order amount: 299.99 (tracked because collect: true)
- Quantity: 5 (tracked because collect: true)
- Discount percentage: 15.5 (tracked because collect: true)
The customer name is not tracked as a metric since it doesn't have collect: true in the schema.
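The selection logic in this example can be sketched as a small function that keeps only numeric values whose schema field has collect: true. The property names and the customer name value are hypothetical; the numeric values match the tracked metrics listed above.

```python
# Hypothetical schema and extracted record for the e-commerce example.
schema = {
    "type": "object",
    "properties": {
        "customerName": {"type": "string"},
        "orderAmount": {"type": "number", "collect": True},
        "quantity": {"type": "number", "collect": True},
        "discountPercentage": {"type": "number", "collect": True},
    },
}

extracted = {
    "customerName": "Jane Doe",  # placeholder value for illustration
    "orderAmount": 299.99,
    "quantity": 5,
    "discountPercentage": 15.5,
}


def tracked_metrics(schema: dict, extracted: dict) -> dict:
    """Return the numeric values whose schema field has collect: true."""
    metrics = {}
    for name, spec in schema.get("properties", {}).items():
        value = extracted.get(name)
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if spec.get("collect") is True and is_number:
            metrics[name] = value
    return metrics


print(tracked_metrics(schema, extracted))
```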
Analytics and Insights
The collected metrics enable you to:
- Track trends: Monitor changes in order values, quantities, or ratings over time
- Identify patterns: Discover peak ordering periods or common discount amounts
- Generate reports: Create business intelligence reports from conversation data
- Monitor performance: Track key business metrics directly from customer interactions
Benefits
- Business Intelligence: Turn conversation data into actionable business insights
- Trend Analysis: Identify patterns in customer behavior and preferences
- Performance Monitoring: Track key metrics automatically from customer interactions
- Data-Driven Decisions: Make informed decisions based on conversation analytics
This feature seamlessly integrates with your existing Extract integration workflow and requires no changes to your current setup.
Triggering Extraction on Historic Conversations
The Extract integration provides the ability to retroactively apply extraction to existing conversations. This is useful when you want to extract data from conversations that occurred before the integration was configured, or when you've updated your extraction schema and want to reprocess previous conversations.
Using the Trigger Feature
On your Extract integration page, you'll find a Trigger button in the action bar. This button allows you to:
- Process Recent Conversations: Apply your current extraction schema to the most recent 100 conversations
- Refresh Analytics: Update your metrics charts with newly extracted data
- Test Schema Changes: Validate schema modifications against existing conversation data
How It Works
When you trigger extraction on historic conversations:
- Conversation Selection: The system selects up to 100 of your most recent conversations
- Bot Filtering: If your integration is linked to a specific bot, only conversations from that bot are processed
- Queue Processing: Each conversation is queued for extraction using the same pipeline as real-time processing
- Automatic Updates: The metrics chart automatically refreshes to display newly extracted data
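The selection and filtering steps above can be sketched as follows. This is an illustration only: the Conversation type, its fields, and the function name are hypothetical, and the sketch assumes that bot filtering is applied before the 100-conversation limit, which the description above does not specify.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_CONVERSATIONS = 100  # limit stated in the documentation


@dataclass
class Conversation:
    """Hypothetical conversation record used only for this sketch."""
    id: str
    bot_id: Optional[str]
    created_at: int  # creation time as a Unix timestamp


def select_for_extraction(
    conversations: List[Conversation],
    bot_id: Optional[str] = None,
    limit: int = MAX_CONVERSATIONS,
) -> List[Conversation]:
    """Pick the most recent conversations, optionally for a single bot."""
    # Keep everything when the integration is not linked to a bot.
    candidates = [c for c in conversations if bot_id is None or c.bot_id == bot_id]
    # Most recent first, then cut off at the limit.
    candidates.sort(key=lambda c: c.created_at, reverse=True)
    return candidates[:limit]


# Example: 150 conversations alternating between two bots.
conversations = [
    Conversation(id=f"c{i}", bot_id="a" if i % 2 == 0 else "b", created_at=i)
    for i in range(150)
]
print(len(select_for_extraction(conversations)))
print(len(select_for_extraction(conversations, bot_id="a")))
```

Each selected conversation would then be queued for extraction in the same way as a live conversation.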
Benefits
- Historical Data Recovery: Extract valuable data from conversations that predate your integration setup
- Schema Testing: Validate new extraction schemas against real conversation data
- Analytics Refresh: Update your metrics and charts after making schema changes
- Data Completeness: Ensure comprehensive data extraction across all your conversations
Usage Notes
- The trigger processes conversations using your current extraction schema configuration
- Each conversation will have its metadata updated with the newly extracted data
- The feature respects your integration's bot filtering settings
- Chart data refreshes automatically within 30 seconds of processing completion
This capability enables you to maintain complete historical data while continuously improving your extraction schemas.
Caveats
While the Data Extraction integration is powerful, it's important to design your JSON schema carefully. An inaccurate or inappropriate schema can lead to incomplete or incorrect data extraction. Test your JSON schema thoroughly against a variety of conversation scenarios to ensure it extracts the intended data accurately.
FAQ
Do you retry failed requests?
Yes. All failed requests are retried up to 5 times. Request attempts are recorded in the integration logs. The delay between each retry is calculated based on the following formula: