Datasets are essential components for building knowledge-driven AI applications, allowing you to organize, store, and efficiently retrieve information that powers intelligent conversations and automated workflows. A dataset acts as a centralized repository for structured or unstructured data that can be queried, searched, and referenced by bots, agents, and other AI-powered systems.

Creating a Dataset

Creating a dataset is the foundational step in building a knowledge base for your AI applications. When you create a dataset, you establish a container that can hold records, files, and structured information that will be searchable and retrievable by your conversational agents and applications.

The dataset creation process allows you to configure various storage and retrieval parameters that determine how your data is indexed, searched, and presented to AI models. Careful consideration of these settings during creation ensures optimal performance and relevance in your application's responses.

To create a new dataset, send a POST request with the basic information and optional configuration parameters:
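A minimal sketch in TypeScript using fetch. The base URL, the /v1/dataset/create path, and the bearer-token scheme are assumptions based on common REST conventions, not confirmed specifics of this API; substitute your platform's actual endpoint and authentication. The body fields correspond to the configuration options listed below.

```typescript
// Create a dataset; field names follow the configuration options below.
// Endpoint path and auth scheme are assumptions - adjust to your platform.
const response = await fetch('https://api.example.com/v1/dataset/create', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.API_TOKEN}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    name: 'Product Documentation',
    description: 'Help articles and FAQs for the support bot',
    recordMaxTokens: 1000,
    searchMaxRecords: 5,
    searchMaxTokens: 2000,
    visibility: 'private',
  }),
})

const { id } = await response.json()
console.log(id) // e.g. dts_abc123xyz
```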

Key Configuration Options

  • name: A descriptive identifier for your dataset
  • description: Detailed explanation of the dataset's purpose and content
  • store: The underlying storage backend (defaults to platform default)
  • recordMaxTokens: Maximum tokens per record for optimal chunking
  • searchMaxRecords: Maximum number of records returned in search results
  • searchMaxTokens: Maximum total tokens in search results
  • visibility: Access control (private, protected, or public)
  • matchInstruction: Instructions for when records match a query
  • mismatchInstruction: Instructions for when no records match a query

The API returns the newly created dataset's ID upon successful creation:
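A representative response body; the exact envelope may differ on your platform:

```json
{
  "id": "dts_abc123xyz"
}
```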

Important Considerations:

  • Immutable Settings: The store type cannot be changed after creation, so choose carefully based on your performance and scale requirements
  • Token Limits: Setting appropriate token limits helps balance context richness with response time and cost
  • Search Configuration: Fine-tune search parameters based on your use case—more records provide broader context but may introduce noise

Best Practices:

  • Use descriptive names that clearly indicate the dataset's content
  • Set recordMaxTokens based on your content granularity (500-2000 tokens is typical)
  • Consider visibility settings carefully, especially for sensitive data
  • Link datasets to blueprints for organized project management

Deleting a Dataset

Deleting a dataset permanently removes it from your account along with all its records and associated data. The operation is irreversible, so use it carefully, especially for datasets that contain important information or are actively used by bots or other applications.

When you delete a dataset, the entire dataset entity is removed, including its name, description, store configuration, and all records it contains. The operation automatically handles cleanup of related resources, including vector embeddings and indexed data stored in the underlying data store.

Before deleting a dataset, consider whether you need to:

  • Export your data: If you might need the data later, export records first
  • Update bot configurations: Remove or update any bots that reference this dataset
  • Check dependencies: Verify that no active applications depend on this dataset

To delete a dataset, send a POST request with the dataset ID:
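A minimal sketch, assuming a POST /v1/dataset/{datasetId}/delete route; the path and auth scheme are illustrative, not confirmed:

```typescript
// Permanently delete a dataset by ID. The route shape is an assumption.
const datasetId = 'dts_abc123xyz'

const response = await fetch(
  `https://api.example.com/v1/dataset/${datasetId}/delete`,
  {
    method: 'POST',
    headers: { Authorization: `Bearer ${process.env.API_TOKEN}` },
  }
)

const { id } = await response.json()
console.log(`Deleted dataset ${id}`)
```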

The request returns the ID of the deleted dataset upon successful completion:
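As with creation, the exact envelope may vary; a representative shape:

```json
{
  "id": "dts_abc123xyz"
}
```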

Important Considerations:

  • Permanent deletion: Deleted datasets cannot be recovered
  • Record cleanup: All records within the dataset are also deleted
  • Store cleanup: Vector embeddings and indexed data are removed from the store
  • Authorization: You can only delete datasets that belong to your account

If you only need to take a dataset out of service temporarily, remove it from bot configurations rather than deleting it. If you do decide to delete, export the data first for safekeeping.

Retrieving a Specific Dataset

Fetching detailed information about a specific dataset allows you to access its complete configuration, search parameters, storage settings, and metadata. This is essential for understanding how a dataset is configured, verifying settings before modifications, or displaying dataset information in user interfaces.

The fetch operation returns comprehensive details about the dataset, including all configuration options that were set during creation or subsequent updates. This information can be used to replicate dataset configurations, audit settings, or make informed decisions about dataset usage in your applications.

To retrieve a specific dataset by its ID, send a GET request:
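A minimal sketch, assuming a GET /v1/dataset/{datasetId}/fetch route; adjust the path to your platform's actual API:

```typescript
// Fetch a single dataset's full configuration. Route shape is an assumption.
const datasetId = 'dts_abc123xyz'

const response = await fetch(
  `https://api.example.com/v1/dataset/${datasetId}/fetch`,
  {
    headers: { Authorization: `Bearer ${process.env.API_TOKEN}` },
  }
)

const dataset = await response.json()
console.log(dataset.name, dataset.store, dataset.searchMaxRecords)
```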

Replace {datasetId} with the actual dataset identifier (e.g., dts_abc123xyz).

Response Details

The response includes the complete dataset configuration:
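An illustrative response body. The field names follow the list below; the concrete values, null placeholders, and timestamp format are examples only:

```json
{
  "id": "dts_abc123xyz",
  "name": "Product Documentation",
  "description": "Help articles and FAQs for the support bot",
  "store": "pinecone",
  "reranker": null,
  "recordMaxTokens": 1000,
  "searchMinScore": 0.7,
  "searchMaxRecords": 5,
  "searchMaxTokens": 2000,
  "matchInstruction": "Answer using the records below.",
  "mismatchInstruction": "Say that no relevant information was found.",
  "visibility": "private",
  "blueprintId": null,
  "meta": {},
  "createdAt": 1700000000000,
  "updatedAt": 1700000000000
}
```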

Key Fields Explained

  • store: The vector database or storage backend being used
  • reranker: Optional reranking model for improved search relevance
  • recordMaxTokens: Maximum token limit per individual record
  • searchMinScore: Minimum similarity score threshold for search results
  • searchMaxRecords: Maximum number of records returned in searches
  • searchMaxTokens: Total token limit across all search results
  • matchInstruction: System instruction when records are found
  • mismatchInstruction: System instruction when no records match
  • visibility: Access control level (private, protected, public)

Common Use Cases

  • Configuration Auditing: Verify current settings before making updates
  • Dataset Cloning: Retrieve configuration to replicate in new datasets
  • UI Display: Show dataset settings in administrative interfaces
  • Integration Setup: Confirm dataset parameters before connecting to bots
  • Debugging: Diagnose search behavior by reviewing configuration

Authorization Note: You can only fetch datasets that belong to your account. Attempting to access datasets owned by other users will result in an authorization error.

Updating a Dataset

Modifying an existing dataset allows you to refine its configuration, adjust search parameters, update instructions, and change metadata without affecting the underlying data records. This flexibility enables you to optimize dataset performance and behavior as your application requirements evolve.

Dataset updates are ideal for tuning search relevance, adjusting token limits based on performance observations, refining match/mismatch instructions, or updating organizational metadata. The update operation preserves all existing records while applying new configuration settings that will affect future search and retrieval operations.

To update a dataset, send a POST request with the fields you want to modify:
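A minimal sketch, assuming a POST /v1/dataset/{datasetId}/update route; only the fields being changed are sent:

```typescript
// Partially update a dataset; omitted fields keep their current values.
// Route shape is an assumption - adjust to your platform.
const datasetId = 'dts_abc123xyz'

const response = await fetch(
  `https://api.example.com/v1/dataset/${datasetId}/update`,
  {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.API_TOKEN}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      searchMinScore: 0.75,
      searchMaxRecords: 3,
      matchInstruction: 'Answer using only the records provided below.',
    }),
  }
)

const { id } = await response.json()
```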

Replace {datasetId} with your dataset's identifier (e.g., dts_abc123xyz). You only need to include the fields you want to update—unchanged fields will retain their current values.

Updatable Fields

The following properties can be modified after dataset creation:

  • name: Display name for the dataset
  • description: Detailed description of contents and purpose
  • recordMaxTokens: Maximum tokens per record chunk
  • searchMinScore: Minimum similarity threshold for search results
  • searchMaxRecords: Maximum number of records returned per search
  • searchMaxTokens: Total token limit across all search results
  • matchInstruction: Instructions when records are found
  • mismatchInstruction: Instructions when no matching records exist
  • visibility: Access control (private, protected, public)
  • reranker: Reranking model for improving search relevance
  • separators: Custom text separators for record chunking
  • blueprintId: Associated blueprint for organization
  • meta: Custom metadata for flexible categorization

Immutable Properties

Important: The following properties cannot be changed after creation:

  • store: The underlying storage backend (e.g., pinecone, postgres)

Attempting to modify the store type will have no effect. If you need to change the storage backend, you must create a new dataset and migrate your data.

Response

Upon successful update, the API returns the dataset ID:
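A representative shape; the exact envelope may vary:

```json
{
  "id": "dts_abc123xyz"
}
```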

Common Update Scenarios

Tuning Search Relevance: Adjust searchMinScore and searchMaxRecords based on observed result quality. Higher scores increase precision but may reduce recall.

Optimizing Token Usage: Modify recordMaxTokens and searchMaxTokens to balance context richness with API costs and response time.

Refining Instructions: Update matchInstruction and mismatchInstruction to improve how AI models use retrieved records and how they respond when no records match.

Changing Visibility: Adjust access control as your dataset's sensitivity or sharing requirements change over time.

Best Practices:

  • Make incremental changes and test the impact before further adjustments
  • Update instructions to be specific about how information should be used
  • Monitor search performance after configuration changes
  • Keep descriptions current as dataset content evolves
  • Use metadata updates to maintain organizational clarity

Listing Datasets

Retrieving a comprehensive list of all datasets in your account is essential for managing your knowledge bases, monitoring data organization, and accessing dataset configurations programmatically. The list endpoint provides powerful filtering and pagination capabilities to help you efficiently navigate large collections of datasets.

The listing operation returns detailed information about each dataset, including its configuration, storage settings, search parameters, and metadata. This is particularly useful for building administrative interfaces, implementing dataset selection features in applications, or automating dataset management workflows.

To retrieve a list of your datasets, send a GET request:
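A minimal sketch, assuming a GET /v1/dataset/list route and an items array in the response envelope; both are conventions, not confirmed specifics of this API:

```typescript
// List all datasets in the account. Route and envelope are assumptions.
const response = await fetch('https://api.example.com/v1/dataset/list', {
  headers: { Authorization: `Bearer ${process.env.API_TOKEN}` },
})

const { items } = await response.json()

for (const dataset of items) {
  console.log(dataset.id, dataset.name, dataset.store)
}
```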

The response includes all datasets associated with your account, returned as an array of dataset objects with their complete configuration and metadata.

Pagination and Ordering

For accounts with many datasets, pagination helps manage the response size and improve performance:
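A sketch of a single paginated request using the parameters described below; how the next-page cursor is obtained (here, assumed to be carried over from the previous response) depends on your platform:

```typescript
// Fetch one page of datasets. Parameter names (cursor, take, order) follow
// the list below; the route and cursor convention are assumptions.
const params = new URLSearchParams({
  cursor: 'dts_abc123xyz', // token or ID taken from the previous page
  take: '25',
  order: 'desc',
})

const response = await fetch(
  `https://api.example.com/v1/dataset/list?${params}`,
  {
    headers: { Authorization: `Bearer ${process.env.API_TOKEN}` },
  }
)

const { items } = await response.json()
```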

Available pagination parameters:

  • cursor: Pagination token from previous response to fetch the next page
  • take: Number of datasets to retrieve per request
  • order: Sort order by creation date ("asc" or "desc", defaults to "desc")

Filtering by Blueprint

To retrieve only datasets associated with a specific blueprint or project:
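A sketch assuming the list endpoint accepts a blueprintId query parameter; the parameter name mirrors the dataset field, and the bp_ identifier is hypothetical:

```typescript
// List only datasets linked to a given blueprint. The blueprintId query
// parameter and the example identifier are assumptions.
const params = new URLSearchParams({ blueprintId: 'bp_abc123xyz' })

const response = await fetch(
  `https://api.example.com/v1/dataset/list?${params}`,
  {
    headers: { Authorization: `Bearer ${process.env.API_TOKEN}` },
  }
)

const { items } = await response.json()
```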

This is useful when working with organized project structures where datasets are grouped by purpose or workflow.

Filtering by Metadata

Datasets with custom metadata can be filtered using meta queries, enabling sophisticated organizational schemes:
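One common convention for meta queries is meta[key]=value query parameters; this syntax is an assumption, so check your platform's documentation for the exact form:

```typescript
// Filter datasets by a custom metadata field. The meta[key]=value query
// syntax is an assumed convention, not a confirmed part of this API.
const params = new URLSearchParams({ 'meta[team]': 'support' })

const response = await fetch(
  `https://api.example.com/v1/dataset/list?${params}`,
  {
    headers: { Authorization: `Bearer ${process.env.API_TOKEN}` },
  }
)

const { items } = await response.json()
```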

Response Structure

Each dataset in the response includes:

  • Core identifiers: id, name, description
  • Storage configuration: store type, reranker settings
  • Search parameters: recordMaxTokens, searchMinScore, searchMaxRecords, searchMaxTokens
  • Instructions: matchInstruction, mismatchInstruction
  • Resource relationships: blueprintId
  • Access control: visibility setting
  • Metadata: Custom meta fields
  • Timestamps: createdAt, updatedAt

Best Practices:

  • Use pagination for large dataset collections to improve API performance
  • Apply filters when searching for specific datasets to reduce response size
  • Leverage metadata filtering for custom organizational structures
  • Store pagination cursors for efficient navigation through results