Datasets
A dataset is a structured collection of data that can be used to provide additional context and information to your AI bot. It is a way for bots to access relevant data and use it to generate responses based on user input. A dataset can include information on a variety of topics, such as product information, customer service queries, or general knowledge.
Bots access datasets as needed during a conversation. A bot can retrieve specific data points or combine the data with the user's input to generate a response. For example, if a user asks about the price of a product, the bot can use data from a dataset to provide the correct price.
To access a dataset, you must specify the dataset ID when starting a conversation with a bot. Only one dataset is allowed per conversation. The number of datasets you can have is determined by your monthly membership or subscription plan. If you need more datasets, you can upgrade your plan or contact customer service for more information.
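The one-dataset-per-conversation rule can be sketched as a small request builder. This is only an illustration: the function name and the `botId` and `datasetId` field names are assumptions made for this example, not the documented API, so check the API reference for the actual request shape.

```python
from typing import Optional


def build_conversation_request(bot_id: str, dataset_id: Optional[str] = None) -> dict:
    """Build a request body for starting a conversation.

    The field names "botId" and "datasetId" are hypothetical.
    """
    if not bot_id:
        raise ValueError("bot_id is required")

    body = {"botId": bot_id}
    if dataset_id is not None:
        # A single string rather than a list: only one dataset
        # can be attached to a conversation.
        body["datasetId"] = dataset_id
    return body


request_body = build_conversation_request("bot_123", dataset_id="ds_456")
```

Accepting a single optional ID, rather than a list, makes the one-dataset limit part of the function signature.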
How to create a Dataset
Follow these instructions to create a new dataset.
- Go to "Datasets" from the navigation bar.
- Click the "Create Dataset" button.
- Name your dataset and provide a description.
- Save the dataset by clicking on the "Create" button.
Advanced Options
There are several advanced options you can configure.
Option | Description |
---|---|
Record Max Tokens | The maximum number of tokens to use for new records. This value is only taken into account when importing data from files and integrations. |
Search Min Score | The score to filter search results by. This value depends on the dataset store type. |
Search Max Records | The maximum number of records to return for each dataset search. |
Search Max Tokens | The maximum number of tokens to use across all matching dataset records. It is recommended that this value be at least as large as Record Max Tokens so that a single record can fit. |
Separators | A list of separators to use when tokenizing text. The text will be split into chunks starting with the first separator found. Subsequent splits will be made using the next separator found, etc. You can use escape sequences like \n for new line, \t for tab, etc. You should at the very least include the following separators: "\n\n" and "\n". If not specified, the default separators are used. |
Match Instruction | Optional bot instruction to use when a suitable dataset record match is found. |
Mismatch Instruction | Optional bot instruction to use when no suitable dataset records are found. |
Dataset Visibility | Specify if you want to make your Dataset public or keep it private. Public datasets can be found and used by the community. |
Icon | This icon will be used in the dataset list or when displaying the dataset hub. |
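To make the interaction between Search Min Score, Search Max Records and Search Max Tokens more concrete, here is a minimal sketch of how the three limits could be applied to a list of search hits. It is not the platform's actual implementation. It assumes that a higher score means a better match (which may not hold for every dataset store type), and it uses a whitespace word count as a stand-in for a real tokenizer. `SearchHit`, `count_tokens` and `select_records` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SearchHit:
    text: str
    score: float


def count_tokens(text: str) -> int:
    # Stand-in tokenizer (assumption): counts whitespace-separated words.
    return len(text.split())


def select_records(hits, min_score, max_records, max_tokens):
    """Apply the three search limits to a list of hits."""
    selected = []
    used_tokens = 0
    # Best matches first (assumes higher score = better match).
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        if hit.score < min_score:
            break  # every remaining hit scores lower still
        if len(selected) >= max_records:
            break  # Search Max Records reached
        cost = count_tokens(hit.text)
        if used_tokens + cost > max_tokens:
            continue  # would exceed Search Max Tokens, try a smaller hit
        selected.append(hit)
        used_tokens += cost
    return selected


hits = [
    SearchHit("a b c", 0.9),
    SearchHit("d e", 0.5),
    SearchHit("f g h i", 0.8),
    SearchHit("j", 0.75),
]
picked = select_records(hits, min_score=0.7, max_records=2, max_tokens=5)
```

In this sketch, a token budget smaller than the largest record means that record can never be selected, which is why Search Max Tokens should be at least as large as Record Max Tokens.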
Files
Datasets can have attached files, which can provide additional information and context to the chatbot. These files are automatically split into records, ensuring that the dataset stays organized and up to date. Whenever the files change, the corresponding dataset records are kept in sync, ensuring that the chatbot's responses are always based on the most recent information.
The following file types are supported.
File Type | Description |
---|---|
Text (.txt) | Plain text file |
Markdown (.md) | Markdown formatted file |
CSV (.csv) | Comma-separated values file |
JSON (.json) | JavaScript Object Notation file |
JSONL (.jsonl) | JSON Lines file |
DOCX (.docx), DOC (.doc) | Microsoft Word document file |
PPTX (.pptx), PPT (.ppt) | Microsoft PowerPoint document file |
XLSX (.xlsx), XLS (.xls) | Microsoft Excel document file |
PDF (.pdf) | Portable Document Format file |
How to create a Dataset Record
A newly created dataset is empty until you add records to it. To create a record, follow these steps.
- With your dataset selected, click on the "Create Record" button.
- Specify the record text, keeping an eye on the total token count.
- Save the new dataset record by clicking on the "Create" button.
Dataset Record Splitting
If you have more than one paragraph in your dataset record, you may wish to split it into multiple records. This is not always necessary, but it can help make your dataset more organized. Splitting is done automatically for you based on your dataset parameters.
If you use URL importing or you wish to enter the record manually, there are some additional options. Simply enter or import the record, then click the "Create N Records" button. The record will be split into multiple records based on the paragraph breaks in the original record.
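The splitting behaviour can be illustrated with a short sketch. This is one possible reading of the Separators and Record Max Tokens options, not the platform's actual algorithm: the text is split on the first separator, and any piece that is still over the token limit is split again on the next separator. The names `split_record` and `count_tokens` are hypothetical, and the word count is a stand-in for a real tokenizer.

```python
def count_tokens(text: str) -> int:
    # Stand-in tokenizer (assumption): counts whitespace-separated words.
    return len(text.split())


def split_record(text, separators=("\n\n", "\n"), max_tokens=50):
    """Split text into records, trying each separator in order."""
    if not separators:
        stripped = text.strip()
        return [stripped] if stripped else []

    first, rest = separators[0], separators[1:]
    records = []
    for piece in text.split(first):
        piece = piece.strip()
        if not piece:
            continue  # skip empty chunks left by repeated separators
        if count_tokens(piece) > max_tokens and rest:
            # Still too large: split again using the next separator.
            records.extend(split_record(piece, rest, max_tokens))
        else:
            records.append(piece)
    return records


source = "First paragraph here.\n\nSecond line one\nSecond line two"
by_paragraph = split_record(source, max_tokens=50)
by_line = split_record(source, max_tokens=3)
```

With a generous token limit the text is split only at paragraph breaks, giving two records. With a limit of three tokens, the second paragraph is split again at line breaks, giving three records.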
Dataset Record Autocomplete
We know that populating your dataset can be hard, especially when you do not have readily available data. This is why we have introduced the Record Autocomplete feature. As you type, you can press CTRL+Enter (or ⌘+Enter on a Mac) to complete the text using the same generative AI models that power your chatbot.
Dataset Record Importing
You can import a dataset record from a web page or a document. To do so, press the "Import" button. To import a web page, type in its address. To import a document, select it from your file system. Then click the "Import" button.
Summary
In summary, datasets are structured collections of data that can be used to provide additional context and information to a chatbot. Chatbots can use datasets to retrieve specific data points or generate responses based on user input and the data. You can create and customize your own datasets to suit the needs of your chatbot and your users, and you can access them when starting a conversation with a chatbot by specifying the dataset ID. The number of datasets you can have is limited by your monthly membership or subscription plan.