{"id":"datasets","name":"Datasets","description":"Overview of what datasets are and how they can be used in chatbot conversations. Learn how to add contextual information to your chatbot.","category":"Resources","tags":["datasets","chatbot","dataset","structured collection","context","information","generate response","user input","product information","customer service","general knowledge","retrieve data points","conversational ai"],"index":6,"content":"A dataset is a structured collection of data that can be used to provide additional context and information to your AI bot. It is a way for bots to access relevant data and use it to generate responses based on user input. A dataset can include information on a variety of topics, such as product information, customer service queries, or general knowledge.\n\nBots search datasets automatically during a conversation to find relevant context before generating a response. For example, if a user asks about the price of a product, the bot can retrieve matching records from a dataset and use them to provide an accurate answer.\n\nThe most common way to give a bot access to a dataset is to link the dataset directly from the bot's settings page. Once linked, all conversations with that bot automatically use the dataset. A single dataset can be linked to multiple bots, and a conversation can also reference a dataset independently of the bot. The number of datasets you can create is determined by your subscription plan.\n\n## How to create a Dataset\n\nFollow these instructions to create a new dataset.\n\n1. Go to **\"Datasets\"** from the navigation bar.\n2. Click the **\"Create Dataset\"** button.\n3. Name your dataset and provide a description.\n4. 
Save the dataset by clicking on the **\"Create\"** button.\n\n### Advanced Options\n\nThere are several advanced options you can configure.\n\n| Option                   | Description                                                                                                                                                                                                                                                                                                                                                                                                                  |\n| ------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| **Store**                | The vector store type used to index and search records. This is set once at creation time and cannot be changed later. Different store types use different embedding models and scoring defaults. For more information, refer to the [Stores](https://chatbotkit.com/docs/stores) documentation.                                                                                                                              |\n| **Reranker**             | An optional reranker applied after initial retrieval to improve result quality by re-ordering candidate records using a more precise relevance model. For more information, refer to the [Rerankers](https://chatbotkit.com/docs/rerankers) documentation.                                                                                                                                                                   
|\n| **Record Max Tokens**    | The maximum number of tokens to use for new records. This value is only taken into account when importing data from files and integrations.                                                                                                                                                                                                                                                                                  |\n| **Search Min Score**     | The score to filter search results by. This value depends on the dataset store type.                                                                                                                                                                                                                                                                                                                                         |\n| **Search Max Records**   | The maximum number of records to return for each dataset search.                                                                                                                                                                                                                                                                                                                                                             |\n| **Search Max Tokens**    | The maximum number of tokens to use for all found dataset record. It is recommended that this value is at least **Record Max Tokens** tokens in order to fit a single record.                                                                                                                                                                                                                                                |\n| Separators               | A list of separators to use when tokenizing text. The text will be split into chunks starting with the first separator found. 
Subsequent splits will be made using the next separator found, etc. You can use escape sequences like **`\\n`** for new line, **`\\t`** for tab, etc. You should at the very least include the following separators: **\"\\n\\n\"** and **\"\\n\"**. If not specified, the default separators are used. |\n| **Match Instruction**    | Optional bot instruction to use when a suitable dataset record match is found.                                                                                                                                                                                                                                                                                                                                               |\n| **Mismatch Instruction** | Optional bot instruction to use when no suitable dataset records are found.                                                                                                                                                                                                                                                                                                                                                  |\n| **Dataset Visibility**   | Specify if you want to make your Dataset public or keep it private. Public datasets can be found and used by the community.                                                                                                                                                                                                                                                                                                  |\n| Icon                     | This icon will be used in the dataset list or when displaying the [dataset hub](https://chatbotkit.com/hub/datasets).                                                                                                                                                                                                                            
                                                                            |\n\n### Files\n\nDatasets can have attached files, which can provide additional information and context to the chatbot. These files are automatically split into records, ensuring that the dataset stays organized and up to date. Whenever the files change, the corresponding dataset records are kept in sync, ensuring that the chatbot's responses are always based on the most recent information.\n\nThe following file types are supported.\n\n| File Type                   | Description                        |\n| --------------------------- | ---------------------------------- |\n| Text (`.txt`)               | Plain text file                    |\n| Markdown (`.md`)            | Markdown formatted file            |\n| CSV (`.csv`)                | Comma-separated values file        |\n| JSON (`.json`)              | JavaScript Object Notation file    |\n| JSONL (`.jsonl`)            | JSON Lines file                    |\n| DOCX (`.docx`) DOC (`.doc`) | Microsoft Word document file       |\n| PPTX (`.pptx`) PPT (`.ppt`) | Microsoft PowerPoint document file |\n| XLSX (`.xlsx`) XLS (`.xls`) | Microsoft Excel document file      |\n| PDF (`.pdf`)                | Portable Document Format file      |\n\n## How to create a Dataset Record\n\nAt this point you have an empty dataset with no records. Creating records is just as easy.\n\n1. With your dataset selected, click on the **\"Create Record\"** button.\n2. Specify the record text, keeping the total token count in mind.\n3. Save the new dataset record by clicking on the **\"Create\"** button.\n\n### Dataset Record Splitting\n\nIf you have more than one paragraph in your dataset record you may wish to split it into multiple records. This is not always necessary, but it can help make your dataset more organized. 
This is done automatically for you based on your dataset parameters.\n\nIf you use URL importing or you wish to enter the record manually, there are some additional options. Simply enter or import the record, then click the **\"Create N Records\"** button. The record will be split into multiple records based on the paragraph breaks in the original text.\n\n### Dataset Record Autocomplete\n\nWe know that populating your dataset can be hard, especially when you do not have readily available data. This is why we have introduced the Record Autocomplete feature. As you type, you can press CTRL+Enter or ⌘+Enter (if you are on Mac) to complete the text using the same generative AI models that are powering your chatbot.\n\n### Dataset Record Importing\n\nYou can import a dataset record from a web page or a document. To do so, press the **\"Import\"** button. To import a web page, type in its address. To import a document, select it from your file system. Then click the **\"Import\"** button.\n\n![](/api/asset/video/052dc948-8c4c-419f-b8eb-d4e31fd2983a)\n\n## Summary\n\nDatasets are structured collections of data that give your bots access to custom knowledge during conversations. You can create datasets from text records or by attaching files, then link a dataset to a bot from the bot's settings page. Once linked, the bot automatically searches the dataset when generating responses. Configuration options like the store type, reranker, and search tuning parameters let you control how records are indexed and retrieved to best fit your use case.","link":"https://chatbotkit.com/docs/datasets","createdAt":1778457600000,"updatedAt":1778457600000}