How to Train Your Own ChatGPT with Your Data

Learn how to train your own ChatGPT with your custom data using ChatBotKit Datasets. Follow step-by-step instructions to create and configure datasets, add files and integrations, and create dataset records. Start building your own conversational AI bot today!

ChatGPT is the name of OpenAI's chatbot, but the term is often used loosely for any chatbot built on a large language model (LLM). Strictly speaking, it is a chat interface that lets you communicate with AI models in a conversational manner. Today, we're going to discuss how you can train your own "ChatGPT" with custom data using ChatBotKit Datasets.

Before we proceed, let's understand what datasets are in the context of AI.

What are Datasets?

A dataset is a structured collection of data that can be used to provide additional context and information to your AI bot. It could include information on a variety of topics, such as product information, customer service queries, or general knowledge. Bots access datasets as needed during a conversation to generate responses based on user input and the data.
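To make the idea concrete, here is a minimal Python sketch of a bot pulling dataset records into its prompt. It is not ChatBotKit's implementation: the `Record` class and the `find_relevant` and `build_prompt` functions are hypothetical names, and the simple word-overlap scoring stands in for whatever search the platform actually performs.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One piece of text stored in a dataset (hypothetical structure)."""
    text: str


def words(text: str) -> set[str]:
    # Lowercase and strip basic punctuation so "store?" matches "store".
    return {w.strip(".,!?").lower() for w in text.split()} - {""}


def find_relevant(records: list[Record], user_input: str) -> list[Record]:
    # Assumption: relevance is plain word overlap with the user input.
    # A real system would likely use a more capable search.
    query = words(user_input)
    scored = [(len(query & words(r.text)), r) for r in records]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [record for score, record in ranked if score > 0]


def build_prompt(records: list[Record], user_input: str) -> str:
    # Relevant records become extra context placed ahead of the user message.
    context = "\n".join(f"- {r.text}" for r in find_relevant(records, user_input))
    return f"Context:\n{context}\n\nUser: {user_input}"


if __name__ == "__main__":
    dataset = [
        Record("Our store opens at 9am on weekdays."),
        Record("Shipping takes three business days."),
    ]
    print(build_prompt(dataset, "When does the store open?"))
```

Only the record about opening hours shares a word with the question, so only that record is added to the context the model sees.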

Step-by-step Guide to Using ChatBotKit Datasets

1. Creating a Dataset

First, you need to create a new dataset by following these steps:

  1. Go to "Datasets" from the navigation bar.
  2. Click the "Create Dataset" button.
  3. Name your dataset and provide a description.
  4. Save the dataset by clicking on the "Create" button.

2. Configuring Advanced Options

ChatBotKit provides several advanced options you can configure, including:

  • Record Max Tokens: The maximum number of tokens to use for new records.
  • Search Max Records and Tokens: The maximum number of records and tokens to use for each dataset search.
  • Match and Mismatch Instructions: Optional bot instructions to use when a dataset record match is found or not found.
  • Dataset Visibility: Specify if you want to make your dataset public or keep it private.
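As a rough illustration of how the search limits and the match and mismatch instructions might interact, consider the Python sketch below. The `DatasetConfig` field names, the `select_records` and `instruction_for` functions, and the one-word-per-token counting are all assumptions made for this example; they are not ChatBotKit's actual settings format or tokenizer.

```python
from dataclasses import dataclass


@dataclass
class DatasetConfig:
    # Hypothetical field names mirroring the options described above.
    search_max_records: int
    search_max_tokens: int
    match_instruction: str = ""
    mismatch_instruction: str = ""


def count_tokens(text: str) -> int:
    # Assumption: one whitespace-separated word counts as one token.
    return len(text.split())


def select_records(ranked: list[str], config: DatasetConfig) -> list[str]:
    """Keep the best-ranked records that fit both search limits."""
    selected: list[str] = []
    used = 0
    # Never look past the record limit; `ranked` is assumed best-first.
    for text in ranked[: config.search_max_records]:
        cost = count_tokens(text)
        if used + cost > config.search_max_tokens:
            break  # stop at the first record that exceeds the token budget
        selected.append(text)
        used += cost
    return selected


def instruction_for(selected: list[str], config: DatasetConfig) -> str:
    # Use the match instruction when at least one record was found.
    return config.match_instruction if selected else config.mismatch_instruction


if __name__ == "__main__":
    config = DatasetConfig(
        search_max_records=2,
        search_max_tokens=6,
        match_instruction="Answer using the records provided.",
        mismatch_instruction="Say that you do not know.",
    )
    found = select_records(["one two three", "four five", "six"], config)
    print(found, instruction_for(found, config))
```

In this sketch, lower limits keep the prompt small at the cost of giving the bot less context, which is the trade-off these options let you tune.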

3. Adding Files to Datasets

Datasets can have attached files, which provide additional information and context. Supported file types include .txt, .md, .csv, .json, .jsonl, .docx, and .pdf. These files are automatically split into records, keeping the dataset organized and up-to-date.
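The exact splitting logic is not described here, but the general idea of cutting a file into records that respect a token limit can be sketched in Python as follows. The `split_into_records` function is hypothetical, and treating each whitespace-separated word as one token is a simplifying assumption; real tokenizers count differently.

```python
def split_into_records(text: str, record_max_tokens: int) -> list[str]:
    """Cut text into consecutive chunks of at most record_max_tokens words."""
    if record_max_tokens < 1:
        raise ValueError("record_max_tokens must be at least 1")
    # Assumption: one whitespace-separated word counts as one token.
    tokens = text.split()
    return [
        " ".join(tokens[start : start + record_max_tokens])
        for start in range(0, len(tokens), record_max_tokens)
    ]


if __name__ == "__main__":
    file_text = "Returns are accepted within 30 days of purchase with a receipt."
    for record in split_into_records(file_text, record_max_tokens=4):
        print(record)
```

A fixed-size split like this can cut a sentence in half, so a production splitter would more plausibly break on paragraph or sentence boundaries first.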

4. Adding Integrations

To populate your datasets automatically, you can use integrations. For instance, our Sitemap integration, also known as the website importer, imports data directly from your website into your dataset. Our Notion integration, known as the Notion importer, imports data from your Notion documents. These integrations make it much easier to maintain and update your datasets, which in turn helps your bot answer from current information.

5. Creating a Dataset Record

You can also create records manually by following these steps:

  1. With your dataset selected, click on the "Create Record" button.
  2. Enter the record text, keeping the total token count in mind.
  3. Save the new dataset record by clicking on the "Create" button.

Remember, if your dataset record has more than one paragraph, you may wish to split it into multiple records.
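If you prepare record text outside the dashboard, a small Python helper like the one below can do the paragraph splitting for you. This is only a sketch: `split_paragraphs` is a hypothetical name, it assumes paragraphs are separated by blank lines, and the word count it prints is a rough stand-in for a real token count.

```python
import re


def split_paragraphs(text: str) -> list[str]:
    """Return each blank-line-separated paragraph as its own record text."""
    # One or more blank lines (possibly containing spaces) end a paragraph.
    parts = re.split(r"\n\s*\n", text.strip())
    return [part.strip() for part in parts if part.strip()]


if __name__ == "__main__":
    draft = (
        "Our support team is available on weekdays.\n"
        "\n"
        "Refunds are processed within five business days.\n"
        "Contact support to start a refund."
    )
    for paragraph in split_paragraphs(draft):
        # Rough size estimate: words, not true model tokens.
        print(len(paragraph.split()), "words:", paragraph)
```

Each returned string can then be saved as a separate dataset record.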

Final Words

By following these steps, you can successfully create and train your own ChatGPT-like bot using your custom data with ChatBotKit Datasets. Remember to experiment, iterate, and improve your datasets and models over time to achieve the best results. Happy training!