Sitemap
With ChatBotKit's Sitemap feature, you can import a website's content into your dataset by providing the website's URL. The feature also automatically summarises long pages using AI, making it easier for you to access the most important information from your chatbot, whether it is embedded in your website, Slack or Discord.
Step-by-step Guide
To integrate ChatBotKit's Sitemap feature into your dataset, follow these simple steps:
- Navigate to "Integrations" in ChatBotKit and click "Website Importer".
- Enter a name and optional description for this integration.
- Select the dataset you want to import information into.
- Enter the website URL.
- Save the integration by clicking the "Create" button.
There are also several advanced options to consider. You can find them under "Advanced Options".
- Glob - the glob is a pattern to focus the integration on specific pages. For example, you may want to sync only your documentation found at /docs. In this case the glob needs to be set to /docs/**.
- Selectors - you can limit the importer to specific areas of your website by providing a list of CSS selectors.
- JavaScript - if you turn on this feature, the importer will use a full-fledged browser to spider the content of your website. This is particularly useful for importing complex websites with a lot of dynamic content and scripts.
- Expires In - you can use this setting to automatically expire old records. This is useful if you have a very dynamic website that changes often. With this feature, older records are removed and replaced by newer ones, which keeps the dataset more consistent with the live site.
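To get a feel for which pages a glob such as /docs/** would select, the following Python sketch may help. It is only an illustration: the helper names (glob_to_regex, path_matches) are hypothetical, and it assumes common glob semantics in which ** matches any characters including "/" while a single * stops at "/". The exact matching rules used by the importer are not specified in this guide and may differ.

```python
import re
from urllib.parse import urlparse


def glob_to_regex(glob: str) -> re.Pattern:
    """Translate a simple glob into a regex (assumed semantics, see lead-in)."""
    parts = []
    i = 0
    while i < len(glob):
        if glob.startswith("**", i):
            parts.append(".*")        # '**' crosses path separators
            i += 2
        elif glob[i] == "*":
            parts.append("[^/]*")     # '*' stays within one path segment
            i += 1
        else:
            parts.append(re.escape(glob[i]))  # everything else is literal
            i += 1
    return re.compile("^" + "".join(parts) + "$")


def path_matches(url: str, glob: str) -> bool:
    """Check whether the path component of a URL matches the glob."""
    path = urlparse(url).path
    return glob_to_regex(glob).match(path) is not None


if __name__ == "__main__":
    for url in (
        "https://example.com/docs/guide/intro",
        "https://example.com/blog/post",
    ):
        print(url, path_matches(url, "/docs/**"))
```

Under these assumed rules, a page at /docs/guide/intro is selected and a page at /blog/post is not. Note that in this sketch the bare path /docs (without a trailing slash) does not match /docs/**.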
Once the Sitemap integration is created, ChatBotKit will automatically import the information from the website into your selected dataset.
How to Access Imported Information
To access the imported information from the website, navigate to the dataset you selected when creating the integration. All of the imported information will be available there, including any summarised pages. You can then use this information to train your chatbot or for any other purpose.
Caveats
Please note that there are some limitations to the Sitemap feature. Currently, a crawl is limited to a maximum of 15 minutes and the maximum number of URLs that can be crawled is 1000. If you need to crawl more than 1000 URLs or require a longer crawl time, please contact our customer support team for advice on how to create a custom solution.
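If your website publishes a standard sitemap.xml file, you can roughly estimate whether it fits within the 1000-URL limit before creating the integration. The Python sketch below is only an illustration under that assumption: the function names and the sample XML are hypothetical, it counts entries in a single sitemap file rather than following sitemap indexes, and the number of pages the importer actually crawls may differ from this count.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
MAX_URLS = 1000  # crawl limit stated in this guide


def count_sitemap_urls(sitemap_xml: str) -> int:
    """Count the <loc> entries listed under <url> in one sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return len(root.findall("sm:url/sm:loc", SITEMAP_NS))


def fits_crawl_limit(sitemap_xml: str, max_urls: int = MAX_URLS) -> bool:
    """Return True if the sitemap lists no more URLs than the limit."""
    return count_sitemap_urls(sitemap_xml) <= max_urls


# Hypothetical sample; in practice you would load your own sitemap.xml.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/docs/intro</loc></url>
</urlset>"""

if __name__ == "__main__":
    print("URLs listed:", count_sitemap_urls(SAMPLE))
    print("Within limit:", fits_crawl_limit(SAMPLE))
```

If the count is well above the limit, narrowing the import with a glob from "Advanced Options" is one way to reduce the number of pages before contacting support.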