Dataset Files
Files can be used to provide a source of records in your datasets. You can create files, attach them to datasets, and sync them to import records.
Create File
Creating a file is the first step to using it as a data source for your datasets. You can create a file by making a POST request to the following endpoint:
Uploading File Content
There are multiple ways to upload file content to be used as a data source for your datasets.
Upload via JSON URL or Data URL
You can upload a file by providing an HTTP URL or a data URL in a JSON request body. This method is suitable for smaller files (up to 4.5MB).
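As a rough illustration of the data URL variant, the sketch below encodes file bytes as a base64 data URL (the standard `data:<mime>;base64,<payload>` form) and wraps it in a JSON body. The `file` field name and the byte interpretation of the 4.5MB limit are assumptions made for this example, not details confirmed by the API.

```python
import base64
import json

# Assumption: the documented 4.5MB limit is read as 4.5 * 1024 * 1024 bytes.
MAX_INLINE_BYTES = int(4.5 * 1024 * 1024)


def to_data_url(content: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data URL (RFC 2397)."""
    encoded = base64.b64encode(content).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_upload_body(content: bytes, mime_type: str) -> str:
    """Return a JSON request body carrying the file as a data URL.

    The "file" key is a hypothetical field name used for illustration.
    """
    if len(content) > MAX_INLINE_BYTES:
        raise ValueError("file too large for an inline upload")
    return json.dumps({"file": to_data_url(content, mime_type)})


body = build_upload_body(b"hello world", "text/plain")
print(body)
```

Note that base64 encoding grows the payload by roughly a third, so a file close to the limit may be better served by a direct-to-source upload.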
Upload via Multipart/Form-Data
You can upload a file using multipart/form-data. This method is suitable for files up to 4.5MB.
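For readers building the request by hand, here is a minimal sketch of how a single-file multipart/form-data body can be assembled with the Python standard library. The form field name passed in (`file` in the usage line) is a hypothetical choice; the API may expect a different one.

```python
import uuid


def encode_multipart(field: str, filename: str, content: bytes, mime_type: str):
    """Build a one-part multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime_type}\r\n"
        "\r\n"
    ).encode("utf-8")
    # The closing delimiter is the boundary followed by two extra hyphens.
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


# "file" is an assumed field name, used only for this example.
payload, content_type = encode_multipart("file", "notes.txt", b"hello", "text/plain")
print(content_type)
```

The returned header value must be sent as the request's Content-Type so the server can locate the boundary.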
Upload via Raw File Stream
You can upload a file by sending the raw file stream in the request body. This method is suitable for files up to 4.5MB.
Direct-to-Source Uploads
For larger files, or for more control over the upload process, you can obtain a pre-signed upload request by providing the file metadata in a JSON request body. The response includes an uploadRequest object with the details needed to perform the upload. You can then use this uploadRequest to upload the file directly to the storage service.
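The second step of a direct upload might look like the sketch below, which turns an upload request description into a ready-to-send HTTP request. The shape assumed here (`url`, `method`, and `headers` keys) is hypothetical; check the actual uploadRequest object returned by the API before relying on it.

```python
import urllib.request


def build_direct_upload(upload_request: dict, content: bytes) -> urllib.request.Request:
    """Turn a pre-signed upload description into an HTTP request object.

    Assumed (hypothetical) shape: {"url": str, "method": str, "headers": dict}.
    """
    return urllib.request.Request(
        url=upload_request["url"],
        data=content,
        method=upload_request.get("method", "PUT"),
        headers=upload_request.get("headers", {}),
    )


# Placeholder values standing in for a real uploadRequest from the response.
example_upload_request = {
    "url": "https://storage.example.com/uploads/abc123",
    "method": "PUT",
    "headers": {"Content-Type": "text/plain"},
}
request = build_direct_upload(example_upload_request, b"file contents")
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Because the request goes straight to the storage service, the size limit that applies to the other upload methods would not be expected to apply here.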
Dataset files are the primary way to add content and knowledge to your datasets, enabling AI agents to access and reference specific documents, images, PDFs, text files, and other file types during conversations. Each file attached to a dataset is automatically processed, indexed, and made searchable, allowing the AI to retrieve relevant information when responding to user queries.
Listing Dataset Files
Retrieving the list of files attached to a dataset allows you to inventory all content within a knowledge base, review file metadata, and manage your dataset's content library. The list endpoint provides comprehensive information about each file including its name, description, visibility settings, and timestamps.
To retrieve the files associated with a dataset, send a GET request to the dataset's file list endpoint:
Pagination
The endpoint supports cursor-based pagination for efficiently navigating large file collections:
- cursor: Pagination token from the previous response, enabling you to fetch the next page of results
- take: Number of files to retrieve per page (adjust based on your needs)
- order: Sort order, either asc (oldest first) or desc (newest first, default)
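The pagination parameters above can be combined into a simple loop. The sketch below takes the page fetcher as a function argument so it stays independent of the real endpoint, and it assumes that the cursor for the next page is the id of the last item on the current page; that convention is an assumption, not something stated here.

```python
from typing import Callable, Iterator, List, Optional

# A page fetcher takes (cursor, take, order) and returns one page of file
# objects. In real use it would wrap the GET request to the list endpoint.
FetchPage = Callable[[Optional[str], int, str], List[dict]]


def iter_files(fetch_page: FetchPage, take: int = 25, order: str = "desc") -> Iterator[dict]:
    """Yield every file by following cursors until a short page is returned."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor, take, order)
        yield from page
        if len(page) < take:
            return  # a short (or empty) page means there is nothing left
        # Assumption: the next cursor is the id of the last item returned.
        cursor = page[-1]["id"]


# In-memory stand-in for the API, used only to demonstrate the loop.
FILES = [{"id": f"file_{i}"} for i in range(5)]


def fake_fetch(cursor: Optional[str], take: int, order: str) -> List[dict]:
    ids = [f["id"] for f in FILES]
    start = 0 if cursor is None else ids.index(cursor) + 1
    return FILES[start:start + take]


all_ids = [f["id"] for f in iter_files(fake_fetch, take=2)]
print(all_ids)
```

Stopping on a short page costs one extra request when the total is an exact multiple of take, which keeps the loop free of any assumption about a "has more" flag.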
Filtering by Metadata
Filter files based on custom metadata fields using deep object notation:
Metadata filtering enables flexible organization and retrieval based on your own categorization schemes, making it easy to find specific types of content within large datasets.
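Deep object notation generally means bracketed query keys of the form `name[key]=value`. A minimal sketch of building such a query string follows; the parameter name `meta` is an assumption made for illustration, so substitute whatever name the endpoint actually expects.

```python
from typing import Optional
from urllib.parse import urlencode


def build_list_query(meta: dict, take: Optional[int] = None, order: Optional[str] = None) -> str:
    """Build a query string with bracketed metadata filters.

    "meta" is a hypothetical parameter name used for this example.
    """
    params = [(f"meta[{key}]", value) for key, value in meta.items()]
    if take is not None:
        params.append(("take", take))
    if order is not None:
        params.append(("order", order))
    # urlencode percent-encodes the brackets, which servers decode on receipt.
    return urlencode(params)


query = build_list_query({"category": "manual"}, take=10, order="asc")
print(query)  # meta%5Bcategory%5D=manual&take=10&order=asc
```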
Response Format
The endpoint returns an array of file objects:
File Visibility
Each file has a visibility setting that controls access:
- private: Only accessible to the file owner and explicitly authorized users
- protected: Accessible to users within the same organization or team
- public: Publicly accessible (use with caution for sensitive content)
Streaming Response (JSONL)
For real-time processing of large file lists, request JSONL streaming format:
Each line in the response is a separate JSON object:
This format is ideal for processing large file lists incrementally without waiting for the entire response.
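Incremental processing of a JSONL stream can be sketched as a small generator that buffers incoming bytes and yields one parsed object per complete line. The chunk source is left abstract here (any iterable of bytes, such as the chunks of an HTTP response body), and the fields inside each object are not assumed.

```python
import json
from typing import Iterable, Iterator


def iter_jsonl(chunks: Iterable[bytes]) -> Iterator[dict]:
    """Yield one parsed JSON object per complete line as chunks arrive."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # A chunk may contain zero, one, or several complete lines.
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            if line.strip():
                yield json.loads(line)
    # Handle a final line that was not terminated by a newline.
    if buffer.strip():
        yield json.loads(buffer)


# Chunks deliberately split in the middle of a line to show the buffering.
sample_chunks = [b'{"id": "a"}\n{"id"', b': "b"}\n', b'{"id": "c"}']
items = list(iter_jsonl(sample_chunks))
print(items)
```

Buffering by newline matters because network chunks do not align with line boundaries, so a line may arrive split across two reads.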
Important Notes:
- Only files attached to datasets you own are returned
- File processing status is not included in the list response; check individual file details for processing state
- Deleted files are automatically removed from the list
- The list reflects the current state of file attachments through the DatasetFileAttachment relationship
- File metadata is flexible and can store arbitrary key-value pairs for custom organization
Attach Dataset File
Add a file to a dataset by creating an attachment between them. Specify the type of attachment.
Detach Dataset File
Remove a file from a dataset by deleting the attachment between them. You can pass an optional parameter to also delete all records associated with the file in the dataset.
Warning: This will permanently delete all records associated with the file in the dataset.
Sync a File to a Dataset
Files are not automatically synced to datasets when they are attached or updated. This is to give you control over when the data is imported and to avoid unnecessary processing.
You can trigger a sync of a file to a dataset by making a POST request to the following endpoint:
The response will contain the ID of the file that was synced. The file is processed asynchronously, and you can monitor its progress in the dataset event log.