Agent Completion APIs: Picking the Right Endpoint for the Job

A guide to the conversation and task completion APIs on ChatBotKit. Covers stateless vs stateful conversations, streaming vs request/response, dispatch and background execution, and the distinction between time-bound dispatches and long-running tasks.

Overview

There are several different ways to ask a ChatBotKit agent to produce a response, and they sit at different points along two axes: how much state the platform keeps for you, and how the response is delivered. The same logical operation - "have the agent answer something" - has a stateless variant and a stateful variant, a streaming variant and a request/response variant, an interactive variant and a background variant.

The result is a small matrix of endpoints that each fit a specific shape of integration. A chat widget on a marketing site has very different requirements from a nightly summarisation job, and both have different requirements from a long-running analysis that needs to chew on a corpus for hours. Picking the right endpoint up front saves you from working around the wrong one later.

This guide walks through the conversation completion endpoints in their five common shapes, then covers the task trigger endpoint that sits alongside them, and explains where each one fits. It also covers the most important operational distinction in the background-execution surface: dispatches are time-bound to 15 minutes; tasks are not.

The Two Axes

Before going through the endpoints individually it helps to be precise about the two axes they sit on, because every endpoint name in this guide is a point in this 2D space.

Stateless vs stateful

A stateless completion does not keep any platform-side record of the exchange. You send the full input - backstory, dataset reference, message history if you want one - and the platform returns a response. There is no conversationId. Nothing is stored after the call returns. If you want history, you maintain it yourself and replay it on the next call.

A stateful completion runs against a conversationId. The platform owns the message history, the session state, the contact link, the per-conversation metadata, and the turn-taking. You send the next user message; the platform appends it to the conversation, runs the agent against the accumulated history, appends the assistant's reply, and returns it.

Stateless is the right shape when your application is the source of truth for the conversation. Stateful is the right shape when the platform should be the source of truth - which is almost always the case for production assistants where you want the Inbox, conversation analytics, audit logs, and contact attribution to work without extra glue.
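A minimal sketch of what "your application is the source of truth" means for the stateless shape, compared with the stateful one. The payload keys (`backstory`, `messages`, `type`, `text`) and the helper names are assumptions made for illustration, not the documented request schema.

```python
from dataclasses import dataclass, field


@dataclass
class LocalConversation:
    """Client-side history for the stateless shape (hypothetical helper)."""
    backstory: str
    messages: list = field(default_factory=list)

    def build_payload(self, user_text: str) -> dict:
        # Record the new user turn locally, then replay the full history.
        self.messages.append({"type": "user", "text": user_text})
        return {"backstory": self.backstory, "messages": list(self.messages)}

    def record_reply(self, bot_text: str) -> None:
        # Nothing is stored platform-side, so the reply must be kept here too.
        self.messages.append({"type": "bot", "text": bot_text})


def build_stateful_payload(user_text: str) -> dict:
    # With a conversationId the platform already holds the history,
    # so only the new message needs to travel (assumed body shape).
    return {"text": user_text}


convo = LocalConversation(backstory="You are a helpful assistant.")
first = convo.build_payload("Hello")
convo.record_reply("Hi! How can I help?")
second = convo.build_payload("What are your opening hours?")
```

The stateless payload grows with every turn because the caller replays the history; the stateful payload stays the same size because the platform holds it.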

Streaming vs request/response

A streaming call returns tokens, partial messages, function calls, and the final result as they happen, over a long-lived connection. The user sees output appearing word by word; tool calls surface as they execute; the client can render progress.

A request/response call returns one JSON payload at the end. The whole completion runs server-side, and the client gets a single result. Simpler to consume, but no progress signal until it finishes.

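The five conversation endpoints described in the next section support both modes, with one exception: dispatch returns only a channelId from the original request, and any streaming happens separately over a channel subscription. For the other four, the choice is independent of the stateless/stateful axis - each can be used either way.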
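The difference between the two delivery modes is mostly visible on the consuming side. The sketch below assumes a newline-delimited JSON stream with `token` and `result` events; that event format, and the `text` field on the single response, are assumptions for illustration rather than the documented wire format.

```python
import json
from typing import Callable, Iterable


def consume_stream(lines: Iterable[str], on_token: Callable[[str], None]) -> str:
    """Render tokens as they arrive and return the final text."""
    parts = []
    for line in lines:
        if not line.strip():
            continue  # skip blank keep-alive lines
        event = json.loads(line)
        if event["type"] == "token":
            parts.append(event["data"]["token"])
            on_token(event["data"]["token"])
        elif event["type"] == "result":
            # The final event carries the complete reply.
            return event["data"]["text"]
    # Stream ended without a result event: fall back to what was received.
    return "".join(parts)


def consume_response(body: str) -> str:
    """Request/response: one JSON payload, no progress signal."""
    return json.loads(body)["text"]


seen = []
stream = [
    '{"type": "token", "data": {"token": "Hel"}}',
    '{"type": "token", "data": {"token": "lo"}}',
    '{"type": "result", "data": {"text": "Hello"}}',
]
final = consume_stream(stream, seen.append)
```

The streaming consumer needs a callback and a fallback for a truncated stream; the request/response consumer is a single parse.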

The Five Conversation Endpoints

The /v1/conversation namespace exposes five different methods, each pinned to a specific point on those axes. They are all "completions" in the broad sense, but the surface area they expose is quite different.

Stateless completion

A one-shot stateless completion, exposed at /conversation/complete. You pass the full input - backstory, dataset, messages array, model parameters - and get back the assistant response. Supports both streaming and request/response.

This is the right endpoint when:

Your application owns the conversation history and you do not want platform-side persistence.
The interaction is genuinely one-off (a "summarise this email" call, a one-shot classification, a one-time tool invocation).
You are integrating the platform as a model provider and the rest of your system already does conversation tracking.

What you give up by using it: nothing accumulates on the platform side. The conversation will not appear in the Inbox, will not be linked to a contact, and will not be visible in any of the conversation analytics or audit surfaces. If you later want to ask "what did this user discuss with the agent last Tuesday?" the answer is in your storage, not ours.

Stateful, append a user message

The first half of a stateful completion, exposed at /conversation/{id}/send (send for short). It takes a user message, appends it to the conversation identified by conversationId, and returns. It does not run the agent. Supports streaming and request/response (the streaming variant exists mostly for symmetry with receive).

The reason this is split out as its own endpoint is that there are integration shapes where the act of recording a user message is decoupled from the act of producing an agent reply. A typing indicator, a multi-message burst from a user that should be batched before the agent responds, a moderation pass that runs between message storage and agent execution - all of those are easier to build when "store the message" and "produce a response" are separate operations.

Stateful, run the agent

The second half, exposed at /conversation/{id}/receive (receive for short). It runs the agent against the current conversation state and returns the next assistant message. Supports streaming and request/response.

Together with send, this gives you fine-grained control over the turn cycle. You can store a user message, do something else (run a moderation pass, fan out the message to multiple downstream systems, wait for a human approval), and then ask the agent to reply. You can also call receive without a prior send if you want the agent to speak first - this is how you produce a proactive opening message at the start of a conversation.
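The split turn cycle described above could be sketched roughly as follows. The `transport` callable is a stand-in for whatever HTTP client you use, and the `{"text": ...}` body and empty receive body are assumptions; only the send and receive paths come from this guide.

```python
from typing import Callable, Optional

# transport(method, path, body) -> dict is a stand-in for a real HTTP client.
Transport = Callable[[str, str, dict], dict]


def moderated_turn(
    transport: Transport,
    conversation_id: str,
    user_text: str,
    is_allowed: Callable[[str], bool],
) -> Optional[dict]:
    # Step 1: record the user message; the agent does not run yet.
    transport("POST", f"/conversation/{conversation_id}/send", {"text": user_text})
    # Step 2: the in-between work, here a moderation check.
    if not is_allowed(user_text):
        return None  # the message is stored, but no agent reply is produced
    # Step 3: run the agent against the accumulated history.
    return transport("POST", f"/conversation/{conversation_id}/receive", {})


def proactive_opening(transport: Transport, conversation_id: str) -> dict:
    # receive without a prior send: the agent speaks first.
    return transport("POST", f"/conversation/{conversation_id}/receive", {})


calls = []


def fake_transport(method: str, path: str, body: dict) -> dict:
    calls.append((method, path, body))
    return {"text": "stub reply"}


def no_spam(text: str) -> bool:
    return "spam" not in text


reply = moderated_turn(fake_transport, "conv_123", "hello", no_spam)
blocked = moderated_turn(fake_transport, "conv_123", "buy spam", no_spam)
```

In the blocked case only send is called, which is exactly the behaviour a combined call cannot express.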

Stateful, send and receive in one call

The combined operation, exposed at /conversation/{id}/complete (complete for short). Equivalent to a send immediately followed by a receive, against the same conversationId, in a single call. Supports streaming and request/response.

This is the endpoint most chat UIs end up using, because the typical chat loop is exactly "user said something, please reply" and there is no useful work to do between the two halves. It saves a round-trip and keeps the streaming connection contiguous over the whole turn.

The split between complete and the send + receive pair is purely about whether you need to interpose logic between the user message landing and the agent running. If you do not, use complete. If you do, split it.

Stateful, fire-and-forget background completion

Exposed at /conversation/{id}/dispatch, this is the same logical operation as complete, but the request returns immediately while the completion runs asynchronously on the platform. The response gives you a channelId you can subscribe to over POST /api/v1/channel/{channelId}/subscribe if you want to stream the result back later. If you do not subscribe, the completion still runs to the end and the result lands in the conversation history; the channel is purely for live progress.

There is also a stateless variant at POST /api/v1/conversation/dispatch, which has the same shape but runs against a stateless input the way /conversation/complete does.

Dispatch is the right tool when:

The completion is going to take long enough that holding an HTTP connection open is awkward (multi-step tool use, large dataset retrieval, deep evaluations, anything with a chance of running for tens of seconds or more).
The client may not be able to hold a connection open at all - a mobile app that backgrounds, a webhook handler that has a tight timeout, a serverless function with a hard execution limit.
You want to fire the completion off in response to an event and let it run, with the result simply appearing in the conversation history (and in any downstream surfaces, like the Inbox) once it finishes.

The trade-off is that there is no synchronous return value. If you need the assistant's reply in the same HTTP response, dispatch is the wrong endpoint; use complete instead.

Note that dispatch does not support streaming the response back from the original request - the request returns a channelId and that is it. The streaming happens, separately, over the channel subscription.
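A rough sketch of the two-step dispatch flow: start the background completion, then optionally subscribe to the channel. The `transport` callable and the `{"text": ...}` request body are assumptions, and the stateful dispatch path under `/api/v1` is inferred from the stateless path and the channel path quoted above rather than confirmed.

```python
from typing import Callable, Tuple

# transport(method, path, body) -> dict is a stand-in for a real HTTP client.
Transport = Callable[[str, str, dict], dict]


def dispatch_turn(transport: Transport, conversation_id: str, user_text: str) -> str:
    """Start a background completion and return the channelId, not a reply."""
    response = transport(
        "POST",
        f"/api/v1/conversation/{conversation_id}/dispatch",
        {"text": user_text},
    )
    # The dispatch response carries a channel to follow, never the answer itself.
    return response["channelId"]


def subscribe_request(channel_id: str) -> Tuple[str, str]:
    """Describe the optional follow-up call that streams live progress."""
    return ("POST", f"/api/v1/channel/{channel_id}/subscribe")


def fake_transport(method: str, path: str, body: dict) -> dict:
    # Stand-in for the platform: returns at once with a channel and no reply text.
    return {"channelId": "chan_abc"}


channel_id = dispatch_turn(fake_transport, "conv_123", "Summarise the thread")
method, path = subscribe_request(channel_id)
```

Skipping the subscribe step is valid: the result still lands in the conversation history.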

Summary table

Endpoint	State	Combines send + receive?	Streaming?	Request/response?	Synchronous result?
`/conversation/complete`	stateless	n/a (full input each time)	yes	yes	yes
`/conversation/{id}/send`	stateful	no - appends user message only	yes	yes	yes
`/conversation/{id}/receive`	stateful	no - generates assistant message only	yes	yes	yes
`/conversation/{id}/complete`	stateful	yes	yes	yes	yes
`/conversation/{id}/dispatch`	stateful	yes (same input as complete)	via channel subscription	request returns `channelId`	no

Decision Tree for Conversation Endpoints

A short version of how to pick:

Does the platform need to keep the conversation history?
- No → /conversation/complete. Done.
- Yes → continue.
Does the caller need the assistant's reply in the same HTTP response?
- No, fire-and-forget is fine → /conversation/{id}/dispatch.
- Yes → continue.
Is there work to do between recording the user's message and running the agent?
- No → /conversation/{id}/complete.
- Yes → /conversation/{id}/send, do the in-between work, then /conversation/{id}/receive.
Is this a proactive opening message with no user input first?
- Yes → /conversation/{id}/receive.
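The tree above can be written down as a small function. This is only a sketch of the decision logic; the function and parameter names are illustrative, and checking the proactive-opening case right after the state question is a choice made here, since the tree lists it last.

```python
from typing import List


def choose_conversation_endpoint(
    platform_keeps_history: bool,
    needs_reply_in_response: bool = True,
    work_between_send_and_receive: bool = False,
    agent_speaks_first: bool = False,
) -> List[str]:
    """Return the endpoint(s) to call, in order."""
    if not platform_keeps_history:
        return ["/conversation/complete"]
    if agent_speaks_first:
        # Proactive opening message: no user input to send first.
        return ["/conversation/{id}/receive"]
    if not needs_reply_in_response:
        return ["/conversation/{id}/dispatch"]
    if work_between_send_and_receive:
        return ["/conversation/{id}/send", "/conversation/{id}/receive"]
    return ["/conversation/{id}/complete"]


chat_widget = choose_conversation_endpoint(platform_keeps_history=True)
webhook = choose_conversation_endpoint(True, needs_reply_in_response=False)
```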

This covers most cases. The places where it gets more interesting are when the work the agent does is part of a larger background pipeline - at which point the question stops being "which conversation endpoint" and starts being "should this even be a conversation completion at all". That is what the next section covers.

Background Jobs: Dispatch vs Task

/conversation/{id}/dispatch is one way to run a completion in the background, but it is not the only background-execution surface on the platform. There is one other, the task trigger, and the two exist because "run something asynchronously" actually covers two meaningfully different patterns:

Background completion of a conversation. A specific conversation needs to advance, but you do not want to wait. Use /conversation/dispatch (stateless) or /conversation/{id}/dispatch (stateful).
Run a configured task on demand. A task is a saved, named, schedulable unit of agent work - a configured bot, instructions, contact, session settings - that normally runs on a cron-like schedule. Sometimes you want to run it now. Use /task/{id}/trigger.

Both endpoints are POST requests that return quickly while real work happens in the background, and both produce a conversation as the artefact of that work. The important difference is operational, and is covered in the next section.

The 15-minute rule: why dispatch and task are different

Dispatch is intentionally time-bound. Any background completion initiated through /conversation/dispatch or /conversation/{id}/dispatch has a hard ceiling of 15 minutes of execution time. The same ceiling applies to background completions that come in through chat-style integrations - Slack, Discord, and any other inbound integration that runs the bot in the background in response to an incoming message. If the agent has not produced a final answer within 15 minutes, the run is terminated.

This is deliberate. Background completions on the conversation side are reactive: they answer a message, advance a turn, respond to an inbound event. The user or the calling system is, in some sense, waiting for an answer, even if asynchronously. A reactive background job that disappears into a multi-hour tool-using loop with no external supervision is a problem, so the platform caps it. The cap exists to keep agents from drifting into long-running, uncontrolled work where the failure mode is silent.

Tasks are the other side of that coin. A task can run for a long time - depending on how it is configured, the execution window can extend across hours, days, or months. Tasks are the right primitive when the work genuinely needs that runway: deep research, large-scale data processing, multi-stage agent workflows, anything where 15 minutes is not enough and the work is something you have explicitly modelled as long-running. The task resource is also what you cancel against. Because tasks have an identity and a long-lived execution, they are the only background runs that can be cancelled mid-flight; a dispatch runs to completion (or to its 15-minute ceiling) and is not interruptible.

The short version:

Dispatch - reactive, time-bound at 15 minutes, not cancellable. Same applies to background runs initiated from inbound integrations.
Task - explicit long-running primitive, configurable duration, cancellable.

If your job needs to run for longer than 15 minutes, or you need to be able to stop it once it has started, it must be a task.

There is one nuance worth calling out. An agent running inside a 15-minute-bounded run - whether that run was started by a dispatch, or by an inbound message arriving through an integration like Slack or Discord - can still accept a piece of work that genuinely needs hours or days to complete. It just cannot do that work inside the bounded run itself. Through additional capabilities, an agent can be granted the ability to create and schedule background tasks on its own. Given those capabilities, an agent that receives a long-running request can spend its 15 minutes deciding what needs to happen, decomposing the work, and scheduling one or more tasks to carry it out over a longer horizon. The bounded run finishes when that planning and scheduling is done; the actual execution then happens inside the tasks the agent created, each running under the task execution model with its own configurable duration and cancellation surface.

The 15-minute cap is still a hard line - it bounds the initial turn, the moment of accepting and routing the work, regardless of whether that turn was started by a direct dispatch or by an integration delivering a message. The work itself can extend arbitrarily far past it as long as the agent uses tasks to carry it.
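The hand-off logic can be illustrated with a little budget arithmetic. Everything in this sketch is hypothetical: the `Step` type, the runtime estimates, and the planning budget are invented to show the reasoning, and the platform does not expose such a function.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Dict, List

BOUNDED_RUN_CEILING = timedelta(minutes=15)


@dataclass
class Step:
    name: str
    estimate: timedelta  # rough guess supplied by the planner (hypothetical)


def route_work(
    steps: List[Step],
    planning_budget: timedelta = timedelta(minutes=2),
) -> Dict[str, List[str]]:
    """Split a request into work done inline and work handed to tasks."""
    total = sum((step.estimate for step in steps), planning_budget)
    if total <= BOUNDED_RUN_CEILING:
        # Everything fits inside the bounded run, so no task is needed.
        return {"inline": [s.name for s in steps], "scheduled_as_tasks": []}
    # Too long for the bounded run: it only plans, and tasks do the execution.
    return {"inline": [], "scheduled_as_tasks": [s.name for s in steps]}


quick = route_work([Step("look up order", timedelta(minutes=1))])
slow = route_work([
    Step("crawl sources", timedelta(hours=2)),
    Step("synthesise report", timedelta(minutes=40)),
])
```

In the slow case the bounded run ends once the steps are scheduled, well inside its 15 minutes, and the hours of execution happen in the tasks.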

Triggering a task

A task is a first-class platform resource. It bundles together a bot, an optional contact, instructions, session settings, optional metadata, and a schedule. Tasks are how you run agent work on a regular cadence (every hour, every weekday at 9am, on a cron expression) without writing scheduling infrastructure of your own - and they are also how you run agent work that needs more than 15 minutes of execution time, with or without a schedule.

/task/{id}/trigger runs an existing task immediately, outside of its schedule, with all the same configuration the scheduled run would use. The response confirms the task was queued; the actual execution happens in the background and creates a conversation in the platform exactly as a scheduled run would. The next scheduled run is unaffected.

Use this when:

The work needs to run for longer than the 15-minute dispatch ceiling.
The work needs to be cancellable once it has started.
You have configured work that you want to be able to invoke on demand and on a schedule, with a single resource as the source of truth for what it does.
You want the audit trail, the conversation history, and the per-task analytics that come with the task resource.
You want to fire it from a webhook or an internal admin action without re-specifying what it does on every invocation.

The pattern this fits is "saved agent job, run it now". If your operation has a meaningful name, has consistent configuration, might need to run on a schedule one day, or has any chance of needing more than 15 minutes to finish, model it as a task and trigger it.
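A sketch of the "saved agent job, run it now" pattern from a webhook handler. The event names, task IDs, `transport` callable, and empty request body are assumptions for illustration; only the /task/{id}/trigger path comes from this guide.

```python
from typing import Callable, Dict

# transport(method, path, body) -> dict is a stand-in for a real HTTP client.
Transport = Callable[[str, str, dict], dict]

# Hypothetical mapping from incoming event types to saved task IDs.
TASKS_BY_EVENT: Dict[str, str] = {"summary.regenerate": "task_daily_summary"}


def handle_webhook(transport: Transport, event: dict) -> bool:
    """Trigger the saved task for an event; return False for unknown events."""
    task_id = TASKS_BY_EVENT.get(event.get("type", ""))
    if task_id is None:
        return False
    # No bot, instructions, or schedule in the body: the task resource holds them.
    transport("POST", f"/task/{task_id}/trigger", {})
    return True


sent = []


def fake_transport(method: str, path: str, body: dict) -> dict:
    sent.append((method, path, body))
    return {}


handled = handle_webhook(fake_transport, {"type": "summary.regenerate"})
ignored = handle_webhook(fake_transport, {"type": "something.else"})
```

The handler never restates what the job does, so changing the task's configuration needs no change in the calling code.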

How dispatch and task trigger compare

Both are background entry points, but they differ in what the durable resource looks like and in their execution semantics.

Aspect	`/conversation/{id}/dispatch`	`/task/{id}/trigger`
What is the configured resource?	A conversation	A task (bot + instructions + schedule)
Where does the configuration live?	In the request body	In the task resource
Normal trigger source	Application code, on demand; inbound integration messages	The task's schedule
Execution time limit	15 minutes, hard cap	Configurable, can run for hours/days/months
Cancellable mid-run?	No	Yes
Result delivery	Conversation history; optional channel for live progress	Conversation history attached to the task
Best for	A specific conversation that needs to advance asynchronously, within 15 minutes	Saved or long-running agent jobs

The way to read this table is that the two endpoints are not interchangeable - they correspond to two different long-lived resource types with two different operational profiles, and the question is which one fits the work you are modelling.

Picking between them

A short heuristic:

The work is a one-off completion against a specific conversation and finishes within 15 minutes. Use /conversation/{id}/dispatch. There is no benefit to creating a task just to run something once and quickly.
The work has a name and consistent configuration, might run on a schedule, might need to be cancelled, or might need more than 15 minutes. Model it as a task. Use the schedule for normal runs and /task/{id}/trigger for on-demand runs.
You are not sure yet. If there is any realistic chance the work runs longer than 15 minutes, or any realistic chance you will want to cancel it, start with a task. Otherwise start with dispatch, which is cheaper to retire than a task.
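The heuristic can be condensed into a single check. The `WorkProfile` fields are illustrative names, and treating an unknown runtime as "might exceed the cap" is one interpretation of the "not sure yet" case, not a platform rule.

```python
from dataclasses import dataclass
from typing import Optional

DISPATCH_CAP_MINUTES = 15


@dataclass
class WorkProfile:
    max_runtime_minutes: Optional[float]  # realistic upper bound; None = unknown
    needs_cancellation: bool = False
    named_and_repeatable: bool = False
    may_run_on_schedule: bool = False


def choose_background_entry_point(work: WorkProfile) -> str:
    # An unknown runtime is treated as a realistic chance of exceeding the cap.
    may_exceed_cap = (
        work.max_runtime_minutes is None
        or work.max_runtime_minutes > DISPATCH_CAP_MINUTES
    )
    if (
        may_exceed_cap
        or work.needs_cancellation
        or work.named_and_repeatable
        or work.may_run_on_schedule
    ):
        return "/task/{id}/trigger"
    return "/conversation/{id}/dispatch"


quick_reply = choose_background_entry_point(WorkProfile(max_runtime_minutes=2))
corpus_analysis = choose_background_entry_point(
    WorkProfile(max_runtime_minutes=180, needs_cancellation=True)
)
```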

Worked Examples

Three short examples to illustrate the shape of the choice.

Example 1: a chat widget on a marketing site

A visitor lands on the page, opens a chat widget, types a message, sees the assistant's reply stream in. The conversation should be visible in the Inbox afterwards so the support team can review it.

Stateful: yes, the platform should own the history.
Background: no, the user is staring at the widget and expects the reply now.
Send/receive split: no, every turn is "user said something, agent replies".

Use /conversation/{id}/complete with streaming enabled. On the very first turn, create the conversation; on every subsequent turn, call complete against its ID.
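The "create on the first turn, complete on every turn" loop might look like this. `create_conversation` and `complete` are hypothetical callables standing in for the real API calls; how a conversation is created is not covered in this guide.

```python
from typing import Callable, Optional


class WidgetSession:
    """Holds the conversationId for one visitor's chat widget."""

    def __init__(
        self,
        create_conversation: Callable[[], str],
        complete: Callable[[str, str], str],
    ) -> None:
        self._create = create_conversation
        self._complete = complete
        self.conversation_id: Optional[str] = None

    def send(self, user_text: str) -> str:
        # First turn only: create the conversation the platform will own.
        if self.conversation_id is None:
            self.conversation_id = self._create()
        # Every turn: one combined send + receive against the same ID.
        return self._complete(self.conversation_id, user_text)


created = []


def fake_create() -> str:
    created.append("conv_1")
    return "conv_1"


def fake_complete(conversation_id: str, text: str) -> str:
    return f"[{conversation_id}] echo: {text}"


session = WidgetSession(fake_create, fake_complete)
first_reply = session.send("Hi")
second_reply = session.send("Pricing?")
```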

Example 2: a nightly summary email

Every weekday at 7am the assistant looks at the previous day's data and emails a summary to a distribution list.

This has a name. It runs on a schedule. Its configuration (which bot, which instructions, which contact to attribute it to) is consistent across runs.

Model it as a task with a schedule of 0 7 * * 1-5. Let the schedule drive normal runs. Expose a small admin button that calls /task/{id}/trigger for the case where someone needs to regenerate the summary on demand. The endpoint that produces the actual completion is internal to the task workflow; you never call /conversation/dispatch directly for this.
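To make the schedule concrete, the check below spells out what the standard cron expression 0 7 * * 1-5 matches: minute 0, hour 7, any day of month, any month, Monday to Friday. It is a hand-written matcher for this one expression, not a general cron parser, and it ignores time zones; which time zone the schedule is evaluated in is not stated here.

```python
from datetime import datetime


def matches_weekday_7am(moment: datetime) -> bool:
    """True when `moment` matches the cron expression 0 7 * * 1-5."""
    # Cron numbers days 0 (Sunday) to 6 (Saturday); isoweekday() gives 1-7 with Sunday = 7.
    cron_day_of_week = moment.isoweekday() % 7
    return (
        moment.minute == 0
        and moment.hour == 7
        and 1 <= cron_day_of_week <= 5
    )


monday_run = matches_weekday_7am(datetime(2024, 1, 1, 7, 0))    # a Monday
saturday_run = matches_weekday_7am(datetime(2024, 1, 6, 7, 0))  # a Saturday
```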

Example 3: a long-running document analysis

An internal tool needs to run an agent against a large document corpus. The work involves multiple tool calls, retrieval against a sizeable dataset, and several rounds of synthesis. Realistic runtime is somewhere between 30 minutes and a few hours, and the operator needs to be able to cancel a run if they realise it was kicked off with the wrong inputs.

Long-running: yes, comfortably past the 15-minute dispatch ceiling.
Cancellable: yes, the operator needs to be able to stop it.

Model it as a task. The task carries the bot, the instructions, the dataset reference, and any schedule (or none, if it only ever runs on demand). The internal tool calls /task/{id}/trigger to start a run and uses the task's cancellation surface to stop one in flight. Dispatch is not an option here - the 15-minute cap would terminate the run before it finishes, and a dispatched run cannot be cancelled in any case.

Common Pitfalls

A few patterns that tend to cause problems.

Using /conversation/complete (stateless) when you actually want the conversation to show up in the Inbox. Stateless completions do not appear in the platform's conversation surfaces. If your support team expects to be able to review what the agent said, you need a stateful conversation, not a stateless one.

Holding a streaming complete connection open for completions that take minutes. Long completions over a streaming HTTP connection are fragile - proxies time out, mobile networks drop, serverless functions hit execution limits. If a completion is reliably going to take more than 20–30 seconds, dispatch it and subscribe to the channel.

Using dispatch for work that might run longer than 15 minutes. The 15-minute cap is a hard ceiling. A dispatch that hits it gets terminated, regardless of how close to finishing it was. If the realistic upper bound on the work is more than 15 minutes, model it as a task. The same applies to bots driven from chat-style integrations: those run as background dispatches and inherit the 15-minute cap, so a Slack or Discord agent that needs to do hours of work needs to hand that work off to a task rather than try to do it inline.

Expecting to cancel a dispatch. Dispatched completions run to completion or to their 15-minute ceiling. There is no cancellation surface for them. If the work needs to be stoppable, it has to be a task.

Modelling everything as a task. Tasks carry the overhead of a saved resource: they need to be created, managed, and cleaned up. A genuinely one-off completion that finishes well inside the 15-minute window does not need a task. Reach for /conversation/dispatch first.

Forgetting that dispatch does not return the result synchronously. Code that calls dispatch and immediately tries to read a reply from the response will get a channelId and nothing else. If you need the reply in-band, use complete, not dispatch.

Summary

The conversation API exposes five completion shapes that cover the combinations of stateless vs stateful and synchronous vs background. Most chat UIs land on /conversation/{id}/complete. Stateless integrations land on /conversation/complete. The send and receive split is for the cases where you need to interpose logic between recording a user message and producing the agent reply. dispatch is the background variant, with the result delivered later through a channel subscription or through the conversation history.

Alongside those, /task/{id}/trigger covers the case where a completion is part of a larger configured, possibly scheduled, possibly long-running resource. The operational difference between the two background entry points is the one to internalise: dispatch is reactive and capped at 15 minutes and cannot be cancelled; tasks have a configurable, much longer execution window and can be cancelled mid-run. The same 15-minute cap applies to background runs initiated from inbound chat integrations.

The right endpoint is the one whose shape matches the work. A short one-off completion is a dispatch or a complete. A named, scheduled, repeatable, or long-running job is a task. Pick the resource type first, and the endpoint follows.
