
Agent Deployments: Environments, Accounts, and Production Pipelines

A comprehensive guide to deploying ChatBotKit agents across environments. Covers master and sub-account architectures, team-based access control, and the trade-offs between the Blueprint Designer, CLI imperative commands and solutions, SDKs, and Terraform deployment paths.

Overview

Deploying an AI agent to production is a different problem than building one. A working prototype proves that an idea is feasible. A production deployment has to answer harder questions: which environments exist, who has access to each, how changes flow from one environment to the next, how secrets are scoped, and how the whole thing can be rebuilt from a known-good source of truth if something goes wrong.

ChatBotKit gives you a small number of building blocks that, combined, cover the full range of deployment topologies most teams need. The two foundational concepts are accounts (master accounts and sub-accounts, also referred to as partner accounts) and teams. The deployment surface on top of those is split between a visual path (the Blueprint Designer) and code-based paths (the CLI, the SDK, and the Terraform provider).

This guide covers how to combine those primitives into deployment architectures that match well-understood patterns from cloud infrastructure, how to choose between deployment paths, and where the trade-offs are. The aim is to leave you with a clear mental model so that the deployment shape you pick is a deliberate choice rather than an artifact of how you happened to start.

Key Concepts

Before getting into deployment patterns it is worth being precise about the primitives, because the rest of the guide depends on them.

Master accounts and sub-accounts

A master account is the top-level account on the platform. It owns billing, can hold its own API credentials, and is the unit a developer or organisation typically signs up for.

A sub-account (also called a partner account) is an account created underneath a master account through the Partner API. Sub-accounts have their own resources (bots, datasets, skillsets, integrations, conversations, secrets) and their own isolation boundary. Sub-accounts can also have their own API tokens, so they can be operated directly without going through the master account if you choose. The master account can additionally create, list, fetch, update, and delete its sub-accounts, and can act on behalf of any of them using a "run-as" mechanism.

By default, resources in one sub-account are not visible to another sub-account. Individual resources can have their visibility changed from private to protected, which makes them shareable across accounts. This is useful when, for example, a shared dataset or skillset should be reused across multiple sub-accounts without being duplicated. The default of private is the safe baseline; protected is an explicit, per-resource opt-in.

The most important property of sub-accounts is that they only nest one level. A master account can hold many sub-accounts, but a sub-account cannot itself hold further sub-accounts. If you need deeper hierarchies, you need multiple master accounts.

Teams

A team is the access-control surface around an account. Teams determine which human users can sign in and operate against a given account. A user can belong to multiple teams, and a team grants access to the account it is attached to.

Teams are how you keep production out of reach of people who should not have it. A production account with a team of one or two named operators is very different from a development account with the whole engineering organisation on it, even if the resources inside look identical.

Resources

The deployable units inside any account are roughly the same set: bots, datasets, skillsets, abilities, integrations (Slack, Discord, WhatsApp, MCP servers, triggers, sitemaps, and so on), files, secrets, blueprints, and portals. These are the things that get created, configured, and wired together to form an agent solution.

For the purposes of this guide the exact resource list matters less than the fact that the resources live inside an account. Wherever the account boundary is drawn, that is also the boundary for everything inside it.

IDs and aliases

Every resource on the platform has a globally unique ID, assigned by the platform at creation. IDs are stable, opaque, and unique across the entire platform - they are how resources are referenced in the API, in audit logs, and in Terraform state. This is by design: it ensures every resource can be referred to without ambiguity from anywhere.

Resources can also have an optional alias. An alias is a human-meaningful name that you choose, settable at creation time and updatable later. The constraint on aliases is straightforward: within the same account, an alias must be unique within a given kind of resource. You can have one bot aliased support and one aliased code, but you cannot have two bots aliased support - the platform will reject the second one with a conflict error.

The reason aliases matter for deployments is that they give you a stable, predictable identifier that survives across environments. The bot in dev and the bot in prod will have different IDs, but both can be aliased support-bot. Code that references the bot by alias works unchanged across environments, and your deployment scripts can look up "the support bot in this account" without needing to remember an environment-specific ID.

This makes aliases the natural mechanism for:

  • Cross-environment promotion. A Terraform module or SDK script that needs to reference an existing resource can do so by alias, and the same code works against any environment that follows the convention.
  • Idempotent deployments. A script that creates a resource with a known alias on first run, and updates the resource of that alias on subsequent runs, is naturally idempotent without needing to track IDs externally.
  • Per-sub-account conventions. If every tenant sub-account is expected to have a support-bot, an inbox-channel, and a knowledge-base dataset, those aliases form a contract that consumers (other agents, application code, support tooling) can rely on regardless of which sub-account they are talking to.

A reasonable rule of thumb: any resource that is referenced by other code or other resources should have an alias. Resources that are purely internal (an ability inside a skillset that nothing else looks up by name) can do without one.

Portals

One particular resource worth flagging early is the portal. Portals are how non-operator users (QA reviewers, content authors, product managers, external partners) get access to specific applications inside an account without being added to the account's team. They are covered in detail in the Teams and Access Control section.

Now that the primitives are in place, we can look at how they combine into deployment topologies.

Authentication

Programmatic access to the platform is gated by API tokens. Every account, master and sub-account alike, can issue its own tokens, and that is the primary mechanism for talking to the API. A token is scoped to the account that issued it, and a process holding that token operates within that account's boundary.

In most deployments, the simplest and most direct path is to issue a token per account and use it directly. A staging sub-account has its own token; a production master account has its own token; whichever process needs to act against a given account uses the token that account issued. There is no run-as indirection involved, and no master credential lying around with broader privileges than the task requires.

For cases where a master account needs to operate against its sub-accounts without each sub-account managing its own tokens, there are two further mechanisms.

The first is temporary tokens. A master account can mint short-lived tokens scoped to a specific sub-account through the Partner API. This is the right shape when you want to hand a credential to a process (a CI job, a one-off script, an end-user session) and have it expire automatically.

The second is the run-as capability. Authenticated as the master account, you set a run-as user identifier on the request and the platform executes the call in the context of the chosen sub-account. No new token is issued; the master credential is simply scoped per call. In the Node SDK, Go SDK, and CLI this surfaces as the CHATBOTKIT_API_SECRET and CHATBOTKIT_API_RUNAS_USERID environment variables, and as the runAsUserId (Node) or RunAsUserID (Go) constructor option on the client. This is convenient for tooling that needs to operate across many sub-accounts in a single process (for example, a deployment script that targets each tenant in turn) without juggling a token per target.
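As an illustration, a run-as scoped session might look like the following. The environment variable names are the ones described above; the final command is one of the imperative CLI commands covered later in this guide, and the placeholder values are just that:

```shell
# authenticate as the master account...
export CHATBOTKIT_API_SECRET="<master account token>"

# ...but execute every call in the context of this sub-account
export CHATBOTKIT_API_RUNAS_USERID="<sub-account user id>"

# any subsequent CLI or SDK invocation now operates inside the
# chosen sub-account, e.g. listing its datasets:
cbk dataset list
```

Swapping the run-as identifier retargets the same master credential at a different sub-account, which is what makes cross-account tooling a loop over identifiers rather than a juggle of tokens.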

The three mechanisms are not exclusive. A typical deployment uses per-account tokens for routine access, temporary tokens for short-lived processes, and the run-as capability for cross-account tooling driven from the master account.

Two Architectural Patterns

There are two foundational ways to structure environments on top of these primitives. Both are valid. They differ in where the isolation boundary sits and what kind of nesting is possible afterwards.

Pattern A: One master account per environment

In this pattern, each environment (production, staging, development, preview, and so on) is a separate master account. Each has its own billing, its own API secret, its own teams, and its own sub-account namespace if you choose to use one.

The blast radius of a mistake in one environment is contained at the account boundary, billing is separable per environment, and credentials cannot accidentally leak across environments because they are issued by different accounts.

The trade-off is that you operate more accounts. Each one has its own onboarding, its own billing relationship, and its own credential set to manage. There is also no single inventory pane across all environments because a master account cannot see the resources of a peer master account.

Pattern B: One master account with sub-accounts as environments

In this pattern, there is a single master account, and each environment is a sub-account within it. Production, staging, and development are all sub-accounts that the master account can create, list, and act on.

This is the pattern that most closely matches how the major cloud providers structure their tenancy. In Google Cloud, an organisation owns many projects, and the project is where resources live; the master-account-with-sub-accounts shape is essentially the same model. AWS is structurally more complex, but with AWS Organizations the picture lines up too: a management account at the top, with member accounts brought into a single organisational hierarchy underneath. If your team's intuition is shaped by either of those clouds, this pattern is the one that maps most directly.

It is also operationally lighter. There is one master credential to manage when you want it. The platform's partner APIs make it easy to programmatically create new environments. Cross-environment tooling (such as a deployment script that copies a configuration from staging into production) is a matter of changing the run-as header on a single client, or swapping the per-sub-account token if you prefer that route.

The constraint is that sub-accounts do not nest. If your environments themselves need sub-environments (for example, a multi-tenant production environment where each tenant should have its own isolated sub-account), there is nowhere to put the inner level. The single master account model collapses the hierarchy you would have wanted.

Pattern C: Hybrid

The two patterns combine cleanly, and most non-trivial deployments end up here. A common shape is:

  • One master account per environment, for the reasons in Pattern A.
  • Inside each master account, sub-accounts for whatever the next level of isolation is. In a SaaS product that is usually one sub-account per customer or per workspace. In an internal platform it might be one sub-account per business unit or per use case.

This shape gives you environment isolation at the master-account level and tenant isolation at the sub-account level, while keeping each layer scoped to a single concern.

Choosing between them

A short version of the decision:

  • Strict billing separation per environment → Pattern A or hybrid
  • Programmatic creation of many isolated workspaces → Pattern B or hybrid
  • Two levels of isolation (environment plus tenant) → Hybrid
  • Minimum operational footprint → Pattern B
  • The closest match to a typical AWS/GCP organisation layout → Pattern B

The decision is reversible in principle (resources can be re-created in a different account topology if your deployments are code-driven) and irreversible in practice (production data, conversation histories, and integration installs are painful to migrate). Choose deliberately at the start.

That said, you do not have to get this perfect on day one. If you start out on one topology and later discover that a different shape would have served you better, the ChatBotKit team can help with migrations between account structures, including moving resources, conversation histories, and integration state across accounts. The earlier you flag the need, the smoother the migration tends to be, but it is genuinely a supported path. Reach out and we will work through the options with you.

Mapping Environments to Accounts

Whatever pattern you pick, the environments inside it tend to look the same.

Production

Production is the environment that real users hit. The team attached to it should be small and named: ideally a single operator, plus whichever automation identity is responsible for deployments. No experimental work happens here; everything that lands has come through code review and a staging environment.

Staging

Staging is where deployments are validated before they reach production. It should be configured as close to production as possible: same models, same integration types, comparable secrets (with non-production credentials), comparable data shapes. The team here is typically broader than production but still scoped to engineers and product owners.

Development

Development is where engineers build. In Pattern B this is one sub-account; in the hybrid it is often one master account with one sub-account per developer, so that engineers can experiment without colliding. The Blueprint Designer is most useful here.

Preview / per-branch / per-PR

If your delivery pipeline creates ephemeral environments per change, sub-accounts are the natural unit. The master account creates a sub-account when a branch opens, deploys the configuration into it through code, and tears it down when the branch is merged or closed. This is one of the strongest reasons to adopt Pattern B or hybrid.

Teams and Access Control

Teams are how human access is enforced. They apply at every level of the hierarchy: a master account has its own team, and so does each sub-account. Membership of the master account's team is particularly significant, because it confers the ability to operate against any sub-account underneath. For that reason the master account team should be kept small, ideally just a handful of senior operators. Wider engineering access belongs on the sub-account teams, where the blast radius is contained to a single environment or tenant.

The shape that works for most organisations is:

  • Production team: smallest possible. Operators and on-call engineers only. Read access for a slightly broader audience if your audit needs require it.
  • Staging team: engineering, product, and QA. People who need to validate behaviour but should not be touching production directly.
  • Development team: open to anyone who needs to build or experiment.
  • Tenant or workspace teams (in a hybrid layout): scoped to the people responsible for that tenant. The master account's owner is implicitly able to act on any sub-account, so the team boundary on each sub-account is about restricting who can sign in and use the dashboard for that sub-account, rather than about the master operator.

Teams and accounts together form the practical access policy. An engineer who is only on the development team simply cannot select the production account in the dashboard. There is no path through the UI to production for them, regardless of how the production resources are configured.

Portals: application-level access for the wider organisation

Teams are the right tool when the people involved are operators who legitimately need to use the dashboard, build agents, or manage account configuration. They are too coarse for everyone else. A QA reviewer, a customer-success lead, a content moderator, or a product manager rarely needs the full production dashboard; they need a specific application surface, scoped to a specific job.

This is what portals are for. A portal is an application-level access surface that exposes one or more first-party applications (Inbox for conversation review, Studio for content authoring, and so on) to a defined set of users, with its own authentication, authorisation, and groups. Portal users do not get an API token. They do not see the wider account. They see exactly the application or applications the portal exposes, scoped to whatever the portal allows them to act on.

A few examples of where this fits in a deployment:

  • A QA team that needs to review production conversations gets a portal exposing the Inbox application against the production account. They can read, label, and triage conversations, and nothing else.
  • A content team that authors knowledge base entries gets a portal exposing dataset management against a specific dataset, with no access to the bots or integrations consuming that dataset.
  • An external customer or partner gets a portal exposing only the applications relevant to their workflow, sandboxed away from any unrelated resources.

Portals invert the usual access question. A team grants access to an account, with the assumption that the user belongs in that account's operational surface. A portal grants access to an application, with no exposure of the underlying API or unrelated resources. For wider organisational access, portals are almost always the better fit: they keep the operator population small (which is what the master account team and sub-account teams are sized for), while still letting everyone who needs visibility get exactly the slice of the system they need to do their job.

The Deployment Surface

With the account topology decided, the next question is how configuration actually lands inside an account. There are several practical paths, and they cover different points on the spectrum from "exploratory" to "fully reproducible." Importantly, the platform does not pick one for you - the same set of resources can be deployed from a visual canvas, from a CLI with a state file in JSON, from a hand-written SDK script, from a Terraform module, or from any combination of these. The right answer is usually a layered combination, not a single tool.

The Blueprint Designer (visual)

The Blueprint Designer is a drag-and-drop canvas for assembling agent solutions. You wire together datasets, skillsets, abilities, bots, and integrations visually, run the resulting solution on the spot, and iterate against live behaviour.

This is the right starting point. It compresses the loop between an idea and a working agent down to minutes, surfaces the platform's primitives in a way that is easy to learn, and avoids any need for tooling setup. For an early-stage solution where the shape of the agent is still being figured out, the Blueprint Designer is the fastest path to something useful.

For production it is the wrong tool as the source of truth. Visual configuration is hard to review (there is no diff), hard to reproduce across environments (the configuration is intrinsic to the account it was built in), and hard to audit (changes are not naturally tied to a commit or a ticket). The recommendation is to keep an account that is explicitly for experimental and design work, prove out the agent there, and move to a code-based deployment path before that agent serves production traffic.

What makes the visual path practical for serious work is that blueprints have first-class export to code. Two mechanisms in particular bridge the gap between the visual designer and a code-driven pipeline:

  • Copy-paste of resources. Inside the Blueprint Designer, individual resources (or the whole canvas) can be selected and copied to the clipboard with the standard Ctrl+C / Cmd+C shortcut. The clipboard payload is plain JSON or YAML, which can be pasted directly into a text editor, committed to a repository, edited, and pasted back into the designer with Ctrl+V / Cmd+V. This is the lowest-friction way to capture a working visual design as code, share it with another engineer, or move it between accounts.
  • Blueprint export endpoint. Every blueprint exposes a resource export endpoint at /v1/blueprint/{blueprintId}/resource/export. By default it returns the resources as JSON, but the response format can be selected by the request's accept header: application/terraform+hcl produces ready-to-use Terraform HCL, application/yaml (or equivalents) produces YAML, and the default JSON path is the most direct match for an SDK-driven deployment. This is the supported way to take a working visual design and emit it as the canonical input for whichever code-based path you have chosen.

In other words, you do not have to commit to a deployment path before you start prototyping. The visual designer is the right tool for the early-stage shape-finding phase, and once the agent is solid, export it (or copy-paste it) into JSON, YAML, or HCL and continue from there.
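For example, the export endpoint can be exercised with a plain HTTP call. The endpoint path and the accept header values are the ones given above; the API host and the shape of the authorization header are assumptions here, so check them against the API reference:

```shell
# Export a blueprint's resources as ready-to-use Terraform HCL
# (host and auth header shape are assumptions; path is as documented)
curl \
  -H "Authorization: Bearer $CHATBOTKIT_API_SECRET" \
  -H "Accept: application/terraform+hcl" \
  "https://api.chatbotkit.com/v1/blueprint/{blueprintId}/resource/export" \
  > exported.tf
```

Swapping the accept header for application/yaml or omitting it (for the default JSON) yields the input for the CLI solution or SDK paths instead.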

A reasonable rule of thumb: if you would be unable to recreate the agent from scratch in a fresh sub-account in under an hour, you are past the point where a visual deployment alone is appropriate - but this is exactly the point at which the export endpoint becomes useful.

The CLI

The ChatBotKit CLI (@chatbotkit/cli) wraps the platform API in a set of commands suitable for scripting, for ad-hoc operational work, and - through its solution mechanism - for full declarative state-tracked deployments. It loads its credentials from environment files in this order: .env.local, then .env in the current directory, then ~/.cbk/env as a global fallback.

The two environment variables that matter most for multi-account work are:
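These are the variables already introduced in the Authentication section. A typical per-environment env file looks like this (values are placeholders):

```shell
# token issued by the account you want to operate against
CHATBOTKIT_API_SECRET=...

# optional: when the secret belongs to the master account, scope
# every call into this sub-account via the run-as mechanism
CHATBOTKIT_API_RUNAS_USERID=...
```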

With these set, every command runs against the chosen sub-account. Switching environments is a matter of switching the env file, which is exactly the workflow the CLI is designed around.

The CLI has two distinct usage modes, and it is worth understanding them separately because they sit at different points on the deployment spectrum.

Imperative commands (ad-hoc operations)

The first mode is direct API commands: cbk bot create, cbk dataset list, cbk partner user delete, and so on. These are the right tool for bootstrapping a new sub-account, running quick scripts, performing one-off operational fixes, and any task where you know exactly what you want to do and you want to do it once.

Used this way, the CLI does not track desired state. The shell history is the only record of what happened. This is fine for genuinely one-off work; less appropriate when the goal is to keep an environment in a known configuration over time.

Solutions (declarative state-tracked deployments)

The second mode is the cbk solution family of commands, which gives the CLI the declarative shape that Terraform offers. A solution is a named bundle of resources whose desired state is described in JSON files inside a .chatbotkit/ directory in your project. Each resource has a JSON file; the solution as a whole has a state file that records what has been deployed and what the platform-side IDs are.

The high-level workflow:

  1. Describe each resource as a JSON file inside the .chatbotkit/ directory of your project.
  2. Run cbk solution sync to reconcile the platform with that desired state. The CLI creates, updates, or removes resources as needed and records the resulting IDs in the solution's state file.
  3. Commit the updated resource and state files, so every deployment is reviewable as a diff.

Two properties of solutions are worth highlighting:

  • State is captured in JSON files that live alongside your code. Unlike Terraform's HCL-plus-binary-state split, both the desired state and the recorded state are JSON. They can be committed to git, diffed in pull requests, and merged using the same tools you use for any other code review.
  • Solution state is mergeable. Because the state files are structured JSON, conflicts in concurrent edits resolve like ordinary JSON merges. Two engineers adding different resources to the same solution will not produce a conflict the way two Terraform apply runs against the same state would.

This makes solutions a good fit when:

  • Your team is already JavaScript- or TypeScript-heavy and would rather not introduce Terraform as a separate tool.
  • You want a code-driven deployment but prefer JSON over HCL.
  • You want the visual designer's exported JSON (from the blueprint export endpoint or copy-paste) to drop directly into a code-based deployment without translation.

It is also a clean partner to the Blueprint Designer specifically: the JSON exported from a blueprint can be saved as a solution resource directly, so the designer becomes the editing surface for individual resources while the solution provides the state and apply machinery.

Solutions sit in the same conceptual slot as Terraform - declarative, state-tracked, idempotent, reviewable - and the choice between the two is mostly a question of preferred ecosystem (CLI/JSON vs. Terraform/HCL).

The SDKs (Node and Go)

The platform offers two first-party SDKs: the Node SDK (@chatbotkit/sdk) and the Go SDK (github.com/chatbotkit/go-sdk). Both expose the full surface of the platform API, both support sub-account operations through partner-aware clients, and both accept a run-as identifier so a single master credential can deploy into any sub-account. The choice between them is mostly a question of which language your deployment tooling already lives in.

The SDKs are the right path when your deployment logic is not a flat list of resources but a procedure: read existing state, decide what to do, create or update accordingly, with conditional logic. They are also the natural fit for deployment work that runs inside an existing application: a SaaS backend that provisions a sub-account when a new customer signs up, a long-running operator that reconciles agent state on a schedule, or a worker that fans configuration out across many tenants.

Node SDK example
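Since an SDK-driven deployment is a procedure rather than a flat resource list, the sketch below shows the core of it: the idempotent ensure-by-alias pattern. The in-memory stub stands in for the real @chatbotkit/sdk client (which would be constructed with the account secret and, optionally, the runAsUserId option described earlier); the method names list, create, and update are illustrative assumptions, not the documented client surface.

```javascript
// Idempotent deployment sketch: look up a bot by alias, update it if
// it exists, create it otherwise. `makeStubClient` is an in-memory
// stand-in for the real SDK client; its method names are assumptions.
function makeStubClient() {
  const bots = new Map()
  let nextId = 1
  return {
    async list() {
      return [...bots.values()]
    },
    async create(spec) {
      const bot = { id: `bot-${nextId++}`, ...spec }
      bots.set(bot.id, bot)
      return bot
    },
    async update(id, spec) {
      const bot = { ...bots.get(id), ...spec }
      bots.set(id, bot)
      return bot
    },
  }
}

// Ensure exactly one bot with the given alias exists, matching `spec`.
// Safe to re-run: the alias is the stable identifier, not the ID.
async function ensureBot(client, alias, spec) {
  const existing = (await client.list()).find((bot) => bot.alias === alias)
  if (existing) {
    return client.update(existing.id, { alias, ...spec })
  }
  return client.create({ alias, ...spec })
}

async function main() {
  const client = makeStubClient()
  const first = await ensureBot(client, 'support-bot', { model: 'gpt-4o' })
  const second = await ensureBot(client, 'support-bot', { model: 'gpt-4o-mini' })
  console.log(first.id === second.id) // same resource on both runs
  console.log((await client.list()).length) // no duplicates
}

main()
```

Against the real SDK the shape is identical: resolve by alias, then update or create. A script written this way can run in CI on every deployment without duplicating resources or tracking IDs externally.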

Go SDK example

The Go SDK follows the same shape: a single client constructor, partner clients for sub-account management, and a RunAsUserID field for scoping calls into a sub-account.

When to pick which

Functionally the two SDKs are equivalent for deployment use cases. Pick on environment fit:

  • Node SDK is the natural choice when your deployment scripts and your application code are already JavaScript or TypeScript, when you want to share types and helpers between a Next.js frontend and a deployment script, or when you are leaning on the broader @chatbotkit/* ecosystem (CLI, widget, Next.js helpers).
  • Go SDK is the natural choice when your platform tooling, controllers, or operators are written in Go, when you want a single statically-compiled binary you can ship into a container or distribute to operators, or when you are integrating with Kubernetes-style controllers, Terraform plugins, or other Go-based infrastructure.

You can mix both within an organisation; nothing on the platform side cares which language the calling process is written in.

Trade-offs to plan for

This is the path for deployment systems that need to express logic the configuration languages cannot: looking up resources by name, branching on environment, generating skillset abilities from a source schema, or driving a per-tenant fan-out where each tenant gets a customised version of a base configuration.

The trade-off is that the SDKs give you state management as a problem to solve. Re-running the same script can either be a no-op or duplicate everything, depending on how you wrote it. If you go this route, build idempotency in from the start: store the IDs of resources you create, look them up by stable identifier on subsequent runs, and update rather than re-create.

The Terraform provider

The Terraform provider (chatbotkit/chatbotkit) treats agent resources as infrastructure. You declare the desired state in .tf files; Terraform diffs against the current state and applies changes. Configuration drift is detected on the next plan, deletion is explicit, and the whole deployment is reviewable as a diff in version control.

A complete example for a single account looks like this:
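The sketch below shows the overall shape. The provider source is as named above, but the resource types and attributes are assumptions made for illustration; the authoritative schema is whatever the provider's registry documentation, or the blueprint export endpoint's HCL output, gives you:

```hcl
terraform {
  required_providers {
    chatbotkit = {
      source = "chatbotkit/chatbotkit"
    }
  }
}

# The provider authenticates with the token of the target account.
provider "chatbotkit" {
  # token = var.chatbotkit_token
}

# Illustrative resource and attribute names (assumptions, not the
# provider's documented schema): a dataset, and a bot grounded in it.
resource "chatbotkit_dataset" "knowledge_base" {
  name = "Knowledge Base"
}

resource "chatbotkit_bot" "support" {
  name       = "Support Bot"
  backstory  = "You are a helpful support agent."
  dataset_id = chatbotkit_dataset.knowledge_base.id
}
```

Because the provider credential determines the target account, the same configuration applies cleanly to dev, staging, or production by switching the token.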

For multi-environment deployments the typical Terraform shape is one workspace per environment, each pointed at a different account credential. Terraform Cloud, Terraform Workspaces, or a directory-per-environment layout all work; the platform does not constrain you here.

Terraform is the right path when:

  • You want a single, declarative source of truth for what an account contains.
  • Your team already operates Terraform for cloud infrastructure and wants the same review and apply discipline for agents.
  • You need drift detection and reproducible recreation of an environment.

It is less suited to deployments that are highly procedural (where a custom SDK script is more natural) or that need to inspect runtime state on the platform side before deciding what to do.

Picking a Deployment Path

For most teams the answer is not one path; it is a layered combination.

  • Initial design and exploration → Blueprint Designer in a development account
  • Capturing the visual design as code → blueprint export endpoint (HCL/JSON/YAML), or copy-paste from the designer
  • Solidifying the design into a reproducible artefact → CLI solutions, Terraform, or SDK against the development account
  • Promoting to staging → the same solution / Terraform / SDK code, run against the staging account
  • Promoting to production → the same code, run against the production account, gated by review
  • Operational fixes and one-off tasks → CLI imperative commands
  • Bootstrapping new sub-accounts (e.g. per-tenant) → SDK using the Partner clients

The thing to avoid is a configuration that exists in production but does not exist as code anywhere. Once that happens, every future change is risky, because the only authoritative description of the agent is the agent itself.

A Worked Example: Three Environments, Hybrid Pattern

Putting it together. Suppose you are deploying a customer support agent for a SaaS product where each customer should be isolated from every other customer.

Account topology

  • Three master accounts: prod, staging, dev.
  • Inside each master account, one sub-account per customer (in prod and staging) or per developer (in dev).
  • Teams: a two-person prod-ops team on the production master account; an engineering team on staging; the whole engineering org on dev.

A useful naming convention is to keep resource aliases identical across environments and let the account itself supply the environment context. The support bot in dev, staging, and production are all aliased support-bot; the dataset is knowledge-base in every environment; the trigger integration is support-trigger everywhere. Code that references resources by alias works unchanged across environments, and there is no parallel hierarchy of support-bot-prod / support-bot-staging names to keep in sync. The environment is in the credential, not in the resource name.

Source of truth

A code-driven module describes the per-customer agent configuration: dataset, skillset, abilities, bot, integrations. The module takes inputs for the customer name, branding, and any per-tenant overrides. This can be a Terraform module, a CLI solution, or a parameterised SDK script - the choice comes down to which ecosystem your team already operates. The rest of the example assumes Terraform for concreteness, but every step would work the same with cbk solution sync or with an idempotent SDK deployment script.

Deployment flow

  1. An engineer prototypes the agent in the dev master account, in their personal sub-account, using the Blueprint Designer.
  2. Once the design stabilises, the engineer captures it as code - either by exporting the blueprint via /v1/blueprint/{blueprintId}/resource/export (with application/terraform+hcl, JSON, or YAML, depending on the chosen path) or by copy-pasting resources directly out of the designer - and translates it into the per-customer module.
  3. The engineer opens a pull request.
  4. CI applies the module against a fresh sub-account in the dev master account, runs evaluation tests, and tears it down.
  5. On merge, CI applies the module against the staging master account for the relevant set of staging tenants.
  6. After validation in staging, a separate workflow applies against the production master account, gated by manual approval from a member of prod-ops.
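The six steps above reduce to a gated pipeline. The sketch below models only the gating logic; the stage names are placeholders for real CI jobs, not platform commands.

```typescript
// Illustrative pipeline skeleton. Stage names stand in for CI jobs
// (terraform apply, evaluation runs, etc.); none are real commands.
type Stage = { name: string; requiresApproval?: boolean };

const pipeline: Stage[] = [
  { name: "apply-dev-ephemeral" },   // fresh sub-account in the dev master
  { name: "evaluate" },              // evaluation tests, then teardown
  { name: "apply-staging" },         // on merge
  { name: "apply-production", requiresApproval: true }, // prod-ops gate
];

// Execute stages in order, halting at any approval gate that has not
// been satisfied by a member of the approving team.
function run(stages: Stage[], approvals: Set<string>): string[] {
  const executed: string[] = [];
  for (const stage of stages) {
    if (stage.requiresApproval && !approvals.has(stage.name)) break;
    executed.push(stage.name);
  }
  return executed;
}

const withoutApproval = run(pipeline, new Set());
const withApproval = run(pipeline, new Set(["apply-production"]));
```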

Sub-account provisioning

When a new customer signs up, an SDK script in the SaaS backend uses the Partner clients with the production master credential to create a new sub-account, then triggers a deployment (Terraform apply, cbk solution sync, or a follow-up SDK call) to seed that sub-account with the standard agent configuration.
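The provisioning flow can be sketched as follows. The `partner.createSubAccount` and `deploy` functions are hypothetical stand-ins for the real Partner client call and the follow-up deployment; their names and shapes are assumptions, not the SDK's actual API.

```typescript
// Hypothetical sketch of per-tenant provisioning. The partner object mocks
// the Partner client; `deploy` stands in for terraform apply, cbk solution
// sync, or a follow-up SDK call scoped to the new sub-account.
type SubAccount = { id: string; name: string };

const partner = {
  created: [] as SubAccount[],
  createSubAccount(name: string): SubAccount {
    const account = { id: `sub_${this.created.length + 1}`, name };
    this.created.push(account);
    return account;
  },
};

function deploy(accountId: string): string {
  // In reality: run the per-customer module with a credential scoped
  // (via run-as) to the new sub-account.
  return `deployed standard agent configuration into ${accountId}`;
}

function onCustomerSignup(customerName: string): string {
  const account = partner.createSubAccount(customerName);
  return deploy(account.id);
}

const provisionResult = onCustomerSignup("acme-corp");
```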

This is one specific shape; it is not the only one that works. The shape it has comes directly from the choices in the previous sections: hybrid topology because of the tenant requirement, a code-driven module as system-of-record because the team wants reviewable deployments, and SDK for the dynamic per-customer provisioning that a static configuration alone cannot express cleanly.

From Dev to Production: Promotion, Testing, Rollback, and Audit

Once you have the topology and the deployment path nailed down, the next question is operational: how does a change actually move from a developer's environment all the way to production, and what happens when something goes wrong?

Promotion across environments

The promotion model on the platform is structurally the same regardless of which code-driven path you have chosen. The same configuration code (Terraform module, CLI solution, or SDK deployment script) is applied against a different account, with a different state file, in a controlled order.

Concretely:

  1. An engineer makes a change in the module that describes the agent (Terraform, CLI solution, or SDK script).
  2. CI applies the module against the dev master account (or a per-PR ephemeral sub-account underneath it). Tests run.
  3. On merge, CI applies the module against the staging master account. The state file for staging is independent from dev's (whether that state lives in a Terraform backend or in a .chatbotkit/ solution directory); only the configuration code is shared.
  4. After validation in staging, the same configuration code at the same git tag is applied against the production master account, with its own state file, gated by manual approval.

The thing being promoted is the configuration code at a known revision, not the staging state. Each environment has its own state file. Each environment ends up with its own resource IDs. Promotion is "apply the same code with the prod credential," not "copy state from staging into prod."

This is also where aliases pay off in practice. Because IDs differ across environments but aliases are stable, any code or resource that needs to reference another resource should do so by alias rather than by ID. The same Terraform module, the same SDK script, and the same downstream application code can then resolve support-bot correctly in dev, staging, and prod without any per-environment indirection.
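The "promote the code, not the state" rule can be made concrete with a toy apply function. Everything here is illustrative: a real apply diffs desired against actual state, but the invariant shown (same revision, independent state, different IDs, shared alias) is the point.

```typescript
// Toy model of per-environment state. Applying the same module revision
// against two environments produces two independent state files whose
// resources share an alias but have different IDs.
type State = { revision: string; resources: Record<string, string> };

let idCounter = 0;
function apply(revision: string, env: string, state: State | null): State {
  const resources =
    state && state.revision === revision
      ? state.resources // no-op when already at this revision
      : { "support-bot": `bot_${env}_${++idCounter}` };
  return { revision, resources };
}

// Promotion: the same tagged revision, applied per environment.
const stagingState = apply("v1.4.0", "staging", null);
const prodState = apply("v1.4.0", "prod", null);
```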

A reasonable layout for this in a single repository:

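One illustrative shape (the directory names are a suggestion, not a requirement):

```
repo/
  modules/
    agent/        # the shared agent module: bot, dataset, skillset, integrations
  envs/
    dev/          # backend config + dev variable values
    staging/      # backend config + staging variable values
    prod/         # backend config + prod variable values
```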
Each envs/<env> directory has its own backend configuration pointing at a separate state file, and its own variable values for things that legitimately differ between environments (model choice, feature flags, integration endpoints, secret references). The module itself stays identical.

The same shape works with the SDK path: a single deployment script parameterised by environment, run against the appropriate sub-account or master account credential. The discipline is the same - promote the code at a known revision, not the runtime state.

Testing and evaluation gates

A deployment that compiles is not a deployment that works. Before code reaches production, the pipeline needs to answer two distinct questions:

Did the resources deploy correctly?

This is the boring half. terraform plan against the target environment should produce only the expected diff. Smoke tests against the deployed resources should confirm that bots respond, integrations are reachable, secrets are wired up, and datasets are populated. This catches configuration mistakes before they reach users.

Does the agent still behave correctly?

This is the harder half, and it is specific to AI deployments. The agent is non-deterministic. Resource-level tests will not catch a regression in agent behaviour. The pattern that works is to run the agent against a curated set of inputs and grade the outputs.

Two complementary approaches:

  • Golden conversations. A fixed set of inputs with known-good output ranges. Useful as a regression check. If a previously correct conversation now fails, the change broke something.
  • Agent-as-judge evaluators. A separate evaluator agent (often a stronger model) reads the new agent's output and grades it against a rubric. This is how you scale evaluation beyond the conversations you have manually labelled. Evaluators can grade tone, accuracy, tool use, refusal behaviour, or any other dimension you care about, and they emit pass/fail signals the pipeline can gate on.
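Both gates can be sketched together. The agent and the judge below are trivial placeholders: a real deployment would call the deployed bot and a separate evaluator agent with a rubric, and gate the pipeline on the combined result.

```typescript
// Illustrative evaluation gate. `agent` and `judge` are placeholders for
// the deployed bot and a rubric-driven evaluator agent respectively.
type Golden = { input: string; mustContain: string };

const goldens: Golden[] = [
  { input: "How do I reset my password?", mustContain: "reset" },
  { input: "Cancel my subscription", mustContain: "cancel" },
];

// Stand-in for a call to the deployed agent under test.
function agent(input: string): string {
  return `To ${input.toLowerCase().replace("?", "")}, follow these steps...`;
}

// Gate 1: golden-conversation regression check.
const regressions = goldens.filter(
  (g) => !agent(g.input).includes(g.mustContain)
);

// Gate 2: judge verdicts. A real judge grades tone, accuracy, tool use,
// refusal behaviour, etc.; this rubric is deliberately trivial.
function judge(output: string): "pass" | "fail" {
  return output.length > 10 ? "pass" : "fail";
}
const verdicts = goldens.map((g) => judge(agent(g.input)));

// The pipeline gates on both signals together.
const gatePassed =
  regressions.length === 0 && verdicts.every((v) => v === "pass");
```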

In a hybrid topology, a clean place to run these is a fresh ephemeral sub-account spun up by CI for the change, deployed into via the same module, evaluated, then torn down. This keeps test runs isolated from human-occupied dev sub-accounts and gives each PR a clean baseline.

The promotion gate from staging to production should require the evaluator suite to pass at the staging tag. Without that gate, "it deployed" is the only signal you have, and that signal does not capture anything specific to AI behaviour.

Rollback

Rollbacks happen. The platform supports two complementary mechanisms.

Re-apply a previous revision. The cleanest rollback for a code-driven deployment is git checkout <previous-tag> in the production envs directory and re-apply. Because the configuration is the source of truth, the desired state of the account snaps back to whatever the previous tag described. This works for any change that did not also alter persistent runtime data.

Use the audit log to revert specific resources. Every change to a resource is captured in the platform's audit log, and the audit log preserves enough of the resource state at each revision that, in many cases, an individual resource can be reverted to a previous version directly. This is the right tool when only one or two resources need to be rolled back (a bot's backstory, a skillset's abilities, a dataset's contents) and the rest of the deployment is fine. It is also the right tool when the rollback target is older than the last code-driven deployment, since the audit log has full per-resource history independent of how the change was made.

The two mechanisms can be combined: revert the misbehaving resource through the audit log immediately to stop the bleeding, then make the corresponding fix in the configuration code and redeploy through the normal pipeline so that the fix is durable and not just a manual mitigation.

A few things worth getting right ahead of time:

  • Tag every successful production deployment in git, so "the previous good revision" is unambiguous.
  • Make sure your operators know how to read the audit log for a resource and revert from it. Practise once on a non-critical resource, not for the first time during an incident.
  • Avoid relying on dashboard-only "I'll just fix it" changes during a rollback. They work, but they create the same drift problem as any other manual change. Capture the fix in code as soon as the immediate incident is contained.

Audit log

Beyond rollback, the audit log is the answer to "who changed what, when, and from where" across all your accounts. Every resource mutation is captured, and the log is preserved independently of the resource's current state. Audit log entries can be exported, which is useful for regulated environments, post-incident reviews, or feeding compliance pipelines.

In a topology with one master account and many sub-accounts, the audit log is particularly important. The same master credential can act against any sub-account, which means the question "which deployment touched tenant B at 14:32 yesterday" cannot be answered by knowing which credential was used; it has to be answered from the audit log. Make sure the operators with master-account team membership understand that their actions are individually attributable, and budget time during incidents to consult the log rather than reconstructing what happened from memory.

Cost allocation across sub-accounts

In a master-account-with-sub-accounts topology, billing rolls up to the master, which holds the bill for usage across all of its sub-accounts. This is part of why the topology is so convenient operationally: one billing relationship, one credit pool, regardless of how many sub-accounts exist underneath.

The platform also lets you allocate a slice of the master account's capacity to each sub-account by setting per-sub-account limits, most notably token limits. A sub-account with a 1 million token limit can consume up to that much; once it hits the limit, it stops, and the rest of the master account's capacity is unaffected.

One subtlety worth understanding: limits are not pre-allocated. If a master account has 2 million tokens of capacity and you set a 1 million token limit on each of five sub-accounts, the platform does not reserve 5 million tokens (which it does not have). Each sub-account can consume up to its own limit, and the master account can run out before any individual sub-account does. This is intentional - it lets you over-commit limits the way airlines over-book seats - but it means a per-sub-account limit is a ceiling on that sub-account's usage, not a guarantee of capacity.
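The over-commit arithmetic is worth seeing in numbers. The figures below are the ones from the example above, nothing platform-specific:

```typescript
// Over-committed limits: per-sub-account ceilings can sum to more than the
// master's capacity, so a limit is a ceiling, not a reservation.
const masterCapacity = 2_000_000; // tokens available to the master account
const subAccountLimits = [1_000_000, 1_000_000, 1_000_000, 1_000_000, 1_000_000];

const totalCeiling = subAccountLimits.reduce((a, b) => a + b, 0);
const overCommitted = totalCeiling > masterCapacity; // 5M of ceilings vs 2M of capacity

// If just two sub-accounts each consume their full 1M limit, the shared
// pool is exhausted and the other three get nothing, despite their limits.
const consumed = 1_000_000 + 1_000_000;
const remainingForOthers = masterCapacity - consumed;
```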

The practical implications:

  • Use sub-account limits to prevent a single sub-account from consuming everything. They are an isolation tool, not a budgeting tool.
  • For genuine budgeting, monitor master-account usage in aggregate. Per-sub-account usage is exposed through the API and can be exported for charge-back or attribution to internal teams or external customers.
  • If a particular sub-account must be guaranteed capacity (a paying tenant with an SLA), the safest answer is to give it its own master account and bill it separately. That is one of the strongest reasons to choose Pattern A or hybrid over Pattern B.

A related question that comes up with high-fan-out designs (e.g. driving a deployment across hundreds of sub-accounts in parallel) is at which level the platform's API rate limits apply - per master, per sub-account, or per token. Limits do exist, and they affect the parallelism you can sustain. If you are planning a fan-out workload, confirm the current limits with the ChatBotKit team before designing around an assumed parallelism, and back-pressure or batch the deployment loop rather than firing all sub-accounts simultaneously.
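One way to build in that back-pressure is to partition the sub-accounts into fixed-size batches and deploy one batch at a time. The batch size here is arbitrary; tune it against the actual rate limits once confirmed.

```typescript
// Batched fan-out sketch: deploy in fixed-size waves rather than firing
// every sub-account at once. The batch size (3 here) is illustrative.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

const accountIds = Array.from({ length: 10 }, (_, i) => `sub_${i}`);
const batches = toBatches(accountIds, 3);

// In the real loop, each batch would be deployed with Promise.all and
// awaited before the next batch starts, so at most 3 deployments are
// ever in flight against the API.
```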

What Code-Based Deployment Does and Does Not Cover

A code-driven pipeline reproduces configuration. The thing it does not reproduce is runtime data. The distinction matters because teams sometimes assume "I deployed from the same code with the same parameters, therefore the environments are identical." They are identical at the configuration level. They are not identical at the data level.

Configuration vs. runtime data

Configuration is what the deployment paths in this guide control: bots, datasets (the metadata of them), skillsets, abilities, integrations, blueprints, portals, secrets, and the wiring between them. Re-running the same Terraform module, cbk solution sync, or SDK script against a fresh sub-account will recreate all of this faithfully.

Runtime data is the everything-else that accumulates inside an account as it is used. It is not part of any code-based deployment by default, and a fresh deployment into an empty sub-account will not have any of it:

  • Conversations. Every interaction your bots have had - the message history, tool calls, responses. Production has months of conversations; a fresh staging environment has none.
  • Memories. Whatever the bot has remembered about users, tasks, or context across conversations.
  • User-defined tasks. Tasks created by users or by the agent itself during operation.
  • Contacts. The contact records that build up as users interact with the agent through any of the integration channels.
  • Space content. Content populated inside spaces during operation. The exception is content that is populated by scripts as part of the deployment - that is configuration in disguise and will be re-created on every deploy.

The implication is that "promote staging to prod" is a configuration operation, not a data operation. If you need to seed a new environment with realistic data (for example, copying a snapshot of production conversations into staging for evaluation purposes), that is a separate workflow with its own privacy and compliance considerations.

This also means a freshly deployed environment will look healthy but quiet. Any agent behaviour that depends on accumulated history (long-term memory, contact-based personalisation, ongoing user tasks) will appear different in a brand-new deployment, regardless of how perfectly the configuration was promoted.

Per-environment webhooks and outbound endpoints

Many integrations point at outbound URLs: trigger integrations call back to your services, MCP servers reach out to a hosted endpoint, fetch and search abilities may target environment-specific APIs. These URLs legitimately differ across environments - staging should not be hammering your production webhook receiver, and production should not be calling a developer's ngrok tunnel.

The clean shape is to treat outbound URLs as per-environment variables, the same way you would treat a database connection string in any other infrastructure: parameterise the module or solution input, set a different value per environment, and let promotion vary the URL while keeping the rest of the configuration identical. Webhooks can be configured per environment for exactly this reason.
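In code, that parameterisation looks like any other per-environment variable. The URLs and config shape below are illustrative:

```typescript
// Per-environment outbound URLs, handled like a database connection string:
// the integration definition references a variable, never a literal URL,
// so promotion only changes an input, not the module. Values are examples.
type EnvConfig = { webhookUrl: string };

const environments: Record<string, EnvConfig> = {
  dev: { webhookUrl: "https://dev.example.com/hooks/support" },
  staging: { webhookUrl: "https://staging.example.com/hooks/support" },
  prod: { webhookUrl: "https://example.com/hooks/support" },
};

function triggerIntegration(env: string) {
  return { alias: "support-trigger", url: environments[env].webhookUrl };
}

const stagingTrigger = triggerIntegration("staging");
```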

A specific gotcha worth flagging: when you copy-paste a resource out of the Blueprint Designer, or export a blueprint to JSON / YAML / HCL, the exported configuration captures whatever URL was set at authoring time. Treat that URL as a placeholder and replace it with the per-environment variable in the captured artefact, otherwise dev URLs end up in production at promotion time.

Backups and disaster recovery

The platform maintains its own backups, so day-to-day data protection is handled. Beyond that, anything you can read through the API can be backed up by a custom script: blueprint export for the configuration side, partner APIs to enumerate sub-accounts, the resource APIs to dump per-resource state, and the audit log for change history. Many teams that operate sensitive workloads run a periodic export job that snapshots the configuration of each account into version control, independently of any specific deployment.

Where this becomes genuinely complicated is consistent point-in-time backups across multiple resources. A naive script that dumps each resource in turn will see slightly different points in time across resources, which is fine for most cases and not fine when you need a single coherent snapshot to restore from. Cross-resource consistency, ordering of restores (datasets before bots that reference them, skillsets before abilities, secrets before integrations), and recovery testing are all areas where custom scripts can get subtle.

If you are designing a backup strategy that needs to be more than "best-effort export of each resource periodically," reach out to the ChatBotKit team. We can advise on what guarantees the platform-side backups offer, what additional snapshots make sense for your use case, and how to structure the restore path so that recovering from a backup actually produces a working environment. Designing this in advance is much cheaper than discovering its limitations during an incident.

Secrets and Credentials

A few specific things to get right regardless of which path you pick.

Credential scoping. A master account's API secret is a high-privilege credential. It can act as any sub-account underneath it. Treat it as you would an AWS root credential. Do not put it on engineer laptops if you can avoid it; issue separate credentials for development environments, and reserve the master credential for CI and operational workflows.

Run-as discipline. When using the master credential, always set the run-as identifier explicitly. A common failure mode is running a command intended for a sub-account against the master account itself because the run-as variable was unset. Defensive scripting that fails fast on a missing run-as identifier in non-interactive contexts will save you from a class of incidents.
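A fail-fast guard is a few lines. The environment variable name below is an assumption for illustration, not an official ChatBotKit variable:

```typescript
// Defensive run-as check: refuse to proceed with the master credential and
// no explicit target. CHATBOTKIT_RUN_AS is a hypothetical variable name.
function requireRunAs(env: Record<string, string | undefined>): string {
  const runAs = env["CHATBOTKIT_RUN_AS"];
  if (!runAs) {
    throw new Error(
      "refusing to run with the master credential and no run-as target; " +
        "set CHATBOTKIT_RUN_AS to the intended sub-account"
    );
  }
  return runAs;
}

// Simulate a CI job with the variable unset, then one with it set.
let failedFast = false;
try {
  requireRunAs({});
} catch {
  failedFast = true;
}
const runAsTarget = requireRunAs({ CHATBOTKIT_RUN_AS: "sub_42" });
```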

Per-environment secrets. Integration credentials (Slack tokens, third-party API keys) belong in the platform's secret resources, scoped to the account they apply to. A staging Slack workspace should have its own token; a production Slack workspace should have its own. Sharing credentials across environments means a staging mistake can affect production users.

Credential rotation. Build rotation in from the start. The platform supports issuing new secrets and revoking old ones; whichever deployment path you use, make sure you can swap a credential without redeploying every agent.

Common Pitfalls

A few patterns that tend to cause problems.

Treating the Blueprint Designer as the source of truth in production. The visual configuration drifts from any code description over time, and once that drift exists, recovering from a mistake means reconstructing the agent by hand. Move to a code-based path before this matters.

Single master account with no sub-accounts. Putting development, staging, and production resources all into the root of a single master account, with naming conventions to keep them apart, removes every isolation guarantee the platform offers. A misconfigured deployment script will happily delete production resources whose names happen to match.

Sub-accounts as the only isolation, when you also need billing separation. Sub-accounts inside a single master account roll up to the same bill. If finance needs production and development on different invoices, sub-accounts alone do not give you that.

Manual changes in production. Even with code-based deployment, an "I'll just fix it in the dashboard" change creates drift that the next deployment will either silently overwrite or, worse, get confused by. Make production read-only by convention; require all changes to go through the pipeline.

Forgetting that sub-accounts do not nest. Designing an architecture that assumes three levels of isolation (organisation, environment, tenant) inside a single master account will hit the wall halfway through implementation. Catch this at design time by drawing the topology before writing any code.

Orphaned resources from failed deployments. A deployment that fails halfway through a long-running CLI script or a partial SDK run can leave resources behind that the next attempt does not know about. State-tracked paths (Terraform and CLI solutions) handle this naturally because their state files record what they own and what to clean up. Imperative scripts need to be written defensively: idempotent create-or-update calls keyed on alias, so a re-run reattaches to existing resources rather than duplicating them, and a periodic sweep that uses the partner-client delete operations to remove sub-accounts and resources that no longer correspond to anything in the source of truth.
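The idempotent create-or-update pattern from the pitfall above can be sketched directly. The in-memory store stands in for the remote account; a real script would go through the SDK's list/create/update operations, whatever their actual names are:

```typescript
// Idempotent create-or-update keyed on alias: a re-run of a partially
// failed deployment reattaches to existing resources instead of creating
// duplicates. The store mocks the remote account state.
type Resource = { id: string; alias: string; config: string };

const store: Resource[] = [];
let nextId = 0;

function upsertByAlias(alias: string, config: string): Resource {
  const existing = store.find((r) => r.alias === alias);
  if (existing) {
    existing.config = config; // update in place, keep the existing ID
    return existing;
  }
  const created = { id: `res_${++nextId}`, alias, config };
  store.push(created);
  return created;
}

// First run creates; a re-run after a partial failure updates instead.
const firstRun = upsertByAlias("support-bot", "v1");
const secondRun = upsertByAlias("support-bot", "v2");
```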

A Recommended Starting Point

For a new team that has not yet committed to a topology:

  1. Create three master accounts: prod, staging, dev. This gives you a clean baseline with strong isolation between environments.
  2. Attach minimal teams. Two people on production. Engineering on staging. Engineering and friends on dev. New engineers join the dev team first, are added to staging once they are trusted to validate changes, and are added to the production team only when they take operational responsibility for it; nobody starts with production access on day one.
  3. Prototype agents in dev using the Blueprint Designer. Iterate freely.
  4. As soon as an agent is going to serve real users, capture its configuration as code - using the blueprint export endpoint, copy-paste from the designer, or by hand - and adopt that captured form (a CLI solution, a Terraform module, or an SDK script) as the source of truth.
  5. Use the SDK with Partner clients for any case where you need to create sub-accounts on the fly, such as per-tenant isolation in production.
  6. Reserve the CLI for operational work and quick scripts.

This is the path of least surprise. It mirrors widely-known cloud patterns, gives you reversibility through code, and leaves the door open for the hybrid pattern when the second axis of isolation (tenants, customers, business units) becomes necessary.

Summary

ChatBotKit deployments are built from two primitives (master accounts and sub-accounts) and gated by a third (teams). Those primitives compose into either a multi-master-account layout, a single-master-account layout with sub-accounts as environments (close in shape to AWS Organizations or GCP organisations and projects), or a hybrid that gives you both axes of isolation. On top of that, the deployment surface covers a full spectrum: the Blueprint Designer for visual prototyping (with first-class export to JSON, YAML, or HCL via copy-paste or the blueprint export endpoint), the CLI for both ad-hoc imperative commands and declarative state-tracked deployments through cbk solution, the Node and Go SDKs for procedural and idempotent deployment logic, and the Terraform provider for teams that prefer HCL and the wider Terraform ecosystem. None of these paths is privileged; you can mix and match them, and most production teams do.

The right deployment is the one where every piece of configuration that runs in production also exists somewhere you can review, diff, and rebuild from. Pick the topology deliberately, choose the deployment path that matches your team's existing operational habits, and reserve the visual tools for the part of the lifecycle where they are genuinely faster: the early, exploratory work where the agent's shape is still being discovered.