Centralized Agents, Distributed Runtimes: Deploying at Scale
Overview
There are two ways to build an AI agent. You can define everything in your own code - the model, the prompt, the tools, the knowledge retrieval, the guardrails - and ship a self-contained program. Or you can split the agent in two: keep the core on the platform, where it is built, scoped, and monitored centrally, and run a thin runtime outside the platform that does nothing but execute against that core.
This guide is about the second approach, and why it is the one that scales. The agent's mechanics - its backstory, its model choice, the datasets it draws on, the skillsets and abilities it can call, the secrets it holds, the policies that bound it - live inside ChatBotKit as a configured bot. The runtime is a small program, built here with the Go SDK, that targets that bot by its botId and drives it. Because the runtime carries no agent logic of its own, it stays small, and because every behavioral decision lives centrally, changing the bot changes every runtime that points at it.
The payoff is deployment at scale. When the runtime is a soft layer over a centrally-managed core, you can stand up hundreds of agents with almost no per-agent code, monitor all of them through one pane of glass, and update their behavior in a single place. This guide covers how the split works, what lives on each side, how the runtime targets a central agent, how the platform captures every action for monitoring and drift analysis, and where the pattern fits.
This guide approaches deployment from a different angle than the agent deployment guide, which covers environments, accounts, and infrastructure-as-code pipelines. Read this one for the core-versus-runtime split; read that one for how to provision and promote across environments.
The Pattern
The core idea is a clean separation between the agent's definition and its execution.
Every runtime references the same central agent by its botId. None of them carries the agent's logic. The core is defined once; the runtimes are interchangeable shells that bring it to wherever it needs to run - a server, a serverless function, an edge device, a job queue worker.
What Lives Where
The split is only useful if the boundary is clear. The platform owns everything about what the agent is and what it is allowed to do. The runtime owns everything about where and when it runs.
| Concern | Lives in the platform (the core) | Lives in the runtime (the shell) |
|---|---|---|
| Identity and behavior | Backstory, instructions, model selection | - |
| Knowledge | Datasets and records the agent retrieves from | - |
| Capabilities | Skillsets, abilities, MCP and integration access | Optional local tool handlers for its own environment |
| Credentials | Secrets and integration credentials, centrally scoped | Its own API key for reaching the platform |
| Guardrails | Usage and retention policies, limits | - |
| Execution | - | The host environment, the loop, input/output, scheduling |
| Observability | Conversations, events, usage, metrics | - |
Three consequences follow from this table. First, the agent's access is scoped centrally - what knowledge it can read, which tools it can call, which secrets it holds are all decided on the platform and apply uniformly to every runtime, so a thin shell on an untrusted host never holds the keys to anything it should not. Second, the runtime can still contribute local capability: it can supply tool handlers that execute in its own environment - reaching a private system the platform cannot see - while the agent's reasoning, knowledge, and policy stay central. The platform supplies the core capabilities; the runtime can supply hands where the work physically happens. Third, the agent is monitored centrally: because the runtime drives the agent through the platform, every interaction, tool call, and token is recorded where you can see it, regardless of which host produced it. The runtime gets that observability for free; the section on the activity log below covers it in full.
The runtime carries one credential, not many. A self-contained agent that talks to third-party systems has to hold every credential those systems require - API keys, OAuth tokens, database passwords - wherever it runs. In this pattern those credentials never leave the platform. The agent's third-party access is configured centrally through secrets and integrations, and the runtime authenticates only to the agent itself. Its one embedded credential is a token scoped to a single capability - creating sessions for that specific bot - so the token issued by ChatBotKit allows that one bot to run and nothing else: no reading data, no reaching other bots, no other account operations. When the agent needs to reach an external service, that call is made from the platform side using the centrally-held secret; the runtime simply asks the agent to act. The security payoff is direct: a compromised runtime - on customer infrastructure, at the edge, on a host you do not fully trust - leaks only that narrowly scoped session-creation token, which can do nothing but start the same agent it was already running, and is revocable in one place. It exposes none of the downstream credentials, because it never had them. This complements the layered access controls in the AI agent security guide.
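To make the blast radius concrete, here is a minimal model of a token that is bound to one capability on one bot. The `Token` type and the scope string `session:create` are assumptions made for this sketch; they are not the platform's actual token format or scope names.

```go
package main

import "fmt"

// ScopeCreateSession is a hypothetical name for the single capability
// the runtime's embedded token carries.
const ScopeCreateSession = "session:create"

// Token models a credential bound to one capability on one bot.
// Illustrative only; real tokens are opaque to the runtime.
type Token struct {
	BotID string
	Scope string
}

// Allows reports whether the token permits the action on the given bot.
// Both the capability and the bot must match.
func (t Token) Allows(action, botID string) bool {
	return t.Scope == action && t.BotID == botID
}

func main() {
	tok := Token{BotID: "bot_support", Scope: ScopeCreateSession}

	fmt.Println(tok.Allows("session:create", "bot_support")) // true
	fmt.Println(tok.Allows("session:create", "bot_billing")) // false
	fmt.Println(tok.Allows("dataset:read", "bot_support"))   // false
}
```

Everything an attacker could do with a leaked token of this shape is already covered by the first line of `main`: start the agent the runtime was running anyway.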
Targeting a Central Agent from the Runtime
This is the mechanic that makes the whole pattern work. The runtime opens a short-lived session against the central agent by its botId, then drives that agent through the Go agent SDK over the conversation the session opens. The bot's backstory, model, knowledge, skillsets, and policies govern the run; the runtime specifies none of them. The session also hands back a scoped, expiring token, so the one credential the runtime works with is bound to this single agent.
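The shape of such a runtime can be sketched as follows. This is not the ChatBotKit Go SDK: `Platform`, `CreateSession`, `Complete`, and `Run` are hypothetical names, and the platform is replaced by an in-memory stand-in so the example runs on its own. A real session token would also carry an expiry, which is omitted here.

```go
package main

import (
	"errors"
	"fmt"
)

// Bot is the central definition. Only the platform side holds it.
type Bot struct {
	Backstory string
	Model     string
}

// Session is what the runtime gets back: a conversation and a scoped token.
type Session struct {
	BotID          string
	ConversationID string
	Token          string
}

// Platform is an in-memory stand-in for the hosted service (hypothetical).
type Platform struct {
	bots map[string]Bot
	next int
}

// CreateSession opens a session against a bot referenced by ID.
func (p *Platform) CreateSession(botID string) (Session, error) {
	if _, ok := p.bots[botID]; !ok {
		return Session{}, errors.New("unknown bot: " + botID)
	}
	p.next++
	return Session{
		BotID:          botID,
		ConversationID: fmt.Sprintf("conv_%d", p.next),
		Token:          fmt.Sprintf("tok_%s_%d", botID, p.next),
	}, nil
}

// Complete runs one turn. The model and backstory are resolved here,
// on the platform side, never in the runtime.
func (p *Platform) Complete(s Session, input string) (string, error) {
	bot, ok := p.bots[s.BotID]
	if !ok {
		return "", errors.New("unknown bot: " + s.BotID)
	}
	return fmt.Sprintf("[%s] reply to %q", bot.Model, input), nil
}

// Run is the entire runtime: a botID and an input go in, output comes back.
func Run(p *Platform, botID, input string) (string, error) {
	s, err := p.CreateSession(botID)
	if err != nil {
		return "", err
	}
	return p.Complete(s, input)
}

func main() {
	p := &Platform{bots: map[string]Bot{
		"bot_support": {Backstory: "You help customers.", Model: "model-a"},
	}}
	out, err := Run(p, "bot_support", "Where is my order?")
	fmt.Println(out, err) // [model-a] reply to "Where is my order?" <nil>
}
```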
Notice what the runtime does not specify: no model, no backstory, no knowledge, no configuration. Those belong to the bot. The runtime references the agent by botId only to open the session, then supplies just the conversation, the input, and any local tools it chooses to contribute; everything else is fetched from the platform at execution time. The behavior is data the platform owns, not code the runtime ships - which is why changing the bot changes every deployment, with nothing to rebuild or redeploy.
The runtime should hold a reference, never a definition. The moment a runtime starts carrying backstory, model choice, or retrieval logic of its own, it stops being a thin shell and the central agent stops being the single source of truth. Keep the runtime ignorant of what the agent is. Its only job is to bring it the input and carry away the output.
The Activity Log Is Captured Centrally
Because the runtime drives the agent through the platform rather than calling a model provider directly, every action the agent takes is recorded where you can see it. A stateful conversation tied to a bot produces a complete, durable record: the messages exchanged, the tools and abilities invoked, the tokens consumed, and the events emitted along the way. None of that lives on the runtime host; it all lands centrally.
This is what turns a fleet of distributed runtimes into something you can actually operate:
- Monitoring - bot usage statistics and event metrics report what each agent is doing and consuming, per bot, across every runtime that targets it.
- Drift analysis - because the full conversation log is retained centrally, you can review how behavior changes over time, catch regressions after a model or prompt change, and spot agents that have started answering differently than they should.
- Auditing - every interaction and tool call is part of an audit trail suitable for security and compliance review, regardless of which host produced it.
- Measurement - the same captured activity feeds ROI measurement; see the measuring ROI guide for turning that log into value figures.
A runtime running on an edge device or a third-party server gives up none of this. The execution is distributed; the observability is centralized.
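As a rough illustration of why the host stops mattering once activity lands centrally, the sketch below rolls up a flat event log per bot. The `Event` and `Usage` types and the event kinds are hypothetical; they are not the platform's event schema.

```go
package main

import "fmt"

// Event is one recorded action. Host is kept only to show that it plays
// no part in the rollup. Hypothetical shape, not the platform's schema.
type Event struct {
	BotID  string
	Host   string
	Kind   string // "message" or "tool_call"
	Tokens int
}

// Usage is the per-bot summary.
type Usage struct {
	Messages  int
	ToolCalls int
	Tokens    int
}

// Summarize aggregates events per bot, regardless of which host produced them.
func Summarize(events []Event) map[string]Usage {
	out := make(map[string]Usage)
	for _, e := range events {
		u := out[e.BotID]
		switch e.Kind {
		case "message":
			u.Messages++
		case "tool_call":
			u.ToolCalls++
		}
		u.Tokens += e.Tokens
		out[e.BotID] = u
	}
	return out
}

func main() {
	events := []Event{
		{"bot_support", "edge-1", "message", 120},
		{"bot_support", "lambda-eu", "tool_call", 40},
		{"bot_support", "lambda-eu", "message", 200},
		{"bot_billing", "worker-3", "message", 90},
	}
	s := Summarize(events)
	fmt.Printf("%+v\n", s["bot_support"]) // {Messages:2 ToolCalls:1 Tokens:360}
}
```

The same per-bot view works whether the events came from one host or fifty, which is what makes monitoring and drift review a single exercise across the fleet.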
The Soft-Layer Benefit: Change Once, Propagate Everywhere
The single most valuable property of this pattern is that the runtime is a soft layer over the core. The runtime is durable and rarely changes; the behavior is fluid and changes centrally.
Improve the backstory, switch to a cheaper model after the agent is proven (as the cost control guide recommends), add a dataset, tighten a policy - and every runtime pointing at that bot adopts the change on its next interaction. No redeploy, no version skew across the fleet, no coordinating a rollout over hundreds of hosts. The behavior is managed the way you manage data, not the way you manage binaries.
This also cleans up the failure modes that plague fleets of self-contained agents. There is no drift between an instance running last month's prompt and one running this month's, because none of them carries a prompt. There is no scramble to patch every host when a guardrail needs tightening, because the guardrail lives in one place.
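The propagation property can be demonstrated with a small model in which runtimes hold only a botId and resolve the definition at call time. `Registry` here is a hypothetical in-memory stand-in for the platform's store of bot definitions, not an SDK type.

```go
package main

import (
	"fmt"
	"sync"
)

// Definition is the centrally managed behavior.
type Definition struct {
	Model     string
	Backstory string
}

// Registry stands in for the platform's store of bot definitions (hypothetical).
type Registry struct {
	mu   sync.RWMutex
	bots map[string]Definition
}

// Get returns the current definition for a bot.
func (r *Registry) Get(botID string) (Definition, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	d, ok := r.bots[botID]
	return d, ok
}

// Update replaces a bot's definition in one place.
func (r *Registry) Update(botID string, d Definition) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.bots[botID] = d
}

// Runtime holds a reference to the agent, never a copy of its definition.
type Runtime struct {
	Host  string
	BotID string
}

// ModelInUse resolves the definition at call time, so it always reflects
// the latest central change.
func (rt Runtime) ModelInUse(r *Registry) string {
	d, ok := r.Get(rt.BotID)
	if !ok {
		return ""
	}
	return d.Model
}

func main() {
	reg := &Registry{bots: map[string]Definition{
		"bot_support": {Model: "model-large"},
	}}
	fleet := []Runtime{{"edge-1", "bot_support"}, {"worker-2", "bot_support"}}

	// One central change; no runtime is rebuilt or redeployed.
	reg.Update("bot_support", Definition{Model: "model-small"})

	for _, rt := range fleet {
		fmt.Println(rt.Host, rt.ModelInUse(reg))
	}
	// edge-1 model-small
	// worker-2 model-small
}
```

Because no `Runtime` value ever stores a `Definition`, there is nothing on any host that can fall out of date.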
Scaling to Hundreds of Agents
Put the pieces together and large-scale deployment becomes almost mechanical. Each deployed agent is a thin runtime - a botId, a key, and the small amount of glue that connects it to its input - so adding the hundredth agent costs about as much code as adding the first.
A few platform features compound the effect:
- Sub-accounts for isolation. Run each tenant's or each workload's agents inside its own sub-account, with its own scoped limits. The cost control guide covers how a small token budget on a sub-account caps a risky agent's blast radius. The Go SDK's RunAsUserID option lets a single partner-level runtime act on behalf of a specific sub-account, so one piece of runtime code can serve many isolated tenants.
- Blueprints for replication. When many agents share a shape, define them from a common blueprint so the central definitions stay consistent and a change to the template flows through to the agents built from it.
- One observability surface. However many runtimes you stand up, their activity converges into the same usage statistics, metrics, and event logs, so monitoring a hundred agents is the same exercise as monitoring one.
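One way to picture "minimal per-agent code" is a fleet described as rows of data rather than as programs. The sketch below assumes a hypothetical `Deployment` struct; its `RunAsUserID` field only mirrors the name of the SDK option mentioned above and does not reproduce the SDK's actual signature.

```go
package main

import "fmt"

// Deployment is everything that differs between two agents in the fleet.
// Hypothetical struct for illustration.
type Deployment struct {
	Name        string
	BotID       string
	RunAsUserID string // sub-account to act on behalf of; empty means own account
}

// Validate catches the two mistakes a row of fleet data can contain.
func (d Deployment) Validate() error {
	if d.Name == "" {
		return fmt.Errorf("deployment has no name")
	}
	if d.BotID == "" {
		return fmt.Errorf("deployment %q has no botId", d.Name)
	}
	return nil
}

// ByTenant validates the fleet and groups deployment names by sub-account.
func ByTenant(ds []Deployment) (map[string][]string, error) {
	out := make(map[string][]string)
	for _, d := range ds {
		if err := d.Validate(); err != nil {
			return nil, err
		}
		out[d.RunAsUserID] = append(out[d.RunAsUserID], d.Name)
	}
	return out, nil
}

func main() {
	// Adding another agent is one more row, not one more program.
	fleet := []Deployment{
		{Name: "acme-support", BotID: "bot_support", RunAsUserID: "user_acme"},
		{Name: "acme-billing", BotID: "bot_billing", RunAsUserID: "user_acme"},
		{Name: "globex-support", BotID: "bot_support", RunAsUserID: "user_globex"},
	}
	groups, err := ByTenant(fleet)
	fmt.Println(len(groups["user_acme"]), len(groups["user_globex"]), err) // 2 1 <nil>
}
```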
The result is the goal the pattern is named for: agents deployed at scale, with minimal per-agent code, all driven by central agents you manage in one place.
When This Pattern Fits
The centralized-core, thin-runtime split is the right default when agents must run somewhere the platform cannot host them - on customer infrastructure, at the edge, inside an existing service, or behind a private network - while you still want central control and observability. It shines when you operate many agents that share most of their behavior, when behavior needs to evolve frequently without redeploying fleets, and when uniform monitoring and policy across every deployment matter.
It fits less well when an agent's logic is genuinely unique to its host and shares nothing with others, or when the runtime must operate fully disconnected from the platform for long stretches, since the core and the activity log both assume the runtime can reach ChatBotKit. For those cases an inline, self-contained agent is the simpler choice.
For the common case - many agents, evolving behavior, distributed execution, central oversight - keeping the core on the platform and the runtime thin is what lets the fleet grow without the operational cost growing with it.