back to guides

AI Agent Security: Authentication, Tool Access, and Defense in Depth

A comprehensive guide to securing AI agents, covering credential management, tool exposure, model selection, prompt hardening, and the trade-offs between different authentication and tooling architectures.

Overview

AI agents are powerful because they act. They don't just generate text, they call APIs, read databases, send messages, execute code, and interact with external systems on behalf of users. Every one of those capabilities is also a potential attack surface.

Securing an AI agent is fundamentally different from securing a traditional application. Traditional apps have deterministic control flow where code paths are fixed, inputs are validated against known schemas, and access boundaries are enforced by the developer at every step. AI agents, by contrast, make decisions at runtime. The model decides which tools to call, with what parameters, in what order. This means security must be built into the architecture, not bolted on as input validation.

This guide covers the key dimensions of AI agent security, such as how credentials are managed, how tools are exposed, how models and system prompts affect the security posture, and the trade-offs between different architectural approaches to each. The goal is to help you make informed decisions about how to build agents that are both capable and safe.

Key Concepts

The Agent Threat Model

An AI agent's attack surface is broader than a typical application because the model itself is an untrusted component in the system. It processes user input, which may contain adversarial content, and produces tool calls that have real-world effects. The core threats are:

  • Prompt injection: Malicious input that causes the agent to take unintended actions, calling tools it shouldn't, leaking data, or ignoring its instructions.
  • Credential exposure: Secrets leaking into model context, conversation logs, or error messages.
  • Over-permissioned tools: Agents with access to more capabilities than their task requires.
  • Uncontrolled execution: Code execution or API calls that escape intended boundaries.
  • Data exfiltration: The model being tricked into sending sensitive data to attacker-controlled endpoints through tool calls.

Every architectural decision, from model selection to tool exposure to credential management, should be evaluated against these threats.

Defense in Depth

No single security measure is sufficient. Effective agent security layers multiple controls:

  1. System prompt hardening - Constrain the agent's behavior through clear, specific instructions.
  2. Model selection - Choose models with strong instruction-following and safety characteristics.
  3. Credential isolation - Keep secrets out of the model's context entirely.
  4. Tool scoping - Expose only the specific capabilities each agent needs.
  5. Execution sandboxing - Run code and API calls in isolated, ephemeral environments.
  6. Access control - Enforce permissions at the platform level, not in the prompt.

The remainder of this guide examines each layer in detail.

System Prompt and Backstory

The system prompt, referred to as the backstory in ChatBotKit, is the first line of defense. It defines the agent's identity, role, constraints, and behavioral boundaries. A well-crafted backstory doesn't just shape personality. It establishes what the agent should and should not do.

Security-Relevant Backstory Practices

Be explicit about boundaries. Vague instructions like "be helpful" give the model latitude to interpret requests broadly. Specific constraints are harder to override:

Separate identity from capabilities. The backstory defines who the agent is. Tool permissions define what it can do. Don't rely on the backstory to enforce tool restrictions, as models can be persuaded to ignore prompt-level instructions. Use platform-level controls for hard boundaries.

Avoid including sensitive data in the backstory. The system prompt is part of the model's context window. Anything in it is accessible to the model, and potentially extractable through clever prompting. Never put API keys, internal URLs, database credentials, or customer data in the backstory.

Design for adversarial input. Assume users will attempt prompt injection. Structure the backstory to resist instruction override by anchoring the agent's identity firmly and including explicit refusal patterns for out-of-scope requests.

Backstory vs. Tool-Level Instructions

In ChatBotKit, each ability carries its own instruction set that guides the model on how and when to use it. This creates a layered instruction architecture:

  • Backstory → Defines overall identity, role, and behavioral boundaries
  • Skillset descriptions → Help the model understand tool categories and selection
  • Ability description → Define specific tool usage patterns and constraints

This separation is valuable for security because it means tool-specific constraints live close to the tool definition, not in a monolithic system prompt that's harder to maintain and audit.

Model Selection

The choice of language model directly affects your agent's security posture. Models differ in their susceptibility to prompt injection, their instruction-following reliability, and their tendency to hallucinate tool calls.

Security Considerations for Model Choice

Instruction-following fidelity. Models that reliably follow system prompt constraints are inherently more secure. A model that occasionally ignores a "do not reveal internal information" instruction is a liability. Larger, more capable models generally follow instructions more faithfully, but this isn't universal.

Tool call accuracy. When an agent uses tools, the model generates structured function calls. Models with weaker function-calling capabilities may produce malformed calls, call tools with unintended parameters, or call tools at inappropriate times. Poor tool-call accuracy is a security concern, not just a reliability one.

Reasoning capabilities. Models with reasoning features tend to consider instructions more carefully before acting. This can reduce impulsive tool calls in response to adversarial prompts, but it also increases latency and cost.

Context window size. Larger context windows allow more comprehensive system prompts and more conversation history, which helps the model maintain consistent behavior across long interactions. However, more context also means more surface area for injection attacks embedded in earlier messages.

Practical Guidance

  • Use models with strong function-calling capabilities for tool-heavy agents.
  • Prefer models that demonstrate robust instruction following in adversarial evaluations.
  • Consider the trade-off between capability and attack surface, as a more capable model given fewer tools may be more secure than a less capable model given many tools.
  • Set temperature: 0 (the ChatBotKit default) for agents handling sensitive operations. Deterministic output reduces unexpected behavior.

Authentication and Credential Management

How credentials are stored, accessed, and injected into tool calls is arguably the most critical security decision in agent architecture. There are three broad approaches in the current ecosystem, each with distinct trade-offs.

Approach 1: Credentials Stored Alongside Skill Definitions

This is the pattern used by frameworks like Claude Code's SKILLS.md approach, where credentials (API keys, tokens) are stored in configuration files alongside the skill or tool definitions. The agent reads these credentials directly when executing tool calls.

How it works

  • API keys and tokens live in configuration files, environment variables, or .env files adjacent to tool definitions.
  • The agent (or its runtime) reads credentials directly when making API calls.
  • A single set of credentials is used for all invocations.

Pros

  • Simple to set up, as credentials are co-located with the code that uses them.
  • Low operational overhead, with no separate credential management infrastructure.
  • Fast iteration during development.

Cons

  • Credential exposure risk is high. Credentials stored in files are accessible to any process in the same environment. If the agent can read its own configuration, prompt injection could potentially extract credentials through tool calls or conversation output.
  • No per-user isolation. All users share the same credentials. A compromised credential affects every user.
  • No audit trail. There's typically no logging of which user triggered which credential usage.
  • Credential rotation is manual. Updating a key means updating configuration files and redeploying.
  • Secrets may leak into version control. Despite best practices, credentials in config files frequently end up in git repositories.

Verdict: Acceptable for local development and single-user agents. Risky for production multi-user systems where credential isolation and auditability matter.

Approach 2: MCP with Per-User Authentication

The Model Context Protocol (MCP) approach moves authentication to the MCP server layer. Each user authenticates individually against external services, and the MCP server manages credentials per session.

How it works

  • The AI agent connects to an MCP server endpoint.
  • The MCP server discovers and exposes available tools.
  • Each user authenticates with external services through the MCP server (often via OAuth).
  • Tool calls are executed within the authenticated user's context.

Pros

  • Per-user credential isolation. Each user authenticates independently. One compromised session doesn't affect others.
  • Credentials never touch the model. The MCP server handles authentication. The model only sees tool schemas and results.
  • Standard authentication flows. OAuth 2.0 and other standard auth protocols are supported.
  • Broad tool discovery. MCP servers can expose large tool sets dynamically.

Cons

  • Wide tool surface area. MCP servers often expose many tools, sometimes dozens or hundreds per connected service. The model can potentially call any of them, which increases the blast radius of a prompt injection attack.
  • Tool discovery is dynamic. The available tool set can change between sessions, making it harder to audit what an agent can do at any given time.
  • Authentication infrastructure overhead. Running and securing MCP servers adds operational complexity.
  • Trust boundary complexity. You're trusting the MCP server implementation to correctly enforce permissions and isolate user contexts.

Verdict: Strong per-user security model, but the broad tool exposure can be a concern. Best when combined with tool filtering or approval mechanisms.

Approach 3: ChatBotKit Secrets (Shared and Personal)

ChatBotKit takes a hybrid approach with two credential types that address different use cases:

Shared Secrets - Service-level credentials authenticated once by the developer. A single API key or token is used for all agent interactions.

  • Best for: Service accounts, internal APIs, platform integrations where per-user auth isn't needed.
  • The credential is encrypted at rest, injected at runtime via token replacement, and never exposed to the model.

Personal Secrets - Per-user credentials where each user authenticates individually (typically via OAuth 2.0).

  • Best for: User-facing integrations where the agent acts on behalf of the specific user, reading their email, accessing their calendar, posting to their Slack.
  • Full OAuth flow with authorization, token refresh, and revocation.
  • Token isolation ensures one user's compromised credential doesn't affect others.

How both types work at runtime

The model never sees the actual credential. It sees a placeholder like ${MY_SECRET} in the instruction template. At execution time, the platform resolves the placeholder against the encrypted credential store and injects the real value into the API call. This means:

  • Prompt injection cannot extract credentials, as the model literally doesn't have them.
  • Credentials are encrypted at rest using industry-standard encryption.
  • Token isolation prevents cross-user credential leakage.
  • Automatic token refresh handles OAuth lifecycle without developer intervention.
  • Audit logging tracks authentication events and credential usage.

Pros

  • Strongest credential isolation. Secrets are never in the model's context window.
  • Flexibility. Shared secrets for service integrations, personal secrets for per-user access.
  • Managed lifecycle. Token refresh, revocation, and rotation are handled by the platform.
  • Auditability. Credential usage is logged and traceable.

Cons

  • Platform dependency. Credential management is tied to the ChatBotKit platform.
  • Setup overhead. OAuth configurations require setting up client IDs, secrets, scopes, and callback URLs.

Verdict: The most secure approach for production multi-user agents. The platform handles the hard parts of credential management so the agent architecture stays clean.

Comparison Summary

DimensionSkills/Config FilesMCP Per-User AuthChatBotKit Secrets
Credential isolationNone (shared)Per-userShared or per-user
Model exposurePossibleNoNo
Setup complexityLowMediumMedium
Audit trailNoneVariesBuilt-in
Token lifecycleManualVariesAutomatic
Multi-user safetyPoorGoodGood
Production readinessDevelopment onlyGoodStrong

Tool Access and Execution

The tools available to an agent define the boundary of what damage a compromised or misdirected agent can cause. There are three architectural approaches, each offering a different trade-off between capability and control.

Spectrum of Tool Exposure

Think of tool access as a spectrum from maximum flexibility (and risk) to maximum control (and constraint):

Native Tools: Maximum Flexibility, Hardest to Secure

When an agent has access to its own execution environment, including shell commands, file system access, build scripts, and package managers, it has essentially the same privileges as the developer or process running it.

This is the model used by coding agents like Claude Code, Cursor, and similar tools. The agent can run arbitrary commands, read and write files, install packages, and interact with the operating system.

Why it's hard to secure

  • Unbounded action space. The agent can do anything the underlying process can do. There's no pre-defined list of allowed operations.
  • No parameter validation. Shell commands are free-form strings. There's no schema to validate against.
  • Lateral movement. File system access can lead to credential theft, environment variable exfiltration, and access to other services.
  • Persistence. The agent can modify its own instructions, install backdoors, or alter build scripts.
  • Audit complexity. Logging every shell command with full context is expensive and noisy.

When it's acceptable

  • Local development environments where the agent operates under direct developer supervision.
  • CI/CD pipelines with strict sandboxing and ephemeral environments.
  • Single-user, single-purpose agents where the user and the agent operator are the same person.

When it's not

  • Multi-user production systems.
  • Agents handling sensitive data or acting on behalf of others.
  • Any context where prompt injection could lead to unauthorized system access.

MCP Tools: Broad Arsenal, Some Control

MCP provides a standardized way for agents to discover and call tools exposed by external servers. This is inherently more controlled than native tool access because tools are defined with schemas and executed by the MCP server rather than locally.

Advantages over native tools

  • Tools have defined schemas, including names, parameters, types, and descriptions.
  • Execution happens on the MCP server, not in the agent's local environment.
  • Authentication is handled per-connection, not per-command.
  • Tool discovery is explicit, allowing the agent to see what's available.

Remaining concerns

  • Tool set size. A single MCP server integration (e.g., GitHub, Google Workspace) may expose dozens of tools. The agent can call any of them, including destructive operations like deleting repositories or sending emails.
  • Dynamic tool sets. The available tools can change between MCP server versions or configurations. What was a read-only tool set yesterday may include write operations today.
  • Prompt injection amplification. The more tools available, the more damage a successful prompt injection can cause. An agent with access to email, calendar, file storage, and messaging can exfiltrate data through any of those channels.
  • No per-tool approval. Most MCP implementations don't support approving individual tool calls at runtime (though some clients are adding this).

Mitigation strategies

  • Connect only the MCP servers your agent actually needs.
  • Use MCP server configurations that limit exposed tools where supported.
  • Implement monitoring and alerting on tool call patterns.
  • Consider human-in-the-loop for high-impact tool categories.

ChatBotKit Abilities: Scoped, Specific, Validated

ChatBotKit abilities represent the most controlled end of the spectrum. Each ability exposes a specific action for a specific service with defined parameters, not an entire API surface.

How abilities provide tighter control

Instead of exposing a generic "GitHub API" tool that can call any endpoint, ChatBotKit abilities expose individual operations:

  • github/issues/list - List issues in a repository
  • github/issues/create - Create a new issue
  • github/pull-requests/list - List pull requests

Each ability defines exactly:

  • Which API endpoint it calls
  • What parameters the model can provide (with types and descriptions)
  • What authentication is required (bound to a named secret)
  • How the response is transformed before returning to the model

Why this matters for security

  • Minimum viable capability. You give the agent exactly the tools it needs, such as read issues but not delete repositories.
  • Parameter validation. Each parameter has a defined type and description. The platform validates inputs before making the API call.
  • No arbitrary endpoint access. The agent can't construct novel API calls, as it can only invoke predefined ability templates.
  • Secret binding. Each ability is bound to a specific secret. The agent can't use Slack credentials to call GitHub APIs.
  • Auditable by design. Every ability invocation is logged with the ability name, parameters, and result.

Trade-off

The obvious trade-off is flexibility. If you need an API operation that doesn't have a pre-built ability, you need to create one (or use a more flexible approach for that specific integration). This is a feature, not a bug, as it forces explicit decisions about what each agent can do.

Tool Access Comparison

DimensionNative ToolsMCP ToolsChatBotKit Abilities
Action scopeUnboundedBroad per serverPer-action
Parameter validationNoneSchema-basedSchema + type validation
Execution environmentLocal processRemote serverPlatform-managed
Credential accessEnvironment accessPer-connectionPer-ability secret binding
Audit granularityLowMediumHigh
Setup effortNoneMediumLow (100+ pre-built)
ExtensibilityUnlimitedServer-dependentTemplate-based

Secure Code Execution

When an agent genuinely needs to execute code, such as data processing, calculations, or file transformations, sandboxed execution is essential. ChatBotKit provides isolated, ephemeral containers for code execution:

  • Multi-language support: Python, JavaScript, shell scripts, etc.
  • Ephemeral containers: No persistent state between executions.
  • No infrastructure access: Executed code cannot reach internal services or databases.
  • Workspace integration: Can access Space files for processing, but nothing beyond the defined workspace.

This is categorically different from giving an agent shell access in the host environment. The sandbox constrains the blast radius of any code the model generates-whether it's intentional functionality or the result of a prompt injection.

Access Control and Authorization

Platform-level access control is the hardest security layer for an attacker to circumvent because it's enforced outside the model's context entirely.

Principle of Least Privilege

Every agent should have access to only the tools, data, and credentials necessary for its specific purpose. In practice this means:

  • Scope abilities narrowly. Don't give a customer support agent abilities to modify billing records just because those APIs exist.
  • Use separate secrets per integration. A single shared API key for "everything" is a single point of compromise.
  • Segment agents by function. In multi-agent systems, each agent should have its own ability set. A research agent doesn't need the action agent's tools.
  • Prefer personal secrets for user-facing actions. When an agent acts on behalf of a user, use per-user OAuth so the action is bound to that user's permissions, not a service account with broad access.

Scoping via Skillsets

ChatBotKit skillsets group related abilities together and attach them to specific agents. This provides a natural organizational boundary:

  • A "Customer Support" skillset might include knowledge base search, ticket creation, and order lookup, but not invoice generation or account deletion.
  • A "Content Assistant" skillset might include document search and draft generation, but not email sending or calendar management.
  • Skillset names and descriptions directly influence model behavior, making the agent more likely to use tools appropriately when they're clearly categorized.

Multi-Agent Authorization

In blueprint architectures with multiple agents, each agent in the workflow should have its own authorization boundary:

  • Intake agents need conversational abilities but minimal tool access.
  • Research agents need read-only access to knowledge bases and search.
  • Execution agents need write access to specific systems, and should be the only agents with it.
  • Review agents need read access to validate what execution agents did.

This mirrors the principle of separation of duties in traditional security, where no single agent has unchecked authority across the entire system.

Best Practices

Credential Management

  • Never store credentials in system prompts, backstories, or conversation context.
  • Use platform-managed secret injection where credentials are resolved at runtime and never enter the model's context window.
  • Prefer per-user authentication (personal secrets / OAuth) for user-facing actions.
  • Rotate shared secrets on a regular schedule and after any suspected compromise.

Tool Exposure

  • Start with the narrowest tool set that enables your use case. Expand deliberately.
  • Prefer scoped abilities over broad API access for production agents.
  • Audit your agent's tool set regularly, and remove abilities that are no longer needed.
  • For high-impact operations (deleting data, sending communications, financial transactions), consider requiring human approval.

Model and Prompt Hygiene

  • Set temperature to 0 for agents performing sensitive operations.
  • Write system prompts that explicitly define boundaries, not just goals.
  • Test your agent against adversarial prompts before deploying to production.
  • Keep sensitive data out of the model's context, and use platform-level data access instead of injecting data into prompts.

Monitoring and Response

  • Log all tool invocations with full context (who, what, when, parameters, result).
  • Set up alerting for unusual tool call patterns, including sudden spikes, unexpected tool combinations, and after-hours activity.
  • Define an incident response plan for when an agent takes unintended actions.
  • Review conversation logs regularly to identify prompt injection attempts and near-misses.