Proactive Slack Incident Responder

A dual-agent architecture where a Monitor Agent detects incidents and proactively initiates Slack DMs with on-call engineers, while a Response Agent handles the ongoing conversation, provides context, and coordinates resolution.

Tags: slack outreach, multi-agent, proactive communication

The Proactive Slack Incident Responder demonstrates how AI agents can autonomously detect problems and reach out to the right people at the right time. Instead of waiting for engineers to notice alerts or check dashboards, this system brings critical information directly to them via Slack DM—and then helps them through the resolution process.

The Monitor Agent is the watchful eye. It can be triggered by webhooks from monitoring systems, scheduled to check metrics, or invoked when anomalies are detected. When it identifies an incident that needs human attention, it uses the slack/conversation/start ability to initiate a DM with the appropriate on-call engineer. Crucially, it passes rich context about the incident—what's happening, what's affected, relevant metrics, and suggested first steps.
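To make that concrete, here is a minimal TypeScript sketch of the notification step. The Incident shape, the on-call lookup, and the startSlackConversation helper are hypothetical stand-ins for the slack/conversation/start ability and your own monitoring integrations, not ChatBotKit's actual API.

```typescript
// Hypothetical sketch of the Monitor Agent's notification step. The Incident
// shape, lookupOnCallDmChannel(), and startSlackConversation() are illustrative
// stand-ins for the slack/conversation/start ability and your own monitoring
// integrations, not ChatBotKit's actual API.

interface Incident {
  type: string                      // e.g. "performance_degradation"
  service: string                   // e.g. "payment-api"
  severity: 'low' | 'medium' | 'high'
  startedAt: string                 // ISO 8601 timestamp
  metrics: Record<string, string>   // e.g. { p99_latency: '2300ms' }
  suggestedSteps: string[]
}

// Stand-in for the slack/conversation/start ability: channel ID, message, context.
declare function startSlackConversation(args: {
  channel: string   // the engineer's DM channel ID
  message: string   // what the engineer sees in Slack
  context: string   // handed off to the Response Agent for the follow-up
}): Promise<void>

// Stand-in for an on-call scheduler lookup (PagerDuty, Opsgenie, etc.).
declare function lookupOnCallDmChannel(service: string): Promise<string>

async function notifyOnCall(incident: Incident): Promise<void> {
  const channel = await lookupOnCallDmChannel(incident.service)

  // Lead with impact and keep the message scannable.
  const message = [
    `:rotating_light: *Incident Detected: ${incident.service}*`,
    `*Type:* ${incident.type} | *Severity:* ${incident.severity}`,
    `*Started:* ${incident.startedAt}`,
    `*Suggested first steps:*`,
    ...incident.suggestedSteps.map((step, i) => `${i + 1}. ${step}`),
  ].join('\n')

  // Rich context rides along with the conversation so the Response Agent
  // can pick up exactly where the Monitor left off.
  const context = [
    `INCIDENT_TYPE: ${incident.type}`,
    `SERVICE: ${incident.service}`,
    `SEVERITY: ${incident.severity}`,
    `STARTED: ${incident.startedAt}`,
    ...Object.entries(incident.metrics).map(([key, value]) => `${key.toUpperCase()}: ${value}`),
  ].join('\n')

  await startSlackConversation({ channel, message, context })
}
```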

The Response Agent is the helpful partner. It's connected directly to the Slack integration and handles all replies from the engineer. When someone responds to an incident DM, this agent takes over. It has access to the context provided by the Monitor (stored as an activity in the conversation) and can help the engineer investigate, provide additional information, run diagnostics, or coordinate with other systems.
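On the receiving side, the sketch below shows one way the Response Agent could turn that stored context back into structured data when a reply arrives. The KEY: value layout mirrors the example context format in the backstory later on this page; the parser itself is illustrative.

```typescript
// Illustrative parser for the KEY: value context string the Monitor attaches
// to the conversation (see the example context format in the backstory below).

function parseIncidentContext(raw: string): Record<string, string> {
  const fields: Record<string, string> = {}
  for (const line of raw.split('\n')) {
    const sep = line.indexOf(':')
    if (sep === -1) continue // skip lines without a KEY: value pair
    fields[line.slice(0, sep).trim()] = line.slice(sep + 1).trim()
  }
  return fields
}

// Example: grounding a reply in the handed-off context.
const context = parseIncidentContext(
  'INCIDENT_TYPE: performance_degradation\nSERVICE: payment-api\nSEVERITY: high'
)
console.log(`Investigating ${context['SERVICE']} (severity: ${context['SEVERITY']})`)
```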

Traditional alerting is one-way: a system fires an alert, and humans must figure out what to do. This architecture creates two-way conversations that start at the moment of detection.

  1. It ensures the right person knows immediately. The Monitor can look up on-call schedules, understand incident severity, and route to the appropriate engineer—no alert fatigue from broadcast channels.
  2. It provides context upfront. Instead of an engineer getting a cryptic alert and then spending 10 minutes gathering context, the DM arrives with everything they need to start investigating immediately.
  3. It offers ongoing assistance. The Response Agent can answer questions, run additional checks, fetch logs, or even execute runbook steps—all within the same Slack thread where the incident was reported.

This blueprint showcases the slack/conversation/start ability, which enables agents to initiate conversations rather than just respond to them. The pattern works like this: The Monitor Agent detects an incident, determines the right person to notify, crafts a contextual message, and uses slack/conversation/start with the engineer's channel ID. The DM arrives in Slack, starting a new conversation. When the engineer replies, the Response Agent takes over and the conversation continues until resolution. The context parameter transfers knowledge from the Monitor to the Response Agent, ensuring continuity across the handoff.
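One practical detail in this flow is obtaining the engineer's DM channel ID in the first place. If you resolve it yourself through Slack's Web API (for example with the @slack/web-api package), the lookup might look like this sketch, which assumes a bot token with the users:read.email and im:write scopes.

```typescript
import { WebClient } from '@slack/web-api'

const slack = new WebClient(process.env.SLACK_BOT_TOKEN)

// Resolve an engineer's DM channel ID from their email address.
async function dmChannelForEmail(email: string): Promise<string> {
  // Find the Slack user behind the on-call email address.
  const lookup = await slack.users.lookupByEmail({ email })
  const userId = lookup.user?.id
  if (!userId) throw new Error(`No Slack user found for ${email}`)

  // Open (or reuse) the bot's DM channel with that user.
  const opened = await slack.conversations.open({ users: userId })
  const channelId = opened.channel?.id
  if (!channelId) throw new Error(`Could not open a DM with ${email}`)

  return channelId
}
```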

Use cases for this pattern include:

  • Incident Response: Detect outages, performance issues, or errors and immediately reach out to on-call engineers with full context.
  • Anomaly Investigation: When metrics deviate from normal, proactively engage the relevant team member to investigate before it becomes critical.
  • Deployment Monitoring: After deployments, watch for issues and notify the deploying engineer if problems arise—they have the most context.
  • Security Alerts: When suspicious activity is detected, immediately engage the security team with relevant details and investigation tools.
  • SLA Management: When approaching SLA thresholds, proactively alert the account owner and offer assistance in resolving the issue.

To extend this blueprint, add abilities to the Response Agent for running diagnostics, fetching logs, checking related services, or executing runbook steps. Integrate with your incident management system to automatically create tickets and track resolution. Connect to your on-call scheduler to always route to the current on-call engineer. You can also add escalation logic—if the first engineer doesn't respond within a timeout, the Monitor can initiate a new conversation with their backup.
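A sketch of that escalation logic follows, reusing the hypothetical startSlackConversation stand-in from the Monitor sketch earlier; hasReplied is likewise a placeholder for however you track acknowledgements.

```typescript
// Hypothetical escalation sketch. hasReplied() and startSlackConversation()
// are placeholders (see the Monitor sketch above), not a platform API.

declare function hasReplied(channel: string): Promise<boolean>
declare function startSlackConversation(args: {
  channel: string
  message: string
  context: string
}): Promise<void>

const ESCALATION_TIMEOUT_MS = 10 * 60 * 1000 // wait 10 minutes for a reply

async function escalateIfUnacknowledged(
  primaryChannel: string,
  backupChannel: string,
  message: string,
  context: string
): Promise<void> {
  await new Promise((resolve) => setTimeout(resolve, ESCALATION_TIMEOUT_MS))

  if (await hasReplied(primaryChannel)) return // primary engineer is on it

  // Record the escalation in the context so the Response Agent knows the history.
  await startSlackConversation({
    channel: backupChannel,
    message: `:rotating_light: Escalation: no response from primary on-call.\n${message}`,
    context: `${context}\nESCALATED_FROM: ${primaryChannel}`,
  })
}
```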

Backstory

Common information about the bot's experience, skills and personality. For more information, see the Backstory documentation.

You are the Incident Monitor, responsible for detecting problems and ensuring the right people know about them immediately. When you identify an incident, you proactively reach out to engineers via Slack DM.

YOUR ROLE:

1. ANALYZE INCIDENTS
   - When given alert data, metrics, or error reports, assess the severity and impact
   - Determine if human intervention is needed
   - Identify the most relevant person to notify

2. CRAFT INCIDENT NOTIFICATIONS
   - Write clear, actionable messages that respect the engineer's time
   - Lead with impact: what's broken and who's affected
   - Include key metrics and error details
   - Suggest first investigation steps
   - Keep it scannable—engineers are often on mobile

3. PROVIDE RICH CONTEXT
   - When initiating a Slack conversation, always include comprehensive context
   - The context helps the Response Agent assist the engineer
   - Include: incident type, affected services, timeline, relevant metrics, suggested runbook steps

4. INITIATE STRATEGICALLY
   - Use the slack/conversation/start ability to send DMs
   - Always provide: channel ID, clear message, detailed context
   - The channel should be the engineer's DM channel ID

EXAMPLE MESSAGE FORMAT:

🚨 *Incident Detected: API Latency Spike*

*Impact:* Payment API p99 latency at 2.3s (normal: 200ms)
*Affected:* ~15% of checkout requests timing out
*Started:* 3 minutes ago
*Trend:* Still increasing

*Suggested first steps:*
1. Check recent deployments
2. Review database connection pool
3. Check downstream payment provider status

I'm here to help investigate. What would you like to check first?

EXAMPLE CONTEXT FORMAT:

"INCIDENT_TYPE: performance_degradation
SERVICE: payment-api
SEVERITY: high
STARTED: 2026-01-27T14:32:00Z
METRICS: p99_latency=2300ms, error_rate=15%, affected_users=~500
RECENT_CHANGES: deployment v2.3.4 at 14:28
SUGGESTED_RUNBOOK: Check DB connections, verify payment provider
ON_CALL_ENGINEER: Sarah Chen
ESCALATION_CONTACT: Platform Team Lead"

Remember: You detect and notify. The Response Agent handles the ongoing conversation. Your job is to get the right information to the right person as quickly as possible.

The current date is ${EARTH_DATE}.

Skillset

This example uses a dedicated Skillset. Skillsets are collections of abilities that equip a bot with a specific set of functions it can perform.

  • Start Slack DM

    Initiates a new Slack DM conversation with an engineer
  • Search for Solutions

    Search the web for known issues and solutions
  • Fetch Documentation

    Fetch runbook or documentation content
  • Run Command

    Execute shell commands for diagnostics and investigation
  • Read/Write Files

    Read or write files in the diagnostics workspace

Terraform Code

This blueprint can be deployed using Terraform, enabling infrastructure-as-code management of your ChatBotKit resources. Copy the blueprint's Terraform configuration to recreate this example in your own environment.

Next steps:

  1. Save the Terraform configuration to a file named main.tf
  2. Set your API key: export CHATBOTKIT_API_KEY=your-api-key
  3. Run terraform init to initialize
  4. Run terraform plan to preview changes
  5. Run terraform apply to deploy

Learn more about the Terraform provider

A dedicated team of experts is available to help you create your perfect chatbot. Reach out via email or chat for more information.