afrexai-cto

Incident Response Plan

Author: afrexai-cto · GitHub ↗ · v1.0.1 · MIT-0
cross-platform ✓ Security check passed
Total downloads: 143
Favorites: 0
Current installs: 0
Versions: 1
Install in OpenClaw
/install afrexai-incident-response-plan
Description
Generate a tailored incident response plan for AI agent deployments and SaaS operations. Covers detection, triage, containment, recovery, and post-mortem. Us...
Usage (SKILL.md)

Incident Response Plan Generator

Generate a production-ready incident response plan tailored to your AI agent deployment.

When to Use

  • Deploying AI agents to production for the first time
  • Preparing for SOC2 or ISO 27001 audits
  • Client asks "what happens when something breaks?"
  • Building operational runbooks for managed AI services
  • After an incident — to prevent recurrence

Input

Service: [Name of AI agent/service]
Environment: [cloud provider, region, architecture]
Data Sensitivity: [low/medium/high/critical]
Team Size: [number of responders]
SLA: [uptime target, e.g., 99.9%]
Integrations: [list of connected systems]
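
The SLA input translates directly into a downtime budget, which helps when setting the response times later in the plan. The following is a minimal sketch of that arithmetic; the function name and the 30-day period are illustrative assumptions, not part of the skill.

```python
def downtime_budget_minutes(uptime_target_pct: float, period_days: int = 30) -> float:
    """Return the minutes of downtime allowed in a period for a given uptime target.

    Assumption: the SLA is measured over a fixed period (30 days by default).
    """
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - uptime_target_pct / 100)


if __name__ == "__main__":
    # A 99.9% target over 30 days leaves roughly 43.2 minutes of downtime.
    print(round(downtime_budget_minutes(99.9), 1))
```

A budget this small suggests that a single slow SEV1 response can consume most of a month's allowance.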

Plan Structure

1. Severity Classification

| Level | Description | Response Time | Examples |
|-------|-------------|---------------|----------|
| SEV1 (Critical) | Service down, data breach, financial impact | 15 min | Agent sending wrong data to clients, API keys exposed |
| SEV2 (High) | Degraded service, partial outage | 1 hour | Agent responses slow, one integration failing |
| SEV3 (Medium) | Non-critical issue, workaround exists | 4 hours | Minor accuracy drop, cosmetic errors |
| SEV4 (Low) | Enhancement, no immediate impact | Next business day | Feature request, optimization |
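
The response times in this table can be turned into concrete deadlines for an on-call tool. The sketch below is one possible encoding. The names `RESPONSE_TARGETS`, `next_business_day`, and `response_deadline` are hypothetical, and "next business day" is assumed here to mean the end of the next weekday, ignoring holidays and time zones.

```python
from datetime import datetime, timedelta

# Response-time targets taken from the severity table.
RESPONSE_TARGETS = {
    "SEV1": timedelta(minutes=15),
    "SEV2": timedelta(hours=1),
    "SEV3": timedelta(hours=4),
}


def next_business_day(ts: datetime) -> datetime:
    """Return the end of the next weekday (Mon-Fri) after ts; holidays are ignored."""
    day = ts + timedelta(days=1)
    while day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        day += timedelta(days=1)
    return day.replace(hour=23, minute=59, second=59, microsecond=0)


def response_deadline(severity: str, detected_at: datetime) -> datetime:
    """Return the time by which a responder should have picked up the incident."""
    if severity == "SEV4":
        return next_business_day(detected_at)
    return detected_at + RESPONSE_TARGETS[severity]


if __name__ == "__main__":
    friday_morning = datetime(2024, 1, 5, 10, 0)
    print(response_deadline("SEV1", friday_morning))
    print(response_deadline("SEV4", friday_morning))
```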

2. Detection & Alerting

  • Health check endpoints (every 60s)
  • Error rate thresholds (>1% = SEV3, >5% = SEV2, >25% = SEV1)
  • Response time monitoring (p99 > 2x baseline = alert)
  • Cost anomaly detection (>150% daily average)
  • Output quality sampling (random audit of agent responses)
  • Uptime monitoring (UptimeRobot, Pingdom, or custom)
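
The numeric thresholds above can be expressed as small, testable checks. This is a minimal sketch, assuming error rates are fractions between 0 and 1 and that ">150% daily average" means today's spend exceeds 1.5 times the average; the function names are illustrative and not part of the skill.

```python
from typing import Optional


def severity_from_error_rate(error_rate: float) -> Optional[str]:
    """Map an error rate (0.0-1.0) to a severity using the listed thresholds."""
    if error_rate > 0.25:
        return "SEV1"
    if error_rate > 0.05:
        return "SEV2"
    if error_rate > 0.01:
        return "SEV3"
    return None  # below every threshold: no alert


def latency_alert(p99_ms: float, baseline_p99_ms: float) -> bool:
    """Alert when p99 latency is more than twice the baseline."""
    return p99_ms > 2 * baseline_p99_ms


def cost_alert(today_cost: float, daily_average: float) -> bool:
    """Alert when today's spend exceeds 150% of the daily average (assumed reading)."""
    return today_cost > 1.5 * daily_average


if __name__ == "__main__":
    print(severity_from_error_rate(0.07))  # SEV2
    print(latency_alert(p99_ms=950, baseline_p99_ms=400))  # True
    print(cost_alert(today_cost=120.0, daily_average=100.0))  # False
```

Keeping the thresholds in plain functions makes them easy to unit test and to adjust after a post-mortem.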

3. Triage Checklist

□ Confirm the alert is real (not false positive)
□ Classify severity (SEV1-4)
□ Identify affected scope (which agents, which clients)
□ Check recent changes (deploys, config changes, upstream)
□ Assign incident commander
□ Open incident channel/thread
□ Notify affected stakeholders per SLA

4. Containment Actions by Type

Agent Misbehavior:

  • Pause agent processing (kill switch)
  • Revert to last known good config
  • Enable human-in-the-loop mode
  • Queue messages for manual review
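
One way to combine the kill switch with the manual-review queue is to route every message through a gate in front of the agent. The sketch below is an assumption about how such a gate could look; `AgentGate` and its methods are hypothetical names, and a real deployment would likely persist the queue rather than keep it in memory.

```python
from collections import deque
from typing import Callable, Deque, Optional


class AgentGate:
    """Wraps an agent handler with a kill switch and a manual-review queue."""

    def __init__(self, handler: Callable[[str], str]) -> None:
        self._handler = handler
        self.paused = False
        self.review_queue: Deque[str] = deque()

    def pause(self) -> None:
        """Kill switch: stop automated processing."""
        self.paused = True

    def resume(self) -> None:
        """Re-enable automated processing; queued messages stay for humans."""
        self.paused = False

    def handle(self, message: str) -> Optional[str]:
        """Process the message, or queue it for manual review while paused."""
        if self.paused:
            self.review_queue.append(message)
            return None
        return self._handler(message)


if __name__ == "__main__":
    gate = AgentGate(handler=str.upper)  # stand-in for a real agent call
    print(gate.handle("hello"))
    gate.pause()
    print(gate.handle("refund request"))
    print(list(gate.review_queue))
```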

Infrastructure Failure:

  • Failover to backup region/instance
  • Scale horizontally if capacity issue
  • Check upstream dependencies (API providers, databases)
  • Enable circuit breakers
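
For readers unfamiliar with circuit breakers, the following is a minimal sketch of the pattern: after a number of consecutive failures, calls to the upstream dependency are rejected for a cool-down period, then one trial call is allowed. The class name, defaults, and injectable clock are illustrative assumptions; production systems usually rely on an existing library or service mesh feature instead.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures: int = 3, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after_s:
                raise CircuitOpenError("upstream calls are suspended")
            # Cool-down elapsed: allow one trial call (half-open state).
            self._opened_at = None
            self._failures = self.max_failures - 1
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._failures >= self.max_failures:
                self._opened_at = self._clock()  # open the circuit
            raise
        self._failures = 0  # any success closes the circuit fully
        return result
```

Passing the clock as a parameter keeps the breaker testable without sleeping.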

Security Incident:

  • Rotate all credentials immediately
  • Isolate affected systems
  • Preserve logs and evidence
  • Engage security team / legal if data breach

Data Quality Issue:

  • Halt automated outputs
  • Identify contamination window
  • Notify affected clients with timeline
  • Prepare correction batch
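
Once the contamination window is known, the affected outputs can be grouped per client to drive both the notifications and the correction batch. This is a sketch under the assumption that each output is logged with a client ID and a timestamp; `OutputRecord` and `affected_by_client` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class OutputRecord:
    output_id: str
    client_id: str
    produced_at: datetime


def affected_by_client(records: Iterable[OutputRecord],
                       window_start: datetime,
                       window_end: datetime) -> Dict[str, List[str]]:
    """Group IDs of outputs produced in [window_start, window_end) by client."""
    affected: Dict[str, List[str]] = defaultdict(list)
    for record in records:
        if window_start <= record.produced_at < window_end:
            affected[record.client_id].append(record.output_id)
    return dict(affected)
```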

5. Communication Templates

Client notification (SEV1/2):

Subject: [Service Name] — Incident Update

We've identified an issue affecting [description].
- Impact: [what's affected]
- Status: [investigating/identified/monitoring/resolved]
- ETA: [estimated resolution time]
- Workaround: [if available]

We'll provide updates every [30 min / 1 hour].

Internal escalation:

🚨 SEV[X] — [Service]: [Brief description]
Impact: [scope]
Started: [time]
Commander: [name]
Channel: [link]
Action needed: [specific ask]

6. Recovery & Validation

□ Root cause identified and documented
□ Fix deployed and verified
□ All affected data corrected/reconciled
□ Client communication sent (resolution)
□ Monitoring confirms stable for 30+ min
□ Incident timeline documented
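
The "stable for 30+ min" check can be automated against health-check samples. The sketch below assumes samples are `(timestamp, healthy)` pairs and requires both that the history covers the full window and that every sample inside it is healthy. It does not detect gaps in sampling, and the function name is illustrative.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple


def stable_for(samples: Iterable[Tuple[datetime, bool]], now: datetime,
               window: timedelta = timedelta(minutes=30)) -> bool:
    """Return True if all samples in the trailing window are healthy."""
    samples = list(samples)
    if not samples:
        return False
    earliest = min(ts for ts, _ in samples)
    if now - earliest < window:
        return False  # not enough history to cover the whole window
    cutoff = now - window
    return all(healthy for ts, healthy in samples if ts >= cutoff)
```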

7. Post-Mortem Template

# Incident Post-Mortem: [Title]
**Date:** YYYY-MM-DD
**Severity:** SEV[X]
**Duration:** [start] — [end] ([total time])
**Commander:** [name]

## Summary
[2-3 sentence description]

## Timeline
- HH:MM — [event]
- HH:MM — [event]

## Root Cause
[Technical root cause]

## Impact
- Users affected: [number]
- Duration: [time]
- Data impact: [description]
- Financial impact: [if applicable]

## What Went Well
- [item]

## What Went Wrong
- [item]

## Action Items
| Action | Owner | Due Date | Status |
|--------|-------|----------|--------|
| [item] | [name] | [date] | Open |

## Lessons Learned
- [lesson]

Best Practices

  • Test your incident response plan quarterly (tabletop exercises)
  • Keep runbooks next to the code they support
  • Automate detection — humans are slow at noticing things
  • Over-communicate during incidents — silence breeds anxiety
  • Blameless post-mortems — focus on systems, not people
  • Track MTTR (mean time to recover) as your north star metric
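
MTTR is the average of incident durations, so it can be computed directly from post-mortem start and end times. A minimal sketch, assuming each incident is recorded as a `(start, end)` pair; the function name is illustrative.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple


def mean_time_to_recover(incidents: Iterable[Tuple[datetime, datetime]]) -> timedelta:
    """Average the (end - start) durations of the given incidents."""
    durations = [end - start for start, end in incidents]
    if not durations:
        raise ValueError("no incidents to average")
    return sum(durations, timedelta()) / len(durations)


if __name__ == "__main__":
    history = [
        (datetime(2024, 1, 5, 10, 0), datetime(2024, 1, 5, 10, 30)),
        (datetime(2024, 2, 9, 14, 0), datetime(2024, 2, 9, 15, 30)),
    ]
    print(mean_time_to_recover(history))  # 1:00:00
```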

Need incident response built into your AI operations from day one? AfrexAI deploys production-grade AI agents with monitoring, alerting, and response plans included. Book a call: calendly.com/cbeckford-afrexai/30min

Safe Usage Recommendations
This skill is a benign, self-contained template for drafting incident response plans. Before using it:
  1. Avoid pasting secrets, full credentials, or sensitive incident evidence into the prompt or inputs (the skill does not need them to produce a plan).
  2. Treat links in the README/skill (e.g., calendly) as marketing, and do not assume the skill itself sends data to those endpoints.
  3. If you intend to operationalize the generated plan, review and adapt its technical steps to your environment (e.g., kill-switch commands, credential rotation procedures) rather than copy-pasting blindly.
  4. If you plan to have an agent act on the plan, ensure that agent's runtime permissions and credential access are appropriately scoped. The skill itself does not request any credentials, but executing remediation actions may require them.
Capability Assessment
Purpose & Capability
The name and description promise an incident response plan generator, and the SKILL.md/README provide templates, severity classification, triage checklists, containment actions, communication templates, recovery and post-mortem templates, and best practices, all aligned with the stated purpose.
Instruction Scope
Runtime instructions are purely declarative templates and checklists for building an IR plan. They do not direct the agent to read local files, access environment variables, call external APIs, or transmit data to third-party endpoints beyond a single marketing calendly link; there is no scope creep or hidden collection instructions.
Install Mechanism
No install spec and no code files — this is instruction-only. Nothing is downloaded or written to disk by the skill itself.
Credentials
The skill declares no required environment variables, secrets, or config paths. It does not ask for credentials or unrelated service tokens; requested inputs are high-level (service, environment, sensitivity, team size, etc.) appropriate for generating a plan.
Persistence & Privilege
The `always` setting is false and the skill is user-invocable. It does not request persistent presence, nor does it attempt to modify other skills or system settings.
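How to Use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install afrexai-incident-response-plan
  3. Once installed, invoke the Skill by name or trigger it with /afrexai-incident-response-plan
  4. Provide the inputs described in the Skill's parameter notes to get structured output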
Version History
v1.0.1
- Initial release of incident-response-plan skill for AI agent deployments and SaaS operations.
- Generates a comprehensive, production-ready incident response plan covering detection, triage, containment, recovery, and post-mortem.
- Provides clear severity classifications, checklists, communication templates, and best practices.
- Useful for first-time production deployments, SOC2/ISO 27001 audit prep, client assurances, and operational resilience building.
Metadata
Slug afrexai-incident-response-plan
Version 1.0.1
License MIT-0
Total installs 0
Current installs 0
Version count 1
FAQ

What is Incident Response Plan?

Generate a tailored incident response plan for AI agent deployments and SaaS operations. Covers detection, triage, containment, recovery, and post-mortem. Us... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 143 total downloads to date.

How do I install Incident Response Plan?

Run the command "/install afrexai-incident-response-plan" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.

Is Incident Response Plan free?

Yes. Incident Response Plan is completely free and released under the MIT-0 license; you can download, install, and use it freely.

Which platforms does Incident Response Plan support?

Incident Response Plan is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Incident Response Plan?

It is developed and maintained by afrexai-cto (@afrexai-cto); the current version is v1.0.1.
