ivangdavila

Alerts

Author: Iván · GitHub · v1.0.0
cross-platform ✓ Security check passed
Total downloads: 1163
Favorites: 2
Current installs: 4
Versions: 1
Install in OpenClaw
/install alerts
Description
Smart alerting patterns for AI agents - deduplication, routing, escalation, and fatigue prevention
Usage (SKILL.md)

Alert Fatigue Prevention

Group alerts by root cause, never by individual symptoms. Use labels: alertname, service, cluster - not instance IDs.

# Good: One alert for database down affecting 50 pods
group_by: ['alertname', 'service']
# Bad: 50 individual alerts for each failed pod
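
The grouping rule above can be sketched in Python. This is a minimal illustration, not the grouping logic of any particular alerting tool; the alert dictionary shape and the `group_alerts` helper are assumptions made for the example.

```python
from collections import defaultdict


def group_alerts(alerts, group_by=("alertname", "service")):
    """Bucket alerts by the values of the chosen labels (hypothetical helper)."""
    groups = defaultdict(list)
    for alert in alerts:
        # Instance-level labels are ignored on purpose: they describe symptoms.
        key = tuple(alert["labels"].get(label, "") for label in group_by)
        groups[key].append(alert)
    return dict(groups)


# 50 pods failing because of one database outage collapse into a single group.
alerts = [
    {"labels": {"alertname": "DatabaseDown", "service": "orders", "instance": f"pod-{i}"}}
    for i in range(50)
]
grouped = group_alerts(alerts)
```

Adding `instance` to `group_by` would produce 50 groups again, which is the failure mode the rule warns against.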

Implement severity hierarchy: P0 (pages immediately) > P1 (within 15min) > P2 (business hours) > P3 (weekly review). P0: Service completely down, data loss, security breach. P1: Degraded performance, partial outage, high error rates.

Set cooldown periods to prevent alert spam. Minimum 5 minutes between identical alerts, 30 minutes for cost alerts.

repeat_interval: 5m  # For critical alerts
repeat_interval: 30m # For cost/performance alerts
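
For systems that send notifications from their own code, the same cooldown can be approximated with a small gate. The sketch below assumes each alert carries a stable fingerprint string and a category; `CooldownGate` and its parameters are illustrative names, not part of any library.

```python
import time


class CooldownGate:
    """Suppress repeats of the same alert inside a cooldown window (sketch)."""

    def __init__(self, default_cooldown_s=300.0, overrides=None):
        self.default_cooldown_s = default_cooldown_s  # 5 minutes
        self.overrides = overrides or {}              # per-category cooldowns
        self._last_sent = {}

    def should_send(self, fingerprint, category="default", now=None):
        now = time.monotonic() if now is None else now
        cooldown = self.overrides.get(category, self.default_cooldown_s)
        last = self._last_sent.get(fingerprint)
        if last is not None and now - last < cooldown:
            return False  # still cooling down, drop the repeat
        self._last_sent[fingerprint] = now
        return True


# Cost alerts get the longer 30-minute cooldown from the text above.
gate = CooldownGate(overrides={"cost": 1800.0})
```

Passing `now` explicitly keeps the gate easy to test; in production the monotonic clock default is used.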

Use inhibition rules to suppress symptoms when root cause fires. If "Database Unreachable" fires, silence all "API High Latency" alerts from same cluster.
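
A rough sketch of how such an inhibition rule might be evaluated follows. The rule format (`source`, `target`, `equal`) and the alert names are assumptions chosen for the example, loosely modeled on the idea described above rather than on a specific tool's schema.

```python
def apply_inhibition(alerts, rules):
    """Drop target alerts while a matching source alert is firing (sketch)."""

    def is_inhibited(alert):
        for rule in rules:
            if alert["labels"].get("alertname") != rule["target"]:
                continue
            for source in alerts:
                labels = source["labels"]
                same_scope = all(
                    labels.get(name) == alert["labels"].get(name)
                    for name in rule["equal"]
                )
                if labels.get("alertname") == rule["source"] and same_scope:
                    return True
        return False

    return [alert for alert in alerts if not is_inhibited(alert)]


# Hypothetical rule: a database outage silences latency alerts in the same cluster.
RULES = [{"source": "DatabaseUnreachable", "target": "APIHighLatency", "equal": ["cluster"]}]

firing = [
    {"labels": {"alertname": "DatabaseUnreachable", "cluster": "eu"}},
    {"labels": {"alertname": "APIHighLatency", "cluster": "eu"}},
    {"labels": {"alertname": "APIHighLatency", "cluster": "us"}},
]
visible = apply_inhibition(firing, RULES)
```

The latency alert from the `us` cluster stays visible because its root cause is not known to be the `eu` database.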

AI Agent Monitoring Patterns

Monitor token/API usage with exponential alerting thresholds. Alert at 2x, 5x, 10x normal usage - costs can spiral quickly. Track: tokens per minute, cost per request, API rate limits approached.
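
One way to express the 2x, 5x, 10x tiers is a function that returns the highest multiplier crossed. How "normal usage" is measured is left open in the text, so `baseline` here is simply an assumed input (for example, a trailing average of tokens per minute).

```python
THRESHOLDS = (10.0, 5.0, 2.0)  # checked from highest to lowest


def usage_alert_level(current, baseline):
    """Return the highest multiplier of baseline that current usage has reached.

    Returns None when usage is below 2x baseline. Illustrative helper only.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    ratio = current / baseline
    for multiplier in THRESHOLDS:
        if ratio >= multiplier:
            return multiplier
    return None
```

For example, `usage_alert_level(5500, 1000)` reports the 5x tier, while `usage_alert_level(1500, 1000)` stays silent.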

Set behavioral drift alerts on response quality degradation. Compare current outputs to baseline with sample prompts every hour. Alert when success rate drops below 85% or response time exceeds 2x baseline.
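
The hourly check could look roughly like this. It assumes each sample prompt yields a pass/fail verdict and a latency in seconds, and it compares the mean latency against the baseline; both the input shape and the use of the mean are assumptions, since the text does not specify them.

```python
def drift_reasons(results, baseline_latency_s, min_success_rate=0.85, max_latency_factor=2.0):
    """Return the reasons a sample run should alert; an empty list means healthy.

    results: list of (passed, latency_s) tuples from the sample prompts.
    """
    if not results:
        return ["no samples collected"]
    success_rate = sum(1 for passed, _ in results if passed) / len(results)
    mean_latency = sum(latency for _, latency in results) / len(results)
    reasons = []
    if success_rate < min_success_rate:
        reasons.append(f"success rate {success_rate:.0%} below {min_success_rate:.0%}")
    if mean_latency > max_latency_factor * baseline_latency_s:
        reasons.append(f"mean latency {mean_latency:.2f}s exceeds {max_latency_factor}x baseline")
    return reasons
```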

Monitor for infinite loops in multi-agent workflows. Alert if same prompt sent >3 times in 5 minutes or agent hasn't responded in 10 minutes. Include correlation IDs to trace conversation chains.
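
The repeated-prompt rule can be sketched with a sliding window keyed by correlation ID and prompt hash. `LoopDetector` is a hypothetical class for illustration; the separate "no response in 10 minutes" check is not covered here.

```python
import hashlib
from collections import defaultdict, deque


class LoopDetector:
    """Flag a prompt sent more than max_repeats times within window_s (sketch)."""

    def __init__(self, max_repeats=3, window_s=300.0):
        self.max_repeats = max_repeats
        self.window_s = window_s
        self._seen = defaultdict(deque)

    def record(self, correlation_id, prompt, now):
        # Hash the prompt so long texts are not kept in memory as keys.
        digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        times = self._seen[(correlation_id, digest)]
        times.append(now)
        # Forget sends that have fallen out of the window.
        while times and now - times[0] > self.window_s:
            times.popleft()
        return len(times) > self.max_repeats
```

With the defaults, the fourth identical send inside five minutes returns `True`, matching the ">3 times in 5 minutes" rule.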

Track silent failures through downstream metrics. Monitor: tasks completed vs started, user satisfaction scores, retry attempts. These catch errors that don't throw exceptions.

Routing and Escalation Rules

Route by expertise domain, not arbitrary on-call schedules. Database alerts → DB team, API alerts → backend team, cost alerts → platform team. Only escalate to managers for P0 incidents lasting >30 minutes.

Use progressive escalation with increasing urgency. P1 alerts: Slack notification → 5min wait → SMS → 10min wait → phone call. Include runbook links in every alert for faster resolution.
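
Reading the P1 chain as offsets from the moment the alert fires gives Slack at 0 minutes, SMS at 5 minutes, and a phone call at 15 minutes. The sketch below encodes that reading; the channel names and the `due_channels` helper are illustrative assumptions.

```python
# (channel, seconds after the alert fired); 5 min wait, then a further 10 min wait.
P1_STEPS = (("slack", 0), ("sms", 5 * 60), ("phone", 15 * 60))


def due_channels(elapsed_s, acknowledged, steps=P1_STEPS):
    """Return every channel that should have been notified by now (sketch)."""
    if acknowledged:
        return []  # someone has picked it up, stop escalating
    return [channel for channel, offset in steps if elapsed_s >= offset]
```

A scheduler would call this periodically and notify any channel in the result that has not been contacted yet.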

Set context-aware routing based on time and impact. Business hours: Route to primary team. Off-hours: Route to on-call only for P0/P1. If >100 users affected: Immediately escalate regardless of severity.
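
These three rules can be combined into one routing function. The sketch assumes business hours of 09:00 to 17:00 on weekdays and holds off-hours P2/P3 alerts in a queue; neither detail is stated in the text, and the returned target names are placeholders.

```python
from datetime import datetime, time


def route(severity, users_affected, when, start=time(9, 0), end=time(17, 0)):
    """Pick a routing target from impact, time of day, and severity (sketch)."""
    if users_affected > 100:
        return "escalate"  # wide impact overrides severity
    in_business_hours = when.weekday() < 5 and start <= when.time() < end
    if in_business_hours:
        return "primary-team"
    if severity in ("P0", "P1"):
        return "on-call"
    return "queue"  # assumed: lower severities wait for business hours
```

The impact check comes first so that a P3 alert affecting many users is still escalated immediately.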

Webhook Reliability Patterns

Always include correlation IDs for alert lifecycle management. Generate UUID for each incident, use it to create/update/resolve alerts. Essential for bi-directional integrations with PagerDuty/Slack.
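
As a minimal sketch of the lifecycle idea, the tracker below issues one UUID per incident and requires it for every later update. `IncidentTracker` is a hypothetical in-memory stand-in; a real integration would pass the same ID to the PagerDuty or Slack calls it makes.

```python
import uuid


class IncidentTracker:
    """Track create/update/resolve events under one correlation ID (sketch)."""

    def __init__(self):
        self._incidents = {}

    def create(self, summary):
        correlation_id = str(uuid.uuid4())
        self._incidents[correlation_id] = {
            "summary": summary,
            "status": "open",
            "events": ["created"],
        }
        return correlation_id

    def update(self, correlation_id, note):
        self._incidents[correlation_id]["events"].append(note)

    def resolve(self, correlation_id):
        incident = self._incidents[correlation_id]
        incident["status"] = "resolved"
        incident["events"].append("resolved")

    def get(self, correlation_id):
        return self._incidents[correlation_id]
```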

Implement exponential backoff for webhook failures. Retry after 1s, 2s, 4s, 8s, 16s, then mark failed and escalate. Log webhook response codes/times for debugging delivery issues.
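
The retry schedule could be implemented along these lines. The `send` callable is an assumption: it stands for whatever function posts the webhook and returns an HTTP status code. The returned log of status codes corresponds to the "log webhook response codes" advice.

```python
import time


def deliver_with_backoff(send, payload, delays=(1, 2, 4, 8, 16), sleep=time.sleep):
    """Try once, then retry after each delay; return (delivered, status_log)."""
    log = []
    for wait in (0, *delays):
        if wait:
            sleep(wait)
        status = send(payload)
        log.append(status)
        if 200 <= status < 300:
            return True, log
    # All retries exhausted: the caller should mark the delivery failed and escalate.
    return False, log
```

Injecting `sleep` makes the schedule testable without waiting; callers normally leave it at the default.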

Use webhook verification to prevent spoofing. Validate signatures using HMAC-SHA256 with shared secret. Always check timestamp to prevent replay attacks (max 5 min old).
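
A minimal verification sketch using Python's standard `hmac` module is shown below. The signed message layout (`timestamp.body`) is an assumption for illustration; real providers each define their own layout and header names, so check the sender's documentation before relying on it.

```python
import hashlib
import hmac
import time

MAX_AGE_S = 300  # reject anything older than 5 minutes


def sign(secret: bytes, timestamp: int, body: bytes) -> str:
    """Compute the hex HMAC-SHA256 over an assumed 'timestamp.body' message."""
    message = str(timestamp).encode("ascii") + b"." + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret: bytes, timestamp: int, body: bytes, signature: str, now=None) -> bool:
    """Check freshness first, then compare signatures in constant time."""
    now = time.time() if now is None else now
    if abs(now - timestamp) > MAX_AGE_S:
        return False  # stale or future-dated: possible replay
    expected = sign(secret, timestamp, body)
    return hmac.compare_digest(expected, signature)
```

`hmac.compare_digest` is used instead of `==` so the comparison time does not leak how many leading characters matched.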

Implement circuit breaker pattern for unreliable endpoints. After 5 consecutive failures, mark endpoint down and use backup channel. Re-test every 30 seconds until recovery confirmed.
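
The breaker behaviour described above might be sketched as follows. `CircuitBreaker` is an illustrative class, and timestamps are passed in explicitly to keep the example deterministic; when `allow_primary` returns `False` the caller is expected to use the backup channel.

```python
class CircuitBreaker:
    """Open after N consecutive failures; allow one probe per retest interval."""

    def __init__(self, failure_threshold=5, retest_interval_s=30.0):
        self.failure_threshold = failure_threshold
        self.retest_interval_s = retest_interval_s
        self.consecutive_failures = 0
        self.opened_at = None
        self._last_probe = None

    def allow_primary(self, now):
        if self.opened_at is None:
            return True  # closed: primary endpoint is considered healthy
        if now - self._last_probe >= self.retest_interval_s:
            self._last_probe = now
            return True  # open, but a probe is due
        return False     # open: use the backup channel

    def record(self, success, now):
        if success:
            self.consecutive_failures = 0
            self.opened_at = None  # recovery confirmed, close the breaker
            return
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            if self.opened_at is None:
                self.opened_at = now
            self._last_probe = now  # restart the 30-second retest timer
```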

Status Page Integration

Update status page automatically when P0/P1 alerts fire. Create incident, post initial assessment within 5 minutes. Include ETA and workaround if available.

Use component-based status updates matching your alert groups. Map alert labels to status page components (API, Database, Auth, etc.). Partial outages should show "Degraded Performance", not "Operational".

Runbook Automation

Embed runbook links directly in alert messages. Format: "Alert: High CPU on web-01. Runbook: https://wiki/runbooks/high-cpu-web". Links must be accessible from mobile devices for on-call engineers.

Trigger automated remediation for known issues. Auto-restart stuck services, clear full disks, reset rate limits. Always require human approval for destructive actions (scaling down, deleting data).

Log all automated actions taken in response to alerts. Include: timestamp, action, result, approval chain. Essential for post-incident reviews and compliance audits.
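
The approval gate and the audit record can live in one wrapper. In this sketch the set of destructive action names, the `handler` callable, and the record layout are all assumptions; only the logged fields (timestamp, action, result, approval chain) come from the text.

```python
from datetime import datetime, timezone

# Hypothetical action names treated as destructive.
DESTRUCTIVE_ACTIONS = {"scale_down", "delete_data"}


def run_remediation(action, handler, approvals=(), audit_log=None):
    """Run a remediation handler, gate destructive ones, and record the outcome."""
    audit_log = [] if audit_log is None else audit_log
    if action in DESTRUCTIVE_ACTIONS and not approvals:
        result = "blocked: human approval required"
    else:
        try:
            result = handler()
        except Exception as exc:  # record failures instead of losing them
            result = f"failed: {exc}"
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "result": result,
        "approval_chain": list(approvals),
    }
    audit_log.append(entry)
    return entry
```

Blocked attempts are logged as well, so a post-incident review can see what the automation tried to do and who approved what.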

Safe usage recommendations
This skill is a set of best-practice instructions for alerting and incident workflows and appears internally consistent and non-invasive. Before adopting: (1) treat the SKILL.md as guidance — you still need to implement webhook verification, secret storage, and escalation logic in your own systems; (2) ensure any shared secrets (HMAC keys) are stored securely (not pasted into public configs); (3) implement and test human-approval gates for destructive automated remediation; and (4) validate runbook links and access controls so sensitive operational details aren't exposed to unauthorized users.
Functional analysis
Type: OpenClaw Skill Name: alerts Version: 1.0.0 The skill bundle contains only metadata and a markdown document (`SKILL.md`) providing best practices and patterns for designing and implementing an alerting system, particularly for AI agents. The content is purely descriptive and educational, outlining concepts like alert fatigue prevention, AI agent monitoring, routing, webhooks, and runbook automation. There are no executable commands, instructions for data exfiltration, prompt injection attempts, or any other indicators of malicious or suspicious behavior. The 'code blocks' in `SKILL.md` are illustrative configuration examples, not commands for the agent to execute.
Capability assessment
Purpose & Capability
The name/description (alerting patterns, deduplication, routing, escalation) matches the SKILL.md content. Nothing requested (no env vars, binaries, or config paths) is inconsistent with an advisory/reference skill.
Instruction Scope
SKILL.md provides operational guidance and config examples for alerting, routing, webhook reliability, runbooks and automation. It does not instruct the agent to read local files, access unrelated credentials, or transmit data to external endpoints beyond generic integration references (PagerDuty/Slack). It includes sensible warnings (human approval for destructive actions).
Install Mechanism
There is no install spec and no code files — this is instruction-only, so nothing is written to disk or downloaded during install.
Credentials
The skill declares no required environment variables, credentials, or config paths. The SKILL.md mentions HMAC-SHA256 webhook verification and shared secrets as design guidance, but does not request or require secrets itself.
Persistence & Privilege
always is false and model invocation is allowed (platform default). The skill does not request permanent presence, nor does it instruct modification of other skills or system-wide settings.
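How to use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install alerts
  3. Once installed, invoke the Skill by name or trigger it with /alerts.
  4. Provide the inputs described in the Skill's parameter documentation to get structured output.
Version history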
v1.0.0
Initial release
Metadata
Slug: alerts
Version: 1.0.0
License:
Total installs: 4
Current installs: 4
Historical versions: 1
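FAQ

What is Alerts?

Smart alerting patterns for AI agents - deduplication, routing, escalation, and fatigue prevention. It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 1163 times so far.

How do I install Alerts?

Run the command "/install alerts" in the OpenClaw or Claude Code chat box. It installs in one step with no extra configuration.

Is Alerts free?

Yes. Alerts is completely free (free and open source) and can be downloaded, installed, and used without restriction.

Which platforms does Alerts support?

Alerts is cross-platform and works in any environment where OpenClaw / Claude Code is deployed.

Who develops Alerts?

Alerts is developed and maintained by Iván (@ivangdavila). The current version is v1.0.0.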
