AI Agent Observability

Author: 1kalin · GitHub · v1.1.0
cross-platform ⚠ suspicious
Total downloads: 589
Favorites: 0
Current installs: 0
Versions: 2
Install in OpenClaw
/install afrexai-agent-observability
Description
Evaluate and monitor AI agent fleets across six key dimensions to score health, identify issues, and optimize performance for ops teams managing 1-100+ agents.
Usage Instructions (SKILL.md)

Agent Observability & Monitoring

Score, monitor, and troubleshoot AI agent fleets in production. Built for ops teams running 1-100+ agents.

What This Does

Evaluates your agent deployment across 6 dimensions and returns a 0-100 health score with specific fixes.

6-Dimension Assessment

1. Execution Visibility (0-20 pts)

  • Can you see what every agent is doing right now?
  • Task queue depth, active/idle ratio, error rates
  • Benchmark: Top quartile tracks 95%+ of agent actions in real-time
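
The three signals listed above can be derived from a per-agent status snapshot. The following is a minimal sketch, assuming a hypothetical `AgentStatus` record; the field names are illustrative and not part of the skill.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class AgentStatus:
    """Hypothetical snapshot of one agent; field names are illustrative."""
    agent_id: str
    state: str          # "active" or "idle"
    queued_tasks: int   # tasks waiting for this agent
    completed: int      # tasks finished in the window
    failed: int         # tasks that errored in the window


def fleet_snapshot(statuses):
    """Summarize queue depth, active/idle ratio, and error rate for a fleet."""
    states = Counter(s.state for s in statuses)
    idle = states.get("idle", 0)
    finished = sum(s.completed + s.failed for s in statuses)
    failed = sum(s.failed for s in statuses)
    return {
        "queue_depth": sum(s.queued_tasks for s in statuses),
        # The ratio is undefined with zero idle agents, so report infinity.
        "active_idle_ratio": states.get("active", 0) / idle if idle else float("inf"),
        "error_rate": failed / finished if finished else 0.0,
    }


statuses = [
    AgentStatus("a1", "active", queued_tasks=3, completed=40, failed=2),
    AgentStatus("a2", "active", queued_tasks=1, completed=25, failed=0),
    AgentStatus("a3", "idle", queued_tasks=0, completed=30, failed=3),
]
snapshot = fleet_snapshot(statuses)
print(snapshot)
```

Computing these from one snapshot keeps the three numbers consistent with each other; a real deployment would likely pull the fields from its own task queue or logging backend.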

2. Cost Attribution (0-20 pts)

  • Do you know exactly what each agent costs per task?
  • Token spend, API calls, compute time, tool invocations
  • Benchmark: Unmonitored agents waste 30-55% on retries and hallucination loops
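
Cost per task can be approximated by rolling usage records up by agent. This sketch assumes hypothetical record fields and made-up unit prices; substitute whatever your provider actually bills.

```python
from collections import defaultdict

# Assumed prices in USD; these are placeholders, not real provider rates.
PRICE_PER_1K_TOKENS = 0.01
PRICE_PER_TOOL_CALL = 0.002

# Hypothetical usage records, one per task.
records = [
    {"agent": "billing-bot", "task": "t1", "tokens": 12_000, "tool_calls": 5},
    {"agent": "billing-bot", "task": "t2", "tokens": 8_000, "tool_calls": 0},
    {"agent": "triage-bot", "task": "t3", "tokens": 30_000, "tool_calls": 10},
]


def cost_per_task(records):
    """Return the average USD cost per task for each agent."""
    totals = defaultdict(float)
    tasks = defaultdict(set)
    for r in records:
        cost = (r["tokens"] / 1000 * PRICE_PER_1K_TOKENS
                + r["tool_calls"] * PRICE_PER_TOOL_CALL)
        totals[r["agent"]] += cost
        tasks[r["agent"]].add(r["task"])
    return {agent: totals[agent] / len(tasks[agent]) for agent in totals}


costs = cost_per_task(records)
for agent, usd in costs.items():
    print(f"{agent}: ${usd:.3f} per task")
```

Compute time is left out here for brevity; it would be one more term in the per-record cost.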

3. Output Quality (0-15 pts)

  • Are agent outputs validated before reaching users or systems?
  • Accuracy sampling, hallucination detection, regression tracking
  • Benchmark: 1 in 12 agent outputs contains a material error without monitoring

4. Failure Recovery (0-15 pts)

  • What happens when an agent fails mid-task?
  • Retry logic, graceful degradation, human escalation paths
  • Benchmark: Mean time to detect agent failure without monitoring: 4.2 hours
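
Retry logic, graceful degradation, and human escalation can be combined in one small wrapper. This is a sketch under stated assumptions: `run_with_recovery`, its parameters, and the `escalate` hook are hypothetical names, and `escalate` stands in for whatever paging or ticketing system a team uses.

```python
import time


def run_with_recovery(task, fallback=None, attempts=3, base_delay=0.5, escalate=print):
    """Retry a task with exponential backoff, then degrade and escalate."""
    last_error = None
    for attempt in range(attempts):
        try:
            return task()
        except Exception as exc:  # broad on purpose: any failure triggers a retry
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(base_delay * 2 ** attempt)  # base, 2x base, 4x base, ...
    # Retries exhausted: notify a human, then return the degraded result.
    escalate(f"task failed after {attempts} attempts: {last_error!r}")
    return fallback


calls = {"count": 0}


def flaky_task():
    """Fails twice, then succeeds, to exercise the retry path."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient failure")
    return "done"


result = run_with_recovery(flaky_task, base_delay=0)
print(result, calls["count"])
```

Returning a `fallback` value instead of raising is one way to degrade gracefully; whether that is appropriate depends on how downstream systems treat a partial result.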

5. Security & Boundaries (0-15 pts)

  • Are agents staying within authorized scope?
  • Tool access auditing, data exfiltration checks, permission drift
  • Benchmark: 23% of production agents access tools outside their intended scope
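
Tool access auditing can start as a comparison between a log of tool calls and a per-agent allowlist. The sketch below assumes a hypothetical `INTENDED_SCOPE` mapping and a log of `(agent, tool)` pairs; none of these names come from the skill.

```python
# Hypothetical allowlist: the tools each agent is intended to use.
INTENDED_SCOPE = {
    "support-bot": {"search_kb", "create_ticket"},
    "billing-bot": {"read_invoice"},
}


def scope_violations(tool_log):
    """Return (agent, tool) pairs from the log that fall outside the allowlist."""
    violations = []
    for agent, tool in tool_log:
        allowed = INTENDED_SCOPE.get(agent, set())  # unknown agents get no access
        if tool not in allowed:
            violations.append((agent, tool))
    return violations


tool_log = [
    ("support-bot", "search_kb"),
    ("support-bot", "read_invoice"),   # outside support-bot's scope
    ("billing-bot", "read_invoice"),
    ("unknown-bot", "search_kb"),      # agent missing from the allowlist
]
violations = scope_violations(tool_log)
print(violations)
```

Treating unlisted agents as having no permissions is a deliberate default-deny choice; it surfaces agents that were never inventoried.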

6. Fleet Coordination (0-15 pts)

  • Do multi-agent workflows hand off cleanly?
  • Message passing reliability, deadlock detection, duplicate work
  • Benchmark: Uncoordinated fleets duplicate 18-25% of work
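
One way to spot duplicate work is to fingerprint each completed task and look for fingerprints handled more than once. This sketch assumes tasks can be described by a type and a JSON-serializable payload; the function names are illustrative.

```python
import hashlib
import json
from collections import defaultdict


def task_fingerprint(task_type, payload):
    """Hash a task's type and payload so identical work maps to one key."""
    canonical = json.dumps({"type": task_type, "payload": payload}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_duplicates(completed):
    """Group completed tasks by fingerprint; return groups handled more than once."""
    seen = defaultdict(list)
    for agent, task_type, payload in completed:
        seen[task_fingerprint(task_type, payload)].append(agent)
    return [agents for agents in seen.values() if len(agents) > 1]


completed = [
    ("a1", "summarize", {"doc": "x", "lang": "en"}),
    ("a2", "summarize", {"lang": "en", "doc": "x"}),  # same work, different key order
    ("a3", "summarize", {"doc": "y", "lang": "en"}),
]
duplicates = find_duplicates(completed)
print(duplicates)
```

Serializing with `sort_keys=True` makes the fingerprint independent of key order, so two agents given the same payload in a different order still collide.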

Scoring

| Score | Rating | Action |
|---|---|---|
| 80-100 | Production-grade | Optimize and scale |
| 60-79 | Operational | Fix gaps before scaling |
| 40-59 | Risky | Immediate remediation needed |
| 0-39 | Blind | Stop scaling, instrument first |
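
The six dimension maximums (20 + 20 + 15 + 15 + 15 + 15) add up to 100, so the health score is the plain sum of the dimension points. The sketch below shows that sum and the rating lookup; the dictionary keys and the `health_score` function are hypothetical names, while the point caps and rating bands are the ones listed above.

```python
MAX_POINTS = {
    "execution_visibility": 20,
    "cost_attribution": 20,
    "output_quality": 15,
    "failure_recovery": 15,
    "security_boundaries": 15,
    "fleet_coordination": 15,
}

RATINGS = [  # (minimum score, rating, action), highest band first
    (80, "Production-grade", "Optimize and scale"),
    (60, "Operational", "Fix gaps before scaling"),
    (40, "Risky", "Immediate remediation needed"),
    (0, "Blind", "Stop scaling, instrument first"),
]


def health_score(points):
    """Sum the six dimension scores and map the total to a rating band."""
    if set(points) != set(MAX_POINTS):
        raise ValueError("expected exactly the six dimensions")
    for name, value in points.items():
        if not 0 <= value <= MAX_POINTS[name]:
            raise ValueError(f"{name} must be between 0 and {MAX_POINTS[name]}")
    total = sum(points.values())
    for minimum, rating, action in RATINGS:
        if total >= minimum:
            return total, rating, action
    raise AssertionError("unreachable: the lowest band starts at 0")


example = {
    "execution_visibility": 15,
    "cost_attribution": 12,
    "output_quality": 10,
    "failure_recovery": 9,
    "security_boundaries": 11,
    "fleet_coordination": 8,
}
print(health_score(example))
```

How each dimension's points are assigned is left to the assessor; the skill gives the caps and guiding questions but no per-point rubric.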

Quick Assessment Prompt

Ask the agent to evaluate your setup:

Run the agent observability assessment against our current deployment:
- How many agents are running?
- What monitoring exists today?
- What broke in the last 30 days?
- What's our monthly agent spend?
- Who gets alerted when an agent fails?

Cost Framework

| Fleet Size | Unmonitored Waste | Monitoring Investment | Net Savings |
|---|---|---|---|
| 1-5 agents | $2K-$8K/mo | $500-$1K/mo | $1.5K-$7K/mo |
| 5-20 agents | $8K-$45K/mo | $2K-$5K/mo | $6K-$40K/mo |
| 20-100 agents | $45K-$200K/mo | $8K-$20K/mo | $37K-$180K/mo |
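
The Net Savings figures appear to be the waste range minus the investment range, taken endpoint by endpoint (low minus low, high minus high). That pairing is inferred from the numbers rather than stated in the skill; the sketch below checks it against each row, with all amounts in thousands of USD per month.

```python
TABLE = [
    # (fleet size, waste range, investment range, stated net savings range)
    ("1-5 agents", (2, 8), (0.5, 1), (1.5, 7)),
    ("5-20 agents", (8, 45), (2, 5), (6, 40)),
    ("20-100 agents", (45, 200), (8, 20), (37, 180)),
]


def net_savings(waste, investment):
    """Subtract the investment range from the waste range endpoint by endpoint."""
    return (waste[0] - investment[0], waste[1] - investment[1])


consistent = all(net_savings(w, i) == stated for _, w, i, stated in TABLE)
for size, waste, investment, _ in TABLE:
    low, high = net_savings(waste, investment)
    print(f"{size}: ${low:g}K-${high:g}K/mo")
print("matches the stated figures:", consistent)
```

Pairing low with low is a simplification: a fleet with low waste and high monitoring spend would save less than the bottom of the stated range.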

90-Day Monitoring Roadmap

Week 1-2: Inventory all agents, document intended scope, tag cost centers
Week 3-4: Deploy execution logging (every tool call, every output)
Month 2: Build dashboards for cost per task, error rate, and latency P95
Month 3: Set up automated alerting (failure detection <5 min, cost anomaly flags, scope violations)
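
Two of the roadmap items lend themselves to a short sketch: the latency P95 shown on a dashboard, and a failure check that flags agents silent for more than 5 minutes. The heartbeat model and the names `p95` and `silent_agents` are assumptions made for illustration; the skill does not prescribe a detection method.

```python
import math


def p95(values):
    """Nearest-rank 95th percentile of a non-empty sequence."""
    if not values:
        raise ValueError("p95 needs at least one value")
    ordered = sorted(values)
    rank = math.ceil(95 * len(ordered) / 100)  # 1-based nearest rank
    return ordered[rank - 1]


def silent_agents(last_seen, now, max_silence_s=300):
    """Return agents whose last heartbeat is older than max_silence_s seconds."""
    return sorted(a for a, ts in last_seen.items() if now - ts > max_silence_s)


latencies_ms = list(range(10, 210, 10))  # 20 samples: 10, 20, ..., 200
print(p95(latencies_ms))  # 190

# Hypothetical heartbeat timestamps in seconds.
last_seen = {"a1": 1000, "a2": 600, "a3": 990}
print(silent_agents(last_seen, now=1000))  # ['a2']
```

A silence threshold only bounds detection time if the check itself runs often enough, for example once a minute.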

7 Monitoring Mistakes

  1. Logging only errors (miss the slow degradation)
  2. No cost attribution (agents burn budget invisibly)
  3. Monitoring agents like servers (they need task-level observability)
  4. Manual review of agent outputs (doesn't scale past 3 agents)
  5. No baseline metrics (can't detect regression without a baseline)
  6. Alerting on everything (alert fatigue kills response time)
  7. Skipping agent-to-agent handoff monitoring (where most fleet failures happen)

Industry Adjustments

| Industry | Critical Dimension | Why |
|---|---|---|
| Financial Services | Security & Boundaries | Regulatory audit trails mandatory |
| Healthcare | Output Quality | Clinical accuracy non-negotiable |
| Legal | Execution Visibility | Billing requires task-level tracking |
| Ecommerce | Cost Attribution | Margin-sensitive, waste kills profit |
| SaaS | Fleet Coordination | Multi-tenant agent isolation |
| Manufacturing | Failure Recovery | Downtime = production line stops |
| Construction | Security & Boundaries | Safety-critical document handling |
| Real Estate | Output Quality | Valuation errors = liability |
| Recruitment | Fleet Coordination | Candidate pipeline handoffs |
| Professional Services | Cost Attribution | Client billing accuracy |

Go Deeper

Built by AfrexAI — we help businesses run AI agents that actually make money.

Security Recommendations
This skill is essentially a checklist and a prompt template rather than a connector or monitoring tool. Before installing or invoking it, consider:
  1. It will not automatically collect metrics. If you ask an autonomous agent to 'run the assessment' it may try to gather data from your environment; restrict that agent's permissions (least privilege) and network access.
  2. Prefer using it as a human-facing guide or with a sandboxed agent that only has read-only, narrowly scoped access to specific monitoring/billing APIs you explicitly provision.
  3. If you intend automated collection, define exactly which APIs/endpoints the agent may call and supply dedicated read-only credentials; update the SKILL.md to list required env vars and safe query patterns.
  4. If you have sensitive billing or production telemetry, require human approval for any actions that access those systems.
These steps will reduce the risk that an ambiguous prompt leads to unintended access or data exposure.
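Defining exactly which endpoints an agent may call can be as simple as an explicit allowlist checked before every request. The sketch below is illustrative only: the hosts, paths, and the `is_allowed` helper are placeholders, not endpoints or APIs that the skill or OpenClaw define.

```python
from urllib.parse import urlparse

# Hypothetical read-only allowlist of (method, host, path) triples.
ALLOWED = {
    ("GET", "billing.example.internal", "/v1/usage"),
    ("GET", "metrics.example.internal", "/api/v1/query"),
}


def is_allowed(method, url):
    """Permit only HTTPS requests whose method, host, and path are allowlisted."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        return False
    return (method.upper(), parsed.hostname, parsed.path) in ALLOWED


print(is_allowed("GET", "https://billing.example.internal/v1/usage?month=2024-01"))  # True
print(is_allowed("POST", "https://billing.example.internal/v1/usage"))               # False
print(is_allowed("GET", "https://other.example.internal/v1/usage"))                  # False
```

Listing only `GET` entries keeps the agent read-only at the request layer; dedicated read-only credentials are still needed, because an allowlist in the agent's own process is not a security boundary on its own.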
Functional Analysis
Type: OpenClaw Skill
Name: afrexai-agent-observability
Version: 1.1.0
The skill bundle describes a legitimate purpose: 'Agent Observability & Monitoring'. The `SKILL.md` and `README.md` files provide documentation and instructions for the user to interact with the agent, all aligned with the stated purpose. There are no instructions for the AI agent to perform malicious actions, exfiltrate data, establish persistence, execute arbitrary code, or engage in prompt injection against itself. External links in `SKILL.md` and `README.md` point to informational GitHub Pages resources, but the skill does not instruct the agent to fetch or execute content from these links.
Capability Assessment
Purpose & Capability
The name and description claim fleet observability, and the content is a plausible checklist and prompt for that purpose. However, the skill is purely instruction-only (no code, no declared integrations, no env vars), so it cannot actually perform automated monitoring by itself — it can only guide an agent or human to perform the assessment. The claim to 'Run the agent observability assessment against our current deployment' is disproportionate to what's provided (no connectors, no credentials, no CLI/API guidance).
Instruction Scope
SKILL.md contains open-ended runtime prompts that ask the agent to evaluate the 'current deployment' and to gather counts, spend, alerts, and recent failures. The instructions do not confine how the agent should obtain that information (no explicit APIs/paths to query, no declared env vars). That vagueness grants broad discretion — an autonomous agent could attempt to read environment variables, query cloud APIs, inspect files, or contact external endpoints to satisfy the prompt, which is scope creep relative to the simple documentation/assessment intent.
Install Mechanism
No install spec and no code files — lowest-risk install footprint. Nothing will be written to disk by the skill itself because there is no install step.
Credentials
The skill declares no required environment variables, credentials, or config paths, which is consistent with an instruction-only checklist. However, the assessment questions (monthly spend, monitoring existence, alerting targets) typically require privileged access to billing, monitoring, or deployment APIs; those are not declared, creating an implicit gap. If an agent pursues answers programmatically, it may request credentials later — the skill does not justify or constrain that need.
Persistence & Privilege
always is false and there are no persistence or system-modifying instructions. Autonomous invocation is allowed (platform default) but the skill does not request elevated or persistent privileges itself.
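How to Use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install afrexai-agent-observability
  3. Once installed, invoke the skill by name or trigger it with /afrexai-agent-observability
  4. Provide the inputs described in the skill's parameter notes to get structured output.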
Version History
v1.1.0
No functional or documentation changes were detected in this version.
  • Version bump to 1.1.0 with no file changes.
  • All features and documentation remain unchanged.
v1.0.0
Initial release: score, monitor, and troubleshoot AI agent fleets across 6 critical operational dimensions.
  • Introduces a 0-100 scoring system for agent deployment health, with actionable recommendations.
  • Provides a 6-dimension assessment: Execution Visibility, Cost Attribution, Output Quality, Failure Recovery, Security & Boundaries, Fleet Coordination.
  • Offers a cost framework, 90-day monitoring roadmap, and lists common monitoring mistakes.
  • Includes industry-specific best practices and benchmarks.
  • Supplies quick assessment prompt and links to further resources.
Metadata
Slug: afrexai-agent-observability
Version: 1.1.0
License: (none listed)
Total installs: 0
Current installs: 0
Versions: 2
FAQ

What is AI Agent Observability?

It evaluates and monitors AI agent fleets across six key dimensions to score health, identify issues, and optimize performance for ops teams managing 1-100+ agents. It is an AI agent skill plugin for Claude Code / OpenClaw, with 589 total downloads to date.

How do I install AI Agent Observability?

Run the command /install afrexai-agent-observability in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.

Is AI Agent Observability free?

Yes. AI Agent Observability is completely free (free and open source) and can be downloaded, installed, and used freely.

Which platforms does AI Agent Observability support?

AI Agent Observability is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed AI Agent Observability?

It is developed and maintained by 1kalin (@1kalin); the current version is v1.1.0.
