
AI Agent Observability

Name: AI Agent Observability
Author: 1kalin

By 1kalin · GitHub · v1.1.0

cross-platform ⚠ suspicious

Total downloads: 589

Install in OpenClaw

/install afrexai-agent-observability

Description

Evaluate and monitor AI agent fleets across six key dimensions to score health, identify issues, and optimize performance for ops teams managing 1-100+ agents.

Usage instructions (SKILL.md)

Agent Observability & Monitoring

Score, monitor, and troubleshoot AI agent fleets in production. Built for ops teams running 1-100+ agents.

What This Does

Evaluates your agent deployment across 6 dimensions and returns a 0-100 health score with specific fixes.

6-Dimension Assessment

1. Execution Visibility (0-20 pts)

Can you see what every agent is doing right now?
Task queue depth, active/idle ratio, error rates
Benchmark: Top quartile tracks 95%+ of agent actions in real-time

2. Cost Attribution (0-20 pts)

Do you know exactly what each agent costs per task?
Token spend, API calls, compute time, tool invocations
Benchmark: Unmonitored agents waste 30-55% on retries and hallucination loops

3. Output Quality (0-15 pts)

Are agent outputs validated before reaching users or systems?
Accuracy sampling, hallucination detection, regression tracking
Benchmark: 1 in 12 agent outputs contains a material error without monitoring

4. Failure Recovery (0-15 pts)

What happens when an agent fails mid-task?
Retry logic, graceful degradation, human escalation paths
Benchmark: Mean time to detect agent failure without monitoring: 4.2 hours
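The retry and escalation path named in this dimension could look roughly like the sketch below. It is illustrative only: `run_with_recovery`, `max_attempts`, and the `escalate` callback are hypothetical names, not part of the skill, and a real deployment would likely add backoff and task-specific error handling.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_with_recovery(
    task: Callable[[], T],
    max_attempts: int = 3,
    escalate: Callable[[str], None] = print,
) -> Optional[T]:
    """Retry a task a bounded number of times, then hand off to a human.

    `escalate` stands in for whatever paging or ticketing hook a team uses
    (an assumption; the skill does not prescribe one).
    """
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            return task()
        except Exception as exc:  # broad on purpose: any failure counts
            last_error = exc
    # All attempts failed: degrade gracefully by returning None and escalating.
    escalate(f"task failed after {max_attempts} attempts: {last_error!r}")
    return None
```

Returning `None` instead of raising is one way to model graceful degradation; callers that need a hard failure can raise inside `escalate`.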

5. Security & Boundaries (0-15 pts)

Are agents staying within authorized scope?
Tool access auditing, data exfiltration checks, permission drift
Benchmark: 23% of production agents access tools outside their intended scope
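A tool access audit of the kind described here can be approximated by comparing logged tool calls against a per-agent allowlist. The sketch below assumes the logs are available as `(agent, tool)` pairs; the function name, agent names, and tool names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def scope_violations(
    allowed: Dict[str, Set[str]],
    calls: List[Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Return every (agent, tool) call that falls outside the agent's allowlist.

    Agents missing from `allowed` are treated as having no authorized tools.
    """
    return [
        (agent, tool)
        for agent, tool in calls
        if tool not in allowed.get(agent, set())
    ]


# Hypothetical example data, not taken from any real deployment.
allowed_tools = {"billing-bot": {"read_invoice"}, "support-bot": {"search_kb"}}
observed = [
    ("billing-bot", "read_invoice"),
    ("billing-bot", "send_email"),  # outside intended scope
    ("unknown-bot", "search_kb"),   # agent with no documented scope
]
violations = scope_violations(allowed_tools, observed)
```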

6. Fleet Coordination (0-15 pts)

Do multi-agent workflows hand off cleanly?
Message passing reliability, deadlock detection, duplicate work
Benchmark: Uncoordinated fleets duplicate 18-25% of work
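One simple way to estimate duplicate work is to give each task a stable key and count how many executions repeat a key that was already seen. This is a sketch under the assumption that such keys exist (for example a hash of the task input); `duplicate_ratio` is a hypothetical helper.

```python
from collections import Counter
from typing import Hashable, Iterable


def duplicate_ratio(task_keys: Iterable[Hashable]) -> float:
    """Fraction of task executions that repeat an already-seen task key."""
    keys = list(task_keys)
    if not keys:
        return 0.0
    counts = Counter(keys)
    # Every execution beyond the first for a given key is duplicate work.
    duplicates = sum(count - 1 for count in counts.values())
    return duplicates / len(keys)
```

For instance, four executions with keys `a, b, a, c` contain one repeat, so the ratio is 0.25.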

Scoring

Score	Rating	Action
80-100	Production-grade	Optimize and scale
60-79	Operational	Fix gaps before scaling
40-59	Risky	Immediate remediation needed
0-39	Blind	Stop scaling, instrument first
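The six dimension caps sum to 100, so the overall score is the sum of per-dimension points mapped onto the rating bands above. A minimal sketch, assuming each dimension has already been scored by hand; the dictionary keys and the `health_score` function are illustrative names, not part of the skill.

```python
from typing import Dict, Tuple

# Maximum points per dimension, from the 6-dimension assessment.
MAX_POINTS = {
    "execution_visibility": 20,
    "cost_attribution": 20,
    "output_quality": 15,
    "failure_recovery": 15,
    "security_boundaries": 15,
    "fleet_coordination": 15,
}

# Rating bands from the scoring table: (lower bound, rating, action).
BANDS = [
    (80, "Production-grade", "Optimize and scale"),
    (60, "Operational", "Fix gaps before scaling"),
    (40, "Risky", "Immediate remediation needed"),
    (0, "Blind", "Stop scaling, instrument first"),
]


def health_score(points: Dict[str, int]) -> Tuple[int, str, str]:
    """Sum per-dimension points and look up the rating band.

    Dimensions missing from `points` count as 0.
    """
    total = 0
    for name, cap in MAX_POINTS.items():
        value = points.get(name, 0)
        if not 0 <= value <= cap:
            raise ValueError(f"{name} must be between 0 and {cap}, got {value}")
        total += value
    for floor, rating, action in BANDS:
        if total >= floor:
            return total, rating, action
    raise AssertionError("unreachable: the lowest band starts at 0")
```

A fleet scoring 15, 10, 12, 9, 10, and 8 across the six dimensions totals 64, which lands in the Operational band.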

Quick Assessment Prompt

Ask the agent to evaluate your setup:

Run the agent observability assessment against our current deployment:
- How many agents are running?
- What monitoring exists today?
- What broke in the last 30 days?
- What's our monthly agent spend?
- Who gets alerted when an agent fails?

Cost Framework

Fleet Size	Unmonitored Waste	Monitoring Investment	Net Savings
1-5 agents	$2K-$8K/mo	$500-$1K/mo	$1.5K-$7K/mo
5-20 agents	$8K-$45K/mo	$2K-$5K/mo	$6K-$40K/mo
20-100 agents	$45K-$200K/mo	$8K-$20K/mo	$37K-$180K/mo
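The Net Savings column follows from subtracting the monitoring investment from the unmonitored waste, pairing low end with low end and high end with high end. The sketch below reproduces that arithmetic; `net_savings` is a hypothetical helper, and the pairing is the table's convention rather than a strict worst-case/best-case interval.

```python
from typing import Tuple

Range = Tuple[int, int]  # (low, high) in USD per month


def net_savings(waste: Range, investment: Range) -> Range:
    """Waste minus investment, pairing low with low and high with high."""
    return waste[0] - investment[0], waste[1] - investment[1]


# Figures copied from the cost framework table.
TIERS = {
    "1-5 agents": ((2_000, 8_000), (500, 1_000)),
    "5-20 agents": ((8_000, 45_000), (2_000, 5_000)),
    "20-100 agents": ((45_000, 200_000), (8_000, 20_000)),
}

savings = {tier: net_savings(w, i) for tier, (w, i) in TIERS.items()}
```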

90-Day Monitoring Roadmap

Week 1-2: Inventory all agents, document intended scope, tag cost centers
Week 3-4: Deploy execution logging (every tool call, every output)
Month 2: Build dashboards: cost per task, error rate, latency P95
Month 3: Automated alerting: failure detection <5 min, cost anomaly flags, scope violations
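The three dashboard metrics named for Month 2 can be derived from per-task execution logs. A minimal sketch, assuming each logged task carries a cost, a latency, and a failure flag; the `TaskRecord` fields and function names are hypothetical, and P95 is computed with the nearest-rank method, which is only one of several percentile definitions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TaskRecord:
    agent: str
    cost_usd: float
    latency_s: float
    failed: bool


def p95(values: List[float]) -> float:
    """95th percentile by the nearest-rank method."""
    ordered = sorted(values)
    # Integer form of ceil(0.95 * n) to avoid floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def dashboard(records: List[TaskRecord]) -> Dict[str, float]:
    """Cost per task, error rate, and latency P95 over a batch of records."""
    if not records:
        raise ValueError("need at least one task record")
    n = len(records)
    return {
        "cost_per_task": sum(r.cost_usd for r in records) / n,
        "error_rate": sum(r.failed for r in records) / n,
        "latency_p95": p95([r.latency_s for r in records]),
    }
```

Grouping `records` by `agent` before calling `dashboard` would give the per-agent view that cost attribution asks for.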

7 Monitoring Mistakes

Logging only errors (miss the slow degradation)
No cost attribution (agents burn budget invisibly)
Monitoring agents like servers (they need task-level observability)
Manual review of agent outputs (doesn't scale past 3 agents)
No baseline metrics (can't detect regression without a baseline)
Alerting on everything (alert fatigue kills response time)
Skipping agent-to-agent handoff monitoring (where most fleet failures happen)
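The baseline point in this list can be made concrete with a comparison between a recorded baseline window and the current window of a metric. This is a sketch only: the `regressed` helper and its 20% `tolerance` default are assumptions chosen for illustration, not thresholds from the skill.

```python
from statistics import mean
from typing import Sequence


def regressed(
    baseline: Sequence[float],
    current: Sequence[float],
    tolerance: float = 0.2,
) -> bool:
    """True if the current mean exceeds the baseline mean by more than `tolerance`.

    Intended for metrics where higher is worse, such as latency or cost per task.
    """
    return mean(current) > mean(baseline) * (1 + tolerance)
```

Because the check runs on every window rather than only on errors, it can surface slow degradation that error-only logging would miss, and a non-zero tolerance helps avoid alerting on every small fluctuation.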

Industry Adjustments

Industry	Critical Dimension	Why
Financial Services	Security & Boundaries	Regulatory audit trails mandatory
Healthcare	Output Quality	Clinical accuracy non-negotiable
Legal	Execution Visibility	Billing requires task-level tracking
Ecommerce	Cost Attribution	Margin-sensitive, waste kills profit
SaaS	Fleet Coordination	Multi-tenant agent isolation
Manufacturing	Failure Recovery	Downtime = production line stops
Construction	Security & Boundaries	Safety-critical document handling
Real Estate	Output Quality	Valuation errors = liability
Recruitment	Fleet Coordination	Candidate pipeline handoffs
Professional Services	Cost Attribution	Client billing accuracy

Go Deeper

AI Agent Context Packs — industry-specific decision frameworks: https://afrexai-cto.github.io/context-packs/
AI Revenue Leak Calculator — find where your business loses money to manual processes: https://afrexai-cto.github.io/ai-revenue-calculator/
Agent Setup Wizard — configure your agent stack in 5 minutes: https://afrexai-cto.github.io/agent-setup/

Built by AfrexAI — we help businesses run AI agents that actually make money.

Safe usage recommendations

This skill is essentially a checklist and a prompt template rather than a connector or monitoring tool. Before installing or invoking it, consider:
1) It will not automatically collect metrics. If you ask an autonomous agent to 'run the assessment', it may try to gather data from your environment; restrict that agent's permissions (least privilege) and network access.
2) Prefer using it as a human-facing guide or with a sandboxed agent that only has read-only, narrowly scoped access to specific monitoring/billing APIs you explicitly provision.
3) If you intend automated collection, define exactly which APIs/endpoints the agent may call and supply dedicated read-only credentials; update the SKILL.md to list required env vars and safe query patterns.
4) If you have sensitive billing or production telemetry, require human approval for any actions that access those systems.
These steps will reduce the risk that an ambiguous prompt leads to unintended access or data exposure.

Functional analysis

Type: OpenClaw Skill
Name: afrexai-agent-observability
Version: 1.1.0

The skill bundle describes a legitimate purpose: 'Agent Observability & Monitoring'. The `SKILL.md` and `README.md` files provide documentation and instructions for the user to interact with the agent, all aligned with the stated purpose. There are no instructions for the AI agent to perform malicious actions, exfiltrate data, establish persistence, execute arbitrary code, or engage in prompt injection against itself. External links in `SKILL.md` and `README.md` point to informational GitHub Pages resources, but the skill does not instruct the agent to fetch or execute content from these links.

Capability assessment

ℹ Purpose & Capability

The name and description claim fleet observability, and the content is a plausible checklist and prompt for that purpose. However, the skill is purely instruction-only (no code, no declared integrations, no env vars), so it cannot actually perform automated monitoring by itself — it can only guide an agent or human to perform the assessment. The claim to 'Run the agent observability assessment against our current deployment' is disproportionate to what's provided (no connectors, no credentials, no CLI/API guidance).

⚠ Instruction Scope

SKILL.md contains open-ended runtime prompts that ask the agent to evaluate the 'current deployment' and to gather counts, spend, alerts, and recent failures. The instructions do not confine how the agent should obtain that information (no explicit APIs/paths to query, no declared env vars). That vagueness grants broad discretion — an autonomous agent could attempt to read environment variables, query cloud APIs, inspect files, or contact external endpoints to satisfy the prompt, which is scope creep relative to the simple documentation/assessment intent.

✓ Install Mechanism

No install spec and no code files — lowest-risk install footprint. Nothing will be written to disk by the skill itself because there is no install step.

ℹ Credentials

The skill declares no required environment variables, credentials, or config paths, which is consistent with an instruction-only checklist. However, the assessment questions (monthly spend, monitoring existence, alerting targets) typically require privileged access to billing, monitoring, or deployment APIs; those are not declared, creating an implicit gap. If an agent pursues answers programmatically, it may request credentials later — the skill does not justify or constrain that need.

✓ Persistence & Privilege

always is false and there are no persistence or system-modifying instructions. Autonomous invocation is allowed (platform default) but the skill does not request elevated or persistent privileges itself.

How to use

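Make sure OpenClaw is installed (locally or via Docker)
Enter the install command in the chat box: /install afrexai-agent-observability
Once installed, invoke the Skill by name or trigger it with /afrexai-agent-observability
Provide the inputs described in the Skill's parameter notes to get structured output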

Version history

v1.1.0

No functional or documentation changes were detected in this version.
- Version bump to 1.1.0 with no file changes.
- All features and documentation remain unchanged.

v1.0.0

Initial release: score, monitor, and troubleshoot AI agent fleets across 6 critical operational dimensions.
- Introduces a 0-100 scoring system for agent deployment health, with actionable recommendations.
- Provides a 6-dimension assessment: Execution Visibility, Cost Attribution, Output Quality, Failure Recovery, Security & Boundaries, Fleet Coordination.
- Offers a cost framework, 90-day monitoring roadmap, and lists common monitoring mistakes.
- Includes industry-specific best practices and benchmarks.
- Supplies quick assessment prompt and links to further resources.

Metadata

Slug: afrexai-agent-observability

Version: 1.1.0

License: none listed

Total installs: 0

Current installs: 0

Versions: 2

FAQ