
AI Agent Observability

by 1kalin · GitHub ↗ · v1.1.0
cross-platform ⚠ suspicious
589 Downloads · 0 Stars · 0 Active Installs · 2 Versions
Install in OpenClaw
/install afrexai-agent-observability
Description
Evaluate and monitor AI agent fleets across six key dimensions to score health, identify issues, and optimize performance for ops teams managing 1-100+ agents.
README (SKILL.md)

Agent Observability & Monitoring

Score, monitor, and troubleshoot AI agent fleets in production. Built for ops teams running 1-100+ agents.

What This Does

Evaluates your agent deployment across 6 dimensions and returns a 0-100 health score with specific fixes.

6-Dimension Assessment

1. Execution Visibility (0-20 pts)

  • Can you see what every agent is doing right now?
  • Task queue depth, active/idle ratio, error rates
  • Benchmark: Top quartile tracks 95%+ of agent actions in real-time

2. Cost Attribution (0-20 pts)

  • Do you know exactly what each agent costs per task?
  • Token spend, API calls, compute time, tool invocations
  • Benchmark: Unmonitored agents waste 30-55% on retries and hallucination loops
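
A minimal sketch of per-task cost attribution, assuming every model call is logged as a record carrying the agent name, a task ID, and token counts. The `CallRecord` fields and the per-token prices are hypothetical placeholders, not values taken from this skill or from any provider.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical prices in USD per 1,000 tokens; substitute your provider's rates.
PRICE_PER_1K_INPUT = 0.003
PRICE_PER_1K_OUTPUT = 0.015


@dataclass
class CallRecord:
    agent: str
    task_id: str
    input_tokens: int
    output_tokens: int


def call_cost(rec: CallRecord) -> float:
    """Cost of a single model call in USD."""
    input_cost = (rec.input_tokens / 1000) * PRICE_PER_1K_INPUT
    output_cost = (rec.output_tokens / 1000) * PRICE_PER_1K_OUTPUT
    return input_cost + output_cost


def cost_per_task(records: list[CallRecord]) -> dict[tuple[str, str], float]:
    """Sum call costs by (agent, task_id), so retries count against their task."""
    totals: dict[tuple[str, str], float] = defaultdict(float)
    for rec in records:
        totals[(rec.agent, rec.task_id)] += call_cost(rec)
    return dict(totals)
```

Grouping by task rather than by call is what makes retry loops visible: a task that needed five attempts shows up as one expensive task instead of five ordinary calls.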

3. Output Quality (0-15 pts)

  • Are agent outputs validated before reaching users or systems?
  • Accuracy sampling, hallucination detection, regression tracking
  • Benchmark: 1 in 12 agent outputs contains a material error without monitoring

4. Failure Recovery (0-15 pts)

  • What happens when an agent fails mid-task?
  • Retry logic, graceful degradation, human escalation paths
  • Benchmark: Mean time to detect agent failure without monitoring: 4.2 hours
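
The retry and escalation path can be sketched as follows. This is an illustration under stated assumptions, not something the skill ships: `run_with_retry` and `EscalationRequired` are hypothetical names, and the backoff schedule is an arbitrary choice.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class EscalationRequired(Exception):
    """Raised when retries are exhausted and a human should take over."""


def run_with_retry(
    task: Callable[[], T],
    max_attempts: int = 3,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `task`, retrying with exponential backoff; escalate after the last failure."""
    last_error = None
    for attempt in range(max_attempts):
        try:
            return task()
        except Exception as exc:  # in practice, catch only retryable error types
            last_error = exc
            if attempt < max_attempts - 1:
                # Delays grow as base, 2*base, 4*base, ...
                sleep(base_delay_s * 2**attempt)
    raise EscalationRequired(f"gave up after {max_attempts} attempts") from last_error
```

Capping attempts and raising a distinct exception gives the caller one place to page a human, instead of letting an agent retry indefinitely.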

5. Security & Boundaries (0-15 pts)

  • Are agents staying within authorized scope?
  • Tool access auditing, data exfiltration checks, permission drift
  • Benchmark: 23% of production agents access tools outside their intended scope
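
One way to audit tool access is to compare logged tool calls against a per-agent allowlist. The sketch below assumes tool calls are available as `(agent, tool)` pairs; the agent names, tool names, and `ALLOWED_TOOLS` mapping are hypothetical.

```python
# Hypothetical allowlist: agent name -> tools it is authorized to call.
ALLOWED_TOOLS = {
    "billing-agent": {"read_invoice", "send_email"},
    "support-agent": {"search_kb", "send_email"},
}


def find_scope_violations(tool_calls: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return (agent, tool) pairs where the tool is not on the agent's allowlist.

    Agents missing from the allowlist are treated as having no authorized tools.
    """
    return [
        (agent, tool)
        for agent, tool in tool_calls
        if tool not in ALLOWED_TOOLS.get(agent, set())
    ]
```

Treating unknown agents as unauthorized is a deliberate default-deny choice: an agent that was never inventoried should surface as a violation rather than pass silently.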

6. Fleet Coordination (0-15 pts)

  • Do multi-agent workflows hand off cleanly?
  • Message passing reliability, deadlock detection, duplicate work
  • Benchmark: Uncoordinated fleets duplicate 18-25% of work
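
Duplicate work can be detected by fingerprinting each task and checking which fingerprints were picked up more than once. This is a rough sketch that assumes tasks are JSON-serializable and that identical type and payload mean identical work; the function names and task shape are hypothetical.

```python
import hashlib
import json
from collections import defaultdict


def task_fingerprint(task_type: str, payload: dict) -> str:
    """Stable hash of a task; key order in the payload does not matter."""
    canonical = json.dumps({"type": task_type, "payload": payload}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_duplicates(tasks: list[tuple[str, str, dict]]) -> dict[str, list[str]]:
    """Map fingerprint -> agents, keeping only tasks picked up more than once.

    Each task is (agent, task_type, payload).
    """
    agents_by_fp: dict[str, list[str]] = defaultdict(list)
    for agent, task_type, payload in tasks:
        agents_by_fp[task_fingerprint(task_type, payload)].append(agent)
    return {fp: agents for fp, agents in agents_by_fp.items() if len(agents) > 1}
```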

Scoring

| Score | Rating | Action |
| --- | --- | --- |
| 80-100 | Production-grade | Optimize and scale |
| 60-79 | Operational | Fix gaps before scaling |
| 40-59 | Risky | Immediate remediation needed |
| 0-39 | Blind | Stop scaling, instrument first |
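
The scoring can be sketched as a sum of per-dimension points mapped to a rating band. The maximum points and bands come from the assessment above; the dictionary keys and the `health_score` function are hypothetical, and how points are awarded within a dimension is left to the assessor because the skill does not specify it.

```python
# Maximum points per dimension, as listed in the assessment (sums to 100).
MAX_POINTS = {
    "execution_visibility": 20,
    "cost_attribution": 20,
    "output_quality": 15,
    "failure_recovery": 15,
    "security_boundaries": 15,
    "fleet_coordination": 15,
}

# (minimum score, rating, action), checked from the highest band to the lowest.
BANDS = [
    (80, "Production-grade", "Optimize and scale"),
    (60, "Operational", "Fix gaps before scaling"),
    (40, "Risky", "Immediate remediation needed"),
    (0, "Blind", "Stop scaling, instrument first"),
]


def health_score(points: dict[str, int]) -> tuple[int, str, str]:
    """Sum per-dimension points and map the total to a rating band."""
    for name, value in points.items():
        if name not in MAX_POINTS:
            raise ValueError(f"unknown dimension: {name}")
        if not 0 <= value <= MAX_POINTS[name]:
            raise ValueError(f"{name} must be between 0 and {MAX_POINTS[name]}")
    # Dimensions that were not assessed count as 0.
    total = sum(points.get(name, 0) for name in MAX_POINTS)
    for minimum, rating, action in BANDS:
        if total >= minimum:
            return total, rating, action
    raise AssertionError("unreachable: the lowest band starts at 0")
```

Counting unassessed dimensions as zero matches the "instrument first" stance: what you cannot measure should pull the score down, not be excluded from it.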

Quick Assessment Prompt

Ask the agent to evaluate your setup:

Run the agent observability assessment against our current deployment:
- How many agents are running?
- What monitoring exists today?
- What broke in the last 30 days?
- What's our monthly agent spend?
- Who gets alerted when an agent fails?

Cost Framework

| Fleet Size | Unmonitored Waste | Monitoring Investment | Net Savings |
| --- | --- | --- | --- |
| 1-5 agents | $2K-$8K/mo | $500-$1K/mo | $1.5K-$7K/mo |
| 5-20 agents | $8K-$45K/mo | $2K-$5K/mo | $6K-$40K/mo |
| 20-100 agents | $45K-$200K/mo | $8K-$20K/mo | $37K-$180K/mo |
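
The Net Savings column appears to be the waste range minus the investment range, taken end by end (low minus low, high minus high). A short check of that arithmetic, using the table's figures; the `net_savings` helper and `TIERS` mapping are illustrative names:

```python
def net_savings(waste: tuple[int, int], investment: tuple[int, int]) -> tuple[int, int]:
    """Subtract the monitoring investment range from the waste range, end by end.

    Pairs low with low and high with high, which reproduces the table's figures.
    """
    return waste[0] - investment[0], waste[1] - investment[1]


# Monthly USD ranges as (low, high): (unmonitored waste, monitoring investment).
TIERS = {
    "1-5 agents": ((2_000, 8_000), (500, 1_000)),
    "5-20 agents": ((8_000, 45_000), (2_000, 5_000)),
    "20-100 agents": ((45_000, 200_000), (8_000, 20_000)),
}

for tier, (waste, investment) in TIERS.items():
    low, high = net_savings(waste, investment)
    print(f"{tier}: ${low:,}-${high:,}/mo")
```

Note that this pairing is optimistic at the low end: a fleet with low waste and a high monitoring bill would save less than the table's lower bound.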

90-Day Monitoring Roadmap

  • Week 1-2: Inventory all agents, document intended scope, tag cost centers
  • Week 3-4: Deploy execution logging (every tool call, every output)
  • Month 2: Build dashboards (cost per task, error rate, P95 latency)
  • Month 3: Automate alerting (failure detection in <5 min, cost anomaly flags, scope violations)
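
The three Month 2 dashboard metrics can be computed from logged task runs roughly as follows. This sketch assumes each run records latency, cost, and a failure flag; the `TaskRun` fields are hypothetical, and P95 uses the nearest-rank method, which is one of several common percentile definitions.

```python
import math
from dataclasses import dataclass


@dataclass
class TaskRun:
    latency_s: float
    cost_usd: float
    failed: bool


def p95(values: list[float]) -> float:
    """95th percentile using the nearest-rank method."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    rank = math.ceil(95 * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def dashboard_metrics(runs: list[TaskRun]) -> dict[str, float]:
    """Compute cost per task, error rate, and P95 latency for a batch of runs."""
    if not runs:
        raise ValueError("need at least one run")
    return {
        "cost_per_task_usd": sum(r.cost_usd for r in runs) / len(runs),
        "error_rate": sum(r.failed for r in runs) / len(runs),
        "latency_p95_s": p95([r.latency_s for r in runs]),
    }
```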

7 Monitoring Mistakes

  1. Logging only errors (miss the slow degradation)
  2. No cost attribution (agents burn budget invisibly)
  3. Monitoring agents like servers (they need task-level observability)
  4. Manual review of agent outputs (doesn't scale past 3 agents)
  5. No baseline metrics (can't detect regression without a baseline)
  6. Alerting on everything (alert fatigue kills response time)
  7. Skipping agent-to-agent handoff monitoring (where most fleet failures happen)

Industry Adjustments

| Industry | Critical Dimension | Why |
| --- | --- | --- |
| Financial Services | Security & Boundaries | Regulatory audit trails mandatory |
| Healthcare | Output Quality | Clinical accuracy non-negotiable |
| Legal | Execution Visibility | Billing requires task-level tracking |
| Ecommerce | Cost Attribution | Margin-sensitive, waste kills profit |
| SaaS | Fleet Coordination | Multi-tenant agent isolation |
| Manufacturing | Failure Recovery | Downtime = production line stops |
| Construction | Security & Boundaries | Safety-critical document handling |
| Real Estate | Output Quality | Valuation errors = liability |
| Recruitment | Fleet Coordination | Candidate pipeline handoffs |
| Professional Services | Cost Attribution | Client billing accuracy |

Go Deeper

Built by AfrexAI — we help businesses run AI agents that actually make money.

Usage Guidance
This skill is essentially a checklist and a prompt template rather than a connector or monitoring tool. Before installing or invoking it, consider:
  1. It will not automatically collect metrics. If you ask an autonomous agent to 'run the assessment', it may try to gather data from your environment; restrict that agent's permissions (least privilege) and network access.
  2. Prefer using it as a human-facing guide or with a sandboxed agent that only has read-only, narrowly scoped access to specific monitoring/billing APIs you explicitly provision.
  3. If you intend automated collection, define exactly which APIs/endpoints the agent may call and supply dedicated read-only credentials; update the SKILL.md to list required env vars and safe query patterns.
  4. If you have sensitive billing or production telemetry, require human approval for any actions that access those systems.
These steps will reduce the risk that an ambiguous prompt leads to unintended access or data exposure.
Capability Analysis
Type: OpenClaw Skill
Name: afrexai-agent-observability
Version: 1.1.0
The skill bundle describes a legitimate purpose: 'Agent Observability & Monitoring'. The `SKILL.md` and `README.md` files provide documentation and instructions for the user to interact with the agent, all aligned with the stated purpose. There are no instructions for the AI agent to perform malicious actions, exfiltrate data, establish persistence, execute arbitrary code, or engage in prompt injection against itself. External links in `SKILL.md` and `README.md` point to informational GitHub Pages resources, but the skill does not instruct the agent to fetch or execute content from these links.
Capability Assessment
Purpose & Capability
The name and description claim fleet observability, and the content is a plausible checklist and prompt for that purpose. However, the skill is purely instruction-only (no code, no declared integrations, no env vars), so it cannot actually perform automated monitoring by itself — it can only guide an agent or human to perform the assessment. The claim to 'Run the agent observability assessment against our current deployment' is disproportionate to what's provided (no connectors, no credentials, no CLI/API guidance).
Instruction Scope
SKILL.md contains open-ended runtime prompts that ask the agent to evaluate the 'current deployment' and to gather counts, spend, alerts, and recent failures. The instructions do not confine how the agent should obtain that information (no explicit APIs/paths to query, no declared env vars). That vagueness grants broad discretion — an autonomous agent could attempt to read environment variables, query cloud APIs, inspect files, or contact external endpoints to satisfy the prompt, which is scope creep relative to the simple documentation/assessment intent.
Install Mechanism
No install spec and no code files — lowest-risk install footprint. Nothing will be written to disk by the skill itself because there is no install step.
Credentials
The skill declares no required environment variables, credentials, or config paths, which is consistent with an instruction-only checklist. However, the assessment questions (monthly spend, monitoring existence, alerting targets) typically require privileged access to billing, monitoring, or deployment APIs; those are not declared, creating an implicit gap. If an agent pursues answers programmatically, it may request credentials later — the skill does not justify or constrain that need.
Persistence & Privilege
The `always` setting is false, and there are no persistence or system-modifying instructions. Autonomous invocation is allowed (platform default), but the skill does not request elevated or persistent privileges itself.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install afrexai-agent-observability
  3. After installation, invoke the skill by name or use /afrexai-agent-observability
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.1.0
No functional or documentation changes were detected in this version.
  • Version bump to 1.1.0 with no file changes.
  • All features and documentation remain unchanged.
v1.0.0
Initial release: score, monitor, and troubleshoot AI agent fleets across 6 critical operational dimensions.
  • Introduces a 0-100 scoring system for agent deployment health, with actionable recommendations.
  • Provides a 6-dimension assessment: Execution Visibility, Cost Attribution, Output Quality, Failure Recovery, Security & Boundaries, Fleet Coordination.
  • Offers a cost framework, 90-day monitoring roadmap, and lists common monitoring mistakes.
  • Includes industry-specific best practices and benchmarks.
  • Supplies quick assessment prompt and links to further resources.
Metadata
Slug afrexai-agent-observability
Version 1.1.0
License
All-time Installs 0
Active Installs 0
Total Versions 2
Frequently Asked Questions

What is AI Agent Observability?

Evaluate and monitor AI agent fleets across six key dimensions to score health, identify issues, and optimize performance for ops teams managing 1-100+ agents. It is an AI Agent Skill for Claude Code / OpenClaw, with 589 downloads so far.

How do I install AI Agent Observability?

Run "/install afrexai-agent-observability" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is AI Agent Observability free?

Yes, AI Agent Observability is completely free (open-source). You can download, install and use it at no cost.

Which platforms does AI Agent Observability support?

AI Agent Observability is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created AI Agent Observability?

It is built and maintained by 1kalin (@1kalin); the current version is v1.1.0.
