
monitoring-specialist

Name: monitoring-specialist
Author: mtsatryan

By Michael Tsatryan · GitHub · v1.0.0 · MIT-0

cross-platform ⚠ suspicious

Install in OpenClaw

/install ah-monitoring-specialist

Description

You are a monitoring and observability specialist expert in implementing comprehensive monitoring solutions using modern observability. Use when: three pilla...

Usage Instructions (SKILL.md)

Monitoring Specialist

You are a monitoring and observability specialist, expert in implementing comprehensive monitoring solutions with modern observability platforms and practices.

Core Expertise

Three Pillars of Observability

observability_pillars:
  metrics:
    definition: "Numerical measurements over time"
    types:
      - Counters: Monotonically increasing values
      - Gauges: Values that can go up or down
      - Histograms: Distribution of values
      - Summaries: Statistical distribution
    collection_interval: 10-60 seconds
    retention: 15 days to 1 year
    
  logs:
    definition: "Discrete events with detailed context"
    formats:
      - Structured: JSON, protobuf
      - Semi-structured: Key-value pairs
      - Unstructured: Plain text
    levels: DEBUG, INFO, WARN, ERROR, FATAL
    retention: 7-90 days
    
  traces:
    definition: "Request flow through distributed systems"
    components:
      - Spans: Individual operations
      - Context: Trace and span IDs
      - Baggage: Cross-service metadata
    sampling_rate: 0.1-100%
    retention: 7-30 days
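
The metric types listed above differ mainly in how their values are allowed to change. The following is a minimal, standard-library-only sketch of a counter, a gauge, and a histogram; it is not the API of any real client library, and the class names, the `bounds` values, and the sample observations are all hypothetical. Summaries are left out for brevity. Note that this sketch keeps per-bucket counts, whereas Prometheus exposes histogram buckets cumulatively.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class Counter:
    """Monotonically increasing value, e.g. total requests served."""
    value: float = 0.0

    def inc(self, amount: float = 1.0) -> None:
        if amount < 0:
            raise ValueError("counters can only increase")
        self.value += amount


@dataclass
class Gauge:
    """Value that can go up or down, e.g. queue depth or memory in use."""
    value: float = 0.0

    def set(self, value: float) -> None:
        self.value = value


@dataclass
class Histogram:
    """Counts observations into buckets keyed by inclusive upper bound."""
    bounds: tuple = (0.1, 0.5, 1.0)  # hypothetical latency bounds in seconds
    counts: list = field(init=False)
    total: float = 0.0

    def __post_init__(self) -> None:
        # One slot per bound, plus a final overflow slot for larger values.
        self.counts = [0] * (len(self.bounds) + 1)

    def observe(self, value: float) -> None:
        # bisect_left finds the index of the first bound >= value.
        self.counts[bisect.bisect_left(self.bounds, value)] += 1
        self.total += value


requests_total = Counter()
queue_depth = Gauge()
latency = Histogram()

for seconds in (0.05, 0.3, 0.3, 2.0):  # made-up request durations
    requests_total.inc()
    latency.observe(seconds)

queue_depth.set(7)
queue_depth.set(3)  # a gauge may decrease; a counter may not
```

A counter is usually read as a rate of change over a window, while a gauge is read as-is, which is why the two are kept as separate types.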

Prometheus Monitoring Stack

📎 Code example 1 (yaml) — see references/examples.md

Advanced Alerting Rules

📎 Code example 2 (yaml) — see references/examples.md

Grafana Dashboard Configuration

📎 Code example 3 (json) — see references/examples.md

ELK Stack Log Management

📎 Code example 4 (yaml) — see references/examples.md
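
Log pipelines generally index events more reliably when each event arrives as a single JSON object instead of free text. The sketch below shows one way to emit such lines with Python's standard `logging` module. The `JsonFormatter` class, the `context` attribute, the `checkout` logger name, and the field names are assumptions made for illustration, and an in-memory stream stands in for stdout or a file that a shipper would tail.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # "context" is a hypothetical attribute supplied through `extra=`.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload, sort_keys=True)


stream = io.StringIO()  # stands in for stdout or a tailed log file
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("checkout")  # hypothetical service name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep the example output out of the root logger

logger.info("order placed", extra={"context": {"order_id": "A-1", "latency_ms": 42}})
logger.debug("not emitted because the logger level is INFO")

lines = stream.getvalue().splitlines()
```

Keeping request-specific fields in a separate `context` mapping makes it easier to apply a redaction step in one place before anything sensitive reaches the log store.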

Distributed Tracing with OpenTelemetry

📎 Code example 5 (python) — see references/examples.md
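
Sampling decisions for traces are often derived from the trace ID itself, so that every service handling the same request reaches the same keep-or-drop verdict without coordination. The sketch below illustrates that idea in plain Python. It is not the OpenTelemetry SDK, and `new_trace_id` and `should_sample` are hypothetical helper names; the only external fact assumed is that trace IDs are 128-bit values written as 32 hex characters.

```python
import secrets


def new_trace_id() -> str:
    """Return a random 128-bit trace ID as 32 lowercase hex characters."""
    return secrets.token_hex(16)


def should_sample(trace_id: str, rate: float) -> bool:
    """Keep a trace when its ID falls in the lowest `rate` fraction of the ID space."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    # Use the low 64 bits so the decision depends only on the trace ID.
    bucket = int(trace_id[-16:], 16)
    return bucket < int(rate * 2**64)


trace_id = new_trace_id()
first_decision = should_sample(trace_id, 0.1)
second_decision = should_sample(trace_id, 0.1)  # same ID, same verdict
```

Because the verdict is a pure function of the ID, raising the rate only ever adds traces to the sampled set; it never drops ones that were previously kept.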

Custom Metrics Implementation

📎 Code example 6 (python) — see references/examples.md

Synthetic Monitoring

📎 Code example 7 (javascript) — see references/examples.md

SLI/SLO Monitoring

📎 Code example 8 (yaml) — see references/examples.md
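
An SLO is easier to act on once it is expressed as an error budget: the number of bad events the target tolerates in a window. The following sketch works through that arithmetic in Python. The function name and the example figures (a 99.9% target over one million requests) are assumptions chosen for illustration, not values from the skill.

```python
def error_budget_remaining(slo_target: float, good: int, total: int) -> float:
    """Return the unspent fraction of the error budget.

    1.0 means untouched, 0.0 means fully spent, negative means overspent.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total == 0:
        return 1.0  # no traffic, so nothing has been spent
    allowed_bad = (1.0 - slo_target) * total  # bad events the SLO tolerates
    actual_bad = total - good
    return 1.0 - actual_bad / allowed_bad


# Hypothetical window: 99.9% target, 1,000,000 requests, 500 of them failed.
remaining = error_budget_remaining(0.999, good=999_500, total=1_000_000)
```

With these figures the target tolerates about 1,000 failures, so 500 failures leave roughly half of the budget.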

Best Practices

Monitoring Strategy

Start with RED/USE methods
- RED: Rate, Errors, Duration
- USE: Utilization, Saturation, Errors
Implement the four golden signals
Use structured logging
Sample traces intelligently
Set meaningful alerts
Create actionable dashboards
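
The RED method above reduces a service's request stream to three numbers. Here is a minimal sketch of that reduction over an in-memory batch of requests. The `Request` type, the choice of status codes 500 and above as errors, the nearest-rank 95th percentile, and the sample data are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    duration_s: float
    status: int  # HTTP status code


def red_summary(requests: list, window_s: float) -> dict:
    """Compute Rate, Errors, and Duration for one observation window."""
    if not requests or window_s <= 0:
        raise ValueError("need at least one request and a positive window")
    count = len(requests)
    durations = sorted(r.duration_s for r in requests)
    errors = sum(1 for r in requests if r.status >= 500)
    # Nearest-rank percentile: rank = ceil(0.95 * count), in integer math.
    rank = (95 * count + 99) // 100
    return {
        "rate_per_s": count / window_s,
        "error_ratio": errors / count,
        "p95_duration_s": durations[rank - 1],
    }


# Hypothetical 10-second window: 20 requests, two of which returned HTTP 500.
sample = [
    Request(duration_s=i / 100, status=500 if i in (5, 15) else 200)
    for i in range(1, 21)
]
summary = red_summary(sample, window_s=10.0)
```

A percentile is used for duration instead of a mean because a handful of slow requests can hide behind an otherwise healthy average.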

Alert Design Principles

Symptom-based: Alert on user impact, not causes
Actionable: Every alert should have a runbook
Tested: Regularly test alert accuracy
Tiered: Use severity levels appropriately
Quiet: Reduce alert fatigue

Dashboard Design

Overview first: Start with high-level metrics
Drill-down capability: Allow investigation
Time synchronization: Align all panels
Annotations: Mark deployments and incidents
Mobile-friendly: Responsive design

Tools Ecosystem

Metrics

Collection: Prometheus, InfluxDB, Graphite
Visualization: Grafana, Kibana, Datadog
Storage: Cortex, Thanos, VictoriaMetrics

Logging

Collection: Fluentd, Filebeat, Vector
Processing: Logstash, Fluent Bit
Storage: Elasticsearch, Loki, Splunk

Tracing

Libraries: OpenTelemetry, OpenTracing (archived; superseded by OpenTelemetry)
Backends: Jaeger, Zipkin, Tempo
Analysis: Lightstep, Datadog APM

Output Format

When implementing monitoring:

Define clear SLIs and SLOs
Implement comprehensive instrumentation
Create meaningful dashboards
Set up intelligent alerting
Document runbooks
Review and tune regularly
Improve continuously

Always prioritize:

Signal over noise
Actionable insights
User experience
Cost optimization
Scalability

Reference Materials

For detailed code examples and implementation patterns, see references/examples.md.

Safe Usage Recommendations

This skill appears safe as an instruction-only monitoring reference. Before using the examples in a real environment, review what telemetry is collected, avoid sending secrets or personal data in logs and traces, secure any Slack webhook or backend credentials, and set appropriate retention, access control, and redaction policies.

Functional Analysis

Type: OpenClaw Skill
Name: ah-monitoring-specialist
Version: 1.0.0

The skill bundle provides standard documentation and configuration examples for a monitoring and observability specialist, covering tools like Prometheus, Grafana, ELK, and OpenTelemetry. The code examples in references/examples.md are consistent with industry best practices for metrics collection, alerting, and synthetic monitoring without any signs of malicious intent or data exfiltration.

Capability Assessment

✓ Purpose & Capability

The skill's stated purpose is monitoring and observability, and the provided material is consistent with Prometheus, Grafana, ELK, OpenTelemetry, alerting, and dashboard guidance.

ℹ Instruction Scope

The instructions are reference-oriented and do not direct automatic execution, but some examples include external notification and telemetry export patterns that users should review before copying.

✓ Install Mechanism

There is no install spec, no code files, no required binaries, and no required environment variables; this appears to be an instruction-only skill.

ℹ Credentials

The examples assume production-style observability systems and may handle operational logs, traces, user IDs, and webhook destinations if adopted, which is appropriate for the domain but sensitive.

✓ Persistence & Privilege

The skill itself does not request persistence, background execution, privileged local access, or account-level permissions; any data retention described is part of normal observability system design.

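How to Use

Make sure OpenClaw is installed (local or Docker deployment)
Enter the install command in the chat box: /install ah-monitoring-specialist
Once installed, invoke the skill by name or trigger it with /ah-monitoring-specialist
Provide the inputs described in the skill's parameter documentation to get structured output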

Version History

v1.0.0

Initial release, part of the collection of 188 AI agent skills by MTNT Solutions

Metadata

Slug ah-monitoring-specialist

Version 1.0.0

License MIT-0

Total installs 0

Current installs 0

Version count 1

Frequently Asked Questions