samber

Golang Observability

Author: Samuel Berthe · v1.1.3 · MIT-0
cross-platform · ✓ Security check passed
Total downloads: 178 · Favorites: 0 · Current installs: 0 · Versions: 3
Install in OpenClaw
/install golang-observability
Description
Golang everyday observability — the always-on signals in production. Covers structured logging with slog, Prometheus metrics, OpenTelemetry distributed traci...
Usage instructions (SKILL.md)

Persona: You are a Go observability engineer. You treat every unobserved production system as a liability — instrument proactively, correlate signals to diagnose, and never consider a feature done until it is observable.

Modes:

  • Coding / instrumentation (default): add observability to new or existing code (declare metrics, add spans, set up structured logging, wire pprof toggles). Follow the sequential instrumentation guide.
  • Review mode: review a PR's instrumentation changes. Check that new code exports the expected signals (metrics declared, spans opened and closed, structured log fields consistent). Sequential.
  • Audit mode: audit existing observability coverage across a codebase. Launch up to 5 parallel sub-agents, one per signal (metrics, logging, tracing, profiling, RUM), to check coverage simultaneously.

Community default. A company skill that explicitly supersedes samber/cc-skills-golang@golang-observability skill takes precedence.

Go Observability Best Practices

Observability is the ability to understand a system's internal state from its external outputs. In Go services, this means five complementary signals: logs, metrics, traces, profiles, and RUM. Each answers different questions, and together they give you full visibility into both system behavior and user experience.

When using observability libraries (Prometheus client, OpenTelemetry SDK, vendor integrations), refer to the library's official documentation and code examples for current API signatures.

Best Practices Summary

  1. Use structured logging with log/slog — production services MUST emit structured logs (JSON), not freeform strings
  2. Choose the right log level — Debug for development, Info for normal operations, Warn for degraded states, Error for failures requiring attention
  3. Log with context — use slog.InfoContext(ctx, ...) to correlate logs with traces
  4. Prefer Histogram over Summary for latency metrics — Histograms support server-side aggregation and percentile queries. Every HTTP endpoint MUST have latency and error rate metrics.
  5. Keep label cardinality low in Prometheus — NEVER use unbounded values (user IDs, full URLs) as label values
  6. Track percentiles (P50, P90, P99, P99.9) using Histograms + histogram_quantile() in PromQL
  7. Set up OpenTelemetry tracing on new projects — configure the TracerProvider early, then add spans everywhere
  8. Add spans to every meaningful operation — service methods, DB queries, external API calls, message queue operations
  9. Propagate context everywhere — context is the vehicle that carries trace_id, span_id, and deadlines across service boundaries
  10. Enable profiling via environment variables — toggle pprof and continuous profiling on/off without redeploying
  11. Correlate signals — inject trace_id into logs, use exemplars to link metrics to traces
  12. A feature is not done until it is observable — declare metrics, add proper logging, create spans
  13. Use awesome-prometheus-alerts as a starting point for infrastructure and dependency alerting — browse by technology, copy rules, customize thresholds
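Practices 1 to 3 above can be sketched with the standard library alone. The following is a minimal illustration, not the skill's reference setup: the helper names `newJSONLogger` and `captureRecords`, the message text, and the `order_id` value are all assumptions made for the example.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"io"
	"log/slog"
	"os"
)

// newJSONLogger returns a logger that writes one JSON object per record to w.
// Records below minLevel are dropped by the handler.
func newJSONLogger(w io.Writer, minLevel slog.Level) *slog.Logger {
	return slog.New(slog.NewJSONHandler(w, &slog.HandlerOptions{Level: minLevel}))
}

// captureRecords logs one Debug and one Info record into a buffer and returns
// the decoded JSON records. The messages and the order_id value are
// hypothetical sample data.
func captureRecords(debug bool) ([]map[string]any, error) {
	level := slog.LevelInfo
	if debug {
		level = slog.LevelDebug
	}
	var buf bytes.Buffer
	logger := newJSONLogger(&buf, level)

	ctx := context.Background()
	logger.DebugContext(ctx, "cache lookup", "order_id", "ord_42")
	logger.InfoContext(ctx, "order created", "order_id", "ord_42")

	// Decode the stream of JSON objects, one per log record.
	var records []map[string]any
	dec := json.NewDecoder(&buf)
	for {
		var rec map[string]any
		err := dec.Decode(&rec)
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err
		}
		records = append(records, rec)
	}
	return records, nil
}

func main() {
	// Install the JSON logger as the process-wide default.
	slog.SetDefault(newJSONLogger(os.Stdout, slog.LevelInfo))
	slog.InfoContext(context.Background(), "service started", "port", 8080)

	records, err := captureRecords(false)
	if err != nil {
		slog.Error("decoding captured logs", "error", err)
		os.Exit(1)
	}
	slog.Info("captured records", "count", len(records))
}
```

Passing the minimum level through `slog.HandlerOptions` keeps Debug records out of production output while leaving the call sites unchanged.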

Cross-References

See samber/cc-skills-golang@golang-error-handling skill for the single handling rule. See samber/cc-skills-golang@golang-troubleshooting skill for using observability signals to diagnose production issues. See samber/cc-skills-golang@golang-security skill for protecting pprof endpoints and avoiding PII in logs. See samber/cc-skills-golang@golang-context skill for propagating trace context across service boundaries. See samber/cc-skills@promql-cli skill for querying and exploring PromQL expressions against Prometheus from the CLI.

The Five Signals

Signal Question it answers Tool When to use
Logs What happened? log/slog Discrete events, errors, audit trails
Metrics How much / how fast? Prometheus client Aggregated measurements, alerting, SLOs
Traces Where did time go? OpenTelemetry Request flow across services, latency breakdown
Profiles Why is it slow / using memory? pprof, Pyroscope CPU hotspots, memory leaks, lock contention
RUM How do users experience it? PostHog, Segment Product analytics, funnels, session replay

Detailed Guides

Each signal has a dedicated guide with full code examples, configuration patterns, and cost analysis:

  • Structured Logging — Why structured logging matters for log aggregation at scale. Covers log/slog setup, log levels (Debug/Info/Warn/Error) and when to use each, request correlation with trace IDs, context propagation with slog.InfoContext, request-scoped attributes, the slog ecosystem (handlers, formatters, middleware), and migration strategies from zap/logrus/zerolog.

  • Metrics Collection: Prometheus client setup and the four metric types (Counter for rate-of-change, Gauge for snapshots, Histogram for latency aggregation, Summary for client-side quantiles). Deep dive: why Histograms beat Summaries (server-side aggregation, supports histogram_quantile PromQL), naming conventions, the PromQL-as-comments convention (write queries above metric declarations for discoverability), production-grade PromQL examples, multi-window SLO burn rate alerting, and the high-cardinality label problem (why unbounded values like user IDs destroy performance).

  • Distributed Tracing — When and how to use OpenTelemetry SDK to trace request flows across services. Covers spans (creating, attributes, status recording), otelhttp middleware for HTTP instrumentation, error recording with span.RecordError(), trace sampling (why you can't collect everything at scale), propagating trace context across service boundaries, and cost optimization.

  • Profiling — On-demand profiling with pprof (CPU, heap, goroutine, mutex, block profiles) — how to enable it in production, secure it with auth, and toggle via environment variables without redeploying. Continuous profiling with Pyroscope for always-on performance visibility. Cost implications of each profiling type and mitigation strategies.

  • Real User Monitoring — Understanding how users actually experience your service. Covers product analytics (event tracking, funnels), Customer Data Platform integration, and critical compliance: GDPR/CCPA consent checks, data subject rights (user deletion endpoints), and privacy checklist for tracking. Server-side event tracking (PostHog, Segment) and identity key best practices.

  • Alerting — Proactive problem detection. Covers the four golden signals (latency, traffic, errors, saturation), awesome-prometheus-alerts as a rule library with ~500 ready-to-use rules by technology, Go runtime alerts (goroutine leaks, GC pressure, OOM risk), severity levels, and common mistakes that break alerting (using irate instead of rate, missing for: duration to avoid flapping).

  • Grafana Dashboards — Prebuilt dashboards for Go runtime monitoring (heap allocation, GC pause frequency, goroutine count, CPU). Explains the standard dashboards to install, how to customize them for your service, and when each dashboard answers a different operational question.
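The Profiling guide above mentions toggling pprof through environment variables. A minimal sketch of that idea follows, assuming a `PROFILING_ENABLED` variable read once at startup and a debug listener on `localhost:6060`; the address and the helper names `newDebugMux` and `statusFor` are assumptions for illustration, and authentication is left out.

```go
package main

import (
	"log"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
	"os"
)

// newDebugMux builds a dedicated mux for debug endpoints. The pprof handlers
// are registered only when enabled is true, so a disabled service answers 404.
// Importing net/http/pprof also registers handlers on http.DefaultServeMux,
// which this sketch never serves.
func newDebugMux(enabled bool) *http.ServeMux {
	mux := http.NewServeMux()
	if enabled {
		mux.HandleFunc("/debug/pprof/", pprof.Index)
		mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
		mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
		mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
		mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
	}
	return mux
}

// statusFor reports the HTTP status the debug mux returns for path.
func statusFor(enabled bool, path string) int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, path, nil)
	newDebugMux(enabled).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	// The variable is read once at startup: flipping it takes a restart
	// with a different environment, not a rebuild.
	enabled := os.Getenv("PROFILING_ENABLED") == "true"
	log.Printf("pprof enabled=%v, index status=%d", enabled, statusFor(enabled, "/debug/pprof/"))
	if !enabled {
		return
	}
	// Assumption: bind to localhost only so the endpoint is not public.
	log.Fatal(http.ListenAndServe("localhost:6060", newDebugMux(true)))
}
```

Serving pprof from its own mux and listener keeps the debug endpoints off the public API port, which makes them easier to protect.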

Correlating Signals

Signals are most powerful when connected. A trace_id in your logs lets you jump from a log line to the full request trace. An exemplar on a metric links a latency spike to the exact trace that caused it.

Logs + Traces: otelslog bridge

import (
    "log/slog"

    "go.opentelemetry.io/contrib/bridges/otelslog"
)

// Create a handler that sends slog records through the OpenTelemetry log pipeline.
// It uses the globally configured LoggerProvider unless one is passed as an option.
handler := otelslog.NewHandler("my-service")
slog.SetDefault(slog.New(handler))

// Now every slog call with context includes trace correlation
slog.InfoContext(ctx, "order created", "order_id", orderID)
// The exported record carries the trace_id and span_id of the span in ctx,
// alongside msg="order created" and order_id
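To show what this correlation amounts to, here is a standard-library sketch of a handler that copies a trace ID from the context into each record. It is a simplified stand-in for the bridge, not its implementation: the context key, `withTraceID`, `traceHandler`, and `loggedTraceID` are hypothetical names, and a real service would read the ID from the active OpenTelemetry span.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"log/slog"
	"os"
)

// traceIDKey is a private context key used only by this sketch.
type traceIDKey struct{}

// withTraceID stores a trace ID in the context.
func withTraceID(ctx context.Context, id string) context.Context {
	return context.WithValue(ctx, traceIDKey{}, id)
}

// traceHandler wraps another slog.Handler and adds a trace_id attribute
// whenever the context carries one.
type traceHandler struct {
	inner slog.Handler
}

func (h traceHandler) Enabled(ctx context.Context, level slog.Level) bool {
	return h.inner.Enabled(ctx, level)
}

func (h traceHandler) Handle(ctx context.Context, r slog.Record) error {
	if id, ok := ctx.Value(traceIDKey{}).(string); ok && id != "" {
		r = r.Clone() // avoid mutating state shared with other copies
		r.AddAttrs(slog.String("trace_id", id))
	}
	return h.inner.Handle(ctx, r)
}

// WithAttrs and WithGroup re-wrap the derived handler so the trace_id
// injection survives logger.With(...) and logger.WithGroup(...).
func (h traceHandler) WithAttrs(attrs []slog.Attr) slog.Handler {
	return traceHandler{inner: h.inner.WithAttrs(attrs)}
}

func (h traceHandler) WithGroup(name string) slog.Handler {
	return traceHandler{inner: h.inner.WithGroup(name)}
}

// loggedTraceID logs one record with id in the context and returns the
// trace_id field found in the JSON output ("" when the field is absent).
func loggedTraceID(id string) string {
	var buf bytes.Buffer
	logger := slog.New(traceHandler{inner: slog.NewJSONHandler(&buf, nil)})
	ctx := withTraceID(context.Background(), id)
	logger.InfoContext(ctx, "order created", "order_id", "ord_42")

	var fields map[string]any
	if err := json.Unmarshal(buf.Bytes(), &fields); err != nil {
		return ""
	}
	got, _ := fields["trace_id"].(string)
	return got
}

func main() {
	slog.SetDefault(slog.New(traceHandler{inner: slog.NewJSONHandler(os.Stdout, nil)}))
	ctx := withTraceID(context.Background(), "abc123")
	slog.InfoContext(ctx, "order created", "order_id", "ord_42")
}
```

The injection only works for the context-aware calls such as `slog.InfoContext`; a plain `slog.Info` passes a background context and the record goes out without a trace_id.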

Metrics + Traces: Exemplars

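// When recording a histogram observation, attach the trace_id as an exemplar
// so you can jump from a P99 spike directly to the offending trace.
// histogram is a *prometheus.HistogramVec and duration is in seconds (float64);
// its observers also implement prometheus.ExemplarObserver.
histogram.WithLabelValues("POST", "/orders").(prometheus.ExemplarObserver).
    ObserveWithExemplar(duration, prometheus.Labels{"trace_id": traceID})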

Migrating Legacy Loggers

If the project currently uses zap, logrus, or zerolog, migrate to log/slog. It is the standard library logger since Go 1.21, has a stable API, and the ecosystem has consolidated around it. Continuing with third-party loggers means maintaining an extra dependency for no benefit.

Migration strategy:

  1. Add slog as the new logger with slog.SetDefault()
  2. Use bridge handlers during migration to route slog output through the existing logger: samber/slog-zap, samber/slog-logrus, samber/slog-zerolog
  3. Gradually replace all zap.L().Info(...) / logrus.Info(...) / log.Info().Msg(...) calls with slog.Info(...)
  4. Once fully migrated, remove the bridge handler and the old logger dependency

Definition of Done for Observability

A feature is not production-ready until it is observable. Before marking a feature as done, verify:

  • Metrics declared — counters for operations/errors, histograms for latencies, gauges for saturation. Each metric var has PromQL queries and alert rules as comments above its declaration.
  • Logging is proper — structured key-value pairs with slog, context variants used (slog.InfoContext), no PII in logs, errors MUST be either logged OR returned (NEVER both).
  • Spans created — every service method, DB query, and external API call has a span with relevant attributes, errors recorded with span.RecordError().
  • Dashboards and alerts exist — the PromQL from your metric comments is wired into Grafana dashboards and Prometheus alerting rules. Check awesome-prometheus-alerts for ready-to-use rules covering your infrastructure dependencies (databases, caches, brokers, proxies).
  • RUM events tracked — key business events tracked server-side (PostHog/Segment), identity key is user_id (not email), consent checked before tracking.
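The RUM item in the checklist above (server-side tracking, `user_id` as identity key, consent checked first) can be sketched as a small gate in front of the analytics client. Everything here is hypothetical: `Event`, `Sink`, `ConsentStore`, and `Tracker` are illustrative types, and `Sink` merely stands in for a PostHog or Segment client whose real API is not shown.

```go
package main

import "fmt"

// Event is a hypothetical server-side analytics event.
type Event struct {
	UserID string // stable internal identifier, never an email address
	Name   string
	Props  map[string]string
}

// Sink is a hypothetical stand-in for an analytics client.
type Sink interface {
	Send(Event) error
}

// ConsentStore reports whether a user has agreed to analytics tracking.
type ConsentStore interface {
	HasConsent(userID string) bool
}

// Tracker forwards events only for users who have given consent.
type Tracker struct {
	consent ConsentStore
	sink    Sink
}

// Track returns sent=false without error when the event is dropped because
// the user is anonymous or has not consented.
func (t Tracker) Track(e Event) (sent bool, err error) {
	if e.UserID == "" || !t.consent.HasConsent(e.UserID) {
		return false, nil
	}
	if err := t.sink.Send(e); err != nil {
		return false, fmt.Errorf("sending event %q: %w", e.Name, err)
	}
	return true, nil
}

// memorySink and mapConsent are in-memory implementations for the demo.
type memorySink struct{ events []Event }

func (s *memorySink) Send(e Event) error {
	s.events = append(s.events, e)
	return nil
}

type mapConsent map[string]bool

func (c mapConsent) HasConsent(userID string) bool { return c[userID] }

func main() {
	sink := &memorySink{}
	tracker := Tracker{consent: mapConsent{"user_1": true}, sink: sink}

	for _, id := range []string{"user_1", "user_2"} {
		sent, err := tracker.Track(Event{UserID: id, Name: "checkout_completed"})
		fmt.Printf("user=%s sent=%v err=%v\n", id, sent, err)
	}
	fmt.Println("events delivered:", len(sink.events))
}
```

Putting the consent check inside the tracker, rather than at each call site, means a forgotten check cannot leak an event.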

Common Mistakes

// ✗ Bad — log AND return (error gets logged multiple times up the chain)
if err != nil {
    slog.Error("query failed", "error", err)
    return fmt.Errorf("query: %w", err)
}

// ✓ Good — return with context, log once at the top level
if err != nil {
    return fmt.Errorf("querying users: %w", err)
}

// ✗ Bad: high-cardinality label (unbounded user IDs)
httpRequests.WithLabelValues(r.Method, r.URL.Path, userID).Inc()

// ✓ Good — bounded label values only
httpRequests.WithLabelValues(r.Method, routePattern).Inc()

// ✗ Bad: not passing context (breaks trace propagation)
result, err := db.Query("SELECT ...")

// ✓ Good — context flows through, trace continues
result, err := db.QueryContext(ctx, "SELECT ...")

// ✗ Bad: using Summary for latency (can't aggregate across instances)
prometheus.NewSummary(prometheus.SummaryOpts{
    Name:       "http_request_duration_seconds",
    Objectives: map[float64]float64{0.99: 0.001},
})

// ✓ Good — use Histogram (aggregatable, supports histogram_quantile)
prometheus.NewHistogram(prometheus.HistogramOpts{
    Name:    "http_request_duration_seconds",
    Buckets: prometheus.DefBuckets,
})
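To make the aggregation argument concrete, the sketch below sums cumulative buckets from two instances and then estimates quantiles by linear interpolation inside the matching bucket, which is the general approach `histogram_quantile()` takes for classic histograms. It is a simplified illustration rather than Prometheus's code: the `bucket` type, the function names, and the sample counts are assumptions.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// bucket mirrors one cumulative histogram bucket: count is the number of
// observations less than or equal to upperBound (the "le" label).
type bucket struct {
	upperBound float64
	count      float64
}

// mergeBuckets sums cumulative buckets from several instances. Precomputed
// Summary quantiles cannot be combined this way.
func mergeBuckets(instances ...[]bucket) []bucket {
	totals := map[float64]float64{}
	for _, inst := range instances {
		for _, b := range inst {
			totals[b.upperBound] += b.count
		}
	}
	merged := make([]bucket, 0, len(totals))
	for le, c := range totals {
		merged = append(merged, bucket{le, c})
	}
	sort.Slice(merged, func(i, j int) bool { return merged[i].upperBound < merged[j].upperBound })
	return merged
}

// quantile estimates the q-quantile from sorted cumulative buckets whose
// last entry is the +Inf bucket.
func quantile(q float64, buckets []bucket) float64 {
	if len(buckets) < 2 || q < 0 || q > 1 {
		return math.NaN()
	}
	total := buckets[len(buckets)-1].count
	if total == 0 {
		return math.NaN()
	}
	rank := q * total
	// First finite bucket whose cumulative count reaches the rank.
	i := sort.Search(len(buckets)-1, func(i int) bool { return buckets[i].count >= rank })
	if i == len(buckets)-1 {
		// The rank falls in the +Inf bucket: report the highest finite bound.
		return buckets[len(buckets)-2].upperBound
	}
	lower, below := 0.0, 0.0
	if i > 0 {
		lower, below = buckets[i-1].upperBound, buckets[i-1].count
	}
	upper := buckets[i].upperBound
	inBucket := buckets[i].count - below
	if inBucket <= 0 {
		return upper
	}
	return lower + (upper-lower)*(rank-below)/inBucket
}

// sampleInstances returns made-up bucket counts for two service instances.
func sampleInstances() ([]bucket, []bucket) {
	inf := math.Inf(1)
	a := []bucket{{0.1, 50}, {0.5, 90}, {1, 100}, {inf, 100}}
	b := []bucket{{0.1, 10}, {0.5, 50}, {1, 90}, {inf, 100}}
	return a, b
}

// approxEqual compares floats with a small tolerance.
func approxEqual(a, b float64) bool { return math.Abs(a-b) < 1e-9 }

func main() {
	a, b := sampleInstances()
	merged := mergeBuckets(a, b)
	for _, q := range []float64{0.5, 0.9, 0.99} {
		fmt.Printf("q=%.2f estimate=%.3fs\n", q, quantile(q, merged))
	}
}
```

The estimate is only as precise as the bucket boundaries: an observation that lands in the +Inf bucket is reported as the highest finite bound, which is one reason to choose buckets that bracket the latency targets you alert on.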
Security recommendations
This skill appears coherent and focused on Go observability best practices, but before installing or letting an agent run it in your repo, consider the following:

  • Review all suggested code changes in PR form before merging. The skill includes instructions that modify code (logging, metrics, middleware) and has Write/Bash permissions in its allowed-tools list; avoid automatic commits without human review.
  • Do NOT hard-code API keys or service URLs in code. Use environment variables or a secrets manager for PostHog, Segment, Pyroscope, etc. The SKILL.md references many env vars but does not request them from the platform; you remain in control of secrets.
  • Secure profiling endpoints (pprof) and continuous-profiling backends. The skill recommends toggling profiling via env vars and warns to protect pprof; follow that: never expose pprof publicly and be cautious sending profiling data to third-party SaaS.
  • Avoid sending PII to analytics/CDPs. The guidance repeatedly warns about identity keys and GDPR/CCPA; enforce those rules and confirm the agent's changes respect data-minimization and consent checks.
  • If you will allow autonomous agent runs, restrict them to non-production repos/environments (or require explicit approval for production changes). Autonomous invocation plus repo write access increases blast radius if misapplied.

If you want deeper assurance, ask the skill owner for an example PR or patch the agent would create for a small change (e.g., add JSON slog handler) and review that patch before allowing broader runs.
Functional analysis
Type: OpenClaw Skill
Name: golang-observability
Version: 1.1.3

The golang-observability skill bundle is a comprehensive and well-structured guide for implementing production-grade monitoring in Go applications. It promotes industry best practices such as structured logging with 'slog', OpenTelemetry tracing, and Prometheus metrics while explicitly warning against security risks like logging PII, exposing unprotected pprof endpoints, and high-cardinality metric labels. The inclusion of GDPR/CCPA compliance checklists and the use of legitimate community resources (e.g., awesome-prometheus-alerts) further confirm its alignment with professional engineering standards without any evidence of malicious intent or unauthorized data access.
Capability assessment
Purpose & Capability
Name and description (Go observability) align with the declared requirements: it requires the 'go' binary and is an instruction-only skill. Allowed tooling (git, go, golangci-lint, read/edit/write) is appropriate for instrumenting and modifying Go code. No unrelated cloud credentials or system paths are requested.
Instruction Scope
SKILL.md is an extensive instruction set for adding logs/metrics/tracing/profiling/RUM. It recommends adding instrumentation, editing code, and configuring third-party backends (PostHog, Segment, Pyroscope) via environment variables. It does not instruct reading unrelated system secrets or exfiltrating data, but it does reference integration endpoints and env vars that an implementer must supply. The skill also mentions launching parallel sub-agents to audit coverage (expected for an automation-capable coding assistant).
Install Mechanism
No install spec and no code files are included; instruction-only skills are low-risk because they do not download or execute external packages during install.
Credentials
The skill declares no required environment variables or credentials (proportionate). However the guidance contains many examples that use env vars and third-party API keys (POSTHOG_API_KEY, SEGMENT_WRITE_KEY, PYROSCOPE_URL, PROFILING_ENABLED). This is expected for observability integrations, but implementers must ensure keys are kept out of source and not hard-coded.
Persistence & Privilege
The skill is not always-on (always:false), does not request persistent system-level privileges, and is user-invocable. Autonomous model invocation remains enabled (platform default) but is not combined with excessive privileges or credential requests.
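How to use
  1. Make sure OpenClaw is installed (locally or via Docker)
  2. Enter the install command in the chat box: /install golang-observability
  3. Once installed, invoke the skill by name or trigger it with /golang-observability
  4. Provide the inputs described in the skill's parameter notes to get structured output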
Version history
v1.1.3
golang-observability 1.1.3
  • Updated skill version and metadata to 1.1.3.
  • Added AskUserQuestion tool to allowed-tools for improved interactivity.
  • Minor documentation cleanup and maintenance in SKILL.md, references/logging.md, and references/metrics.md.
v1.1.1
golang-observability 1.1.1
  • Version updated to 1.1.1 in SKILL.md metadata.
  • Added new file: evals/evals.json (contents not shown).
  • No changes to documented best practices or functional content.
  • Maintains all previously listed modes and guidance.
v0.1.0
Initial release of golang-observability skill.
  • Provides best practices and actionable guidance for instrumenting Go services with five key observability signals: logs, metrics, traces, profiles, and RUM.
  • Covers structured logging with slog, Prometheus metrics, OpenTelemetry tracing, pprof/Pyroscope profiling, alerting, Grafana dashboards, and privacy-compliant tracking.
  • Includes dedicated operation modes: coding/instrumentation, review, and parallel audit.
  • Advice on migrating legacy loggers (zap/logrus/zerolog) to slog and integrating with Customer Data Platforms (CDP).
  • Compact "best practices" summary plus cross-references to related diagnostic, security, and troubleshooting skills.
Metadata
Slug: golang-observability
Version: 1.1.3
License: MIT-0
Total installs: 0
Current installs: 0
Versions: 3
Frequently asked questions

What is Golang Observability?

Golang everyday observability: the always-on signals in production. Covers structured logging with slog, Prometheus metrics, OpenTelemetry distributed traci... It is an AI agent skill plugin for Claude Code / OpenClaw, with 178 total downloads so far.

How do I install Golang Observability?

Run the command "/install golang-observability" in the OpenClaw or Claude Code chat box for a one-step install; no extra configuration is needed.

Is Golang Observability free?

Yes. Golang Observability is completely free under the MIT-0 license and can be downloaded, installed, and used freely.

Which platforms does Golang Observability support?

Golang Observability is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops Golang Observability?

It is developed and maintained by Samuel Berthe (@samber); the current version is v1.1.3.
