Description

Analyze canary deployments by comparing metrics between canary and baseline. Provide data-driven promotion/rollback recommendations based on error rates, lat...

Usage Instructions (SKILL.md)

Canary Deployment Analyzer

Name: Canary Deployment Analyzer
Author: charlie-morrison

Analyze canary deployments to decide whether to promote or roll back. Compare error rates, latency distributions, business metrics, and log patterns between canary and baseline populations, then give a data-driven recommendation.

Use when: "analyze canary", "should we promote this canary", "compare canary metrics", "canary vs baseline", "is this deploy safe to promote", "canary health check", or during progressive delivery decisions.

Commands

1. `analyze` — Full Canary Analysis

Step 1: Collect Metrics

Identify the metrics source (Prometheus, Datadog, CloudWatch, custom):

# Prometheus query examples
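# Error rate: canary vs stable
# Note: an empty result ("no data") can also mean there were no 5xx series in the window, not only that the metric is missing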
curl -s "$PROMETHEUS_URL/api/v1/query" --data-urlencode \
  'query=sum(rate(http_requests_total{status=~"5..",deployment="canary"}[5m])) / sum(rate(http_requests_total{deployment="canary"}[5m]))' | \
  python3 -c "import json,sys;r=json.load(sys.stdin);print(f'Canary error rate: {r[\"data\"][\"result\"][0][\"value\"][1] if r[\"data\"][\"result\"] else \"no data\"}')"

# Same for baseline
curl -s "$PROMETHEUS_URL/api/v1/query" --data-urlencode \
  'query=sum(rate(http_requests_total{status=~"5..",deployment="stable"}[5m])) / sum(rate(http_requests_total{deployment="stable"}[5m]))' | \
  python3 -c "import json,sys;r=json.load(sys.stdin);print(f'Baseline error rate: {r[\"data\"][\"result\"][0][\"value\"][1] if r[\"data\"][\"result\"] else \"no data\"}')"

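# Latency p50/p95/p99 for the canary (repeat with deployment="stable" for the baseline)
for q in 50 95 99; do
  printf 'canary p%s: ' "$q"
  curl -s "$PROMETHEUS_URL/api/v1/query" --data-urlencode \
    "query=histogram_quantile(0.${q}, sum(rate(http_request_duration_seconds_bucket{deployment=\"canary\"}[5m])) by (le))" | \
    python3 -c "import json,sys;r=json.load(sys.stdin)['data']['result'];print(r[0]['value'][1] if r else 'no data')"
done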
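
The one-liners above index straight into the response JSON. A slightly more defensive parser might look like the sketch below. The function name `instant_value` and the sample payloads are illustrative assumptions; the response shape (`status`, then `data.result[].value` as a `[timestamp, "string"]` pair) follows the Prometheus HTTP API.

```python
import json
import math
from typing import Optional


def instant_value(payload: str) -> Optional[float]:
    """Return the scalar from a Prometheus instant-query response, or None."""
    body = json.loads(payload)
    if body.get("status") != "success":
        return None  # query error reported by the server
    result = body.get("data", {}).get("result", [])
    if not result:
        return None  # empty vector: no matching series in the window
    value = float(result[0]["value"][1])  # value is [timestamp, "number-as-string"]
    return None if math.isnan(value) else value  # a 0/0 division comes back as "NaN"


# Hypothetical payloads, for illustration only
ok = '{"status":"success","data":{"resultType":"vector","result":[{"metric":{},"value":[1700000000,"0.0014"]}]}}'
empty = '{"status":"success","data":{"resultType":"vector","result":[]}}'

print("canary error rate:", instant_value(ok))
print("baseline error rate:", instant_value(empty))
```

Returning `None` for every "no usable number" case keeps the later comparison step from mistaking a missing metric for a zero.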

If no Prometheus, check for:

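Datadog: curl -s -G "https://api.datadoghq.com/api/v1/query" -H "DD-API-KEY: $DD_API_KEY" -H "DD-APPLICATION-KEY: $DD_APP_KEY" --data-urlencode "from=$(( $(date +%s) - 1800 ))" --data-urlencode "to=$(date +%s)" --data-urlencode "query=avg:http.request.duration{deployment:canary}" (the endpoint is a GET and also expects an application key and a from/to window in epoch seconds; DD_APP_KEY is an assumed variable name)
CloudWatch: aws cloudwatch get-metric-statistics --namespace MyApp --metric-name ErrorRate --dimensions Name=Deployment,Value=canary --start-time "$START" --end-time "$END" --period 300 --statistics Average (START and END are ISO 8601 timestamps for the observation window; the CLI requires the time range, period, and statistic)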
Application logs: parse error counts from structured logs

Step 2: Statistical Comparison

For each metric, calculate:

Absolute difference: canary_value - baseline_value
Relative change: (canary - baseline) / baseline × 100%
Statistical significance: For rates, use a two-proportion z-test; for latencies, use Welch's t-test or Mann-Whitney U if distributions are skewed
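
As a rough illustration of the first and last items, the sketch below computes the relative change and a one-sided two-proportion z-test using only the standard library. The function names and the request and error counts are hypothetical; real counts would come from the metrics source chosen in Step 1.

```python
import math
from typing import Tuple


def relative_change(canary: float, baseline: float) -> float:
    """Relative change in percent; undefined when the baseline is zero."""
    if baseline == 0:
        raise ValueError("baseline is zero; relative change is undefined")
    return (canary - baseline) / baseline * 100.0


def two_proportion_z(err_c: int, n_c: int, err_b: int, n_b: int) -> Tuple[float, float]:
    """One-sided two-proportion z-test: is the canary error rate higher than baseline?

    Returns (z, p_value) where p_value = P(Z >= z) under the pooled null hypothesis.
    """
    p_c, p_b = err_c / n_c, err_b / n_b
    pooled = (err_c + err_b) / (n_c + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # no variance (e.g. zero errors on both sides): treat as no evidence
    z = (p_c - p_b) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper tail of the standard normal
    return z, p_value


# Hypothetical counts: canary at 5% of traffic, baseline at 95%
z, p = two_proportion_z(err_c=151, n_c=108_000, err_b=2_462, n_b=2_052_000)
print(f"z={z:.2f}, one-sided p={p:.3f}")
```

For latency samples, `scipy.stats.ttest_ind(..., equal_var=False)` (Welch) and `scipy.stats.mannwhitneyu` are the usual off-the-shelf choices if SciPy is available.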

Decision thresholds (configurable):

Error rate increase > 0.1 percentage points absolute OR > 10% relative → FAIL
p95 latency increase > 50ms OR > 15% relative → WARNING
p99 latency increase > 200ms OR > 25% relative → FAIL
Business metric (conversion, throughput) decrease > 5% → WARNING
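
One way to encode these defaults is sketched below. The names `Threshold`, `THRESHOLDS`, `evaluate`, and `evaluate_business` are hypothetical, and the input values are made up to show how the absolute and relative limits interact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    abs_limit: float      # in the metric's own unit (percentage points or ms)
    rel_limit_pct: float  # relative increase, in percent
    on_breach: str        # status to report when either limit is exceeded


# Defaults copied from the list above
THRESHOLDS = {
    "error_rate_pct": Threshold(0.1, 10.0, "FAIL"),
    "p95_latency_ms": Threshold(50.0, 15.0, "WARNING"),
    "p99_latency_ms": Threshold(200.0, 25.0, "FAIL"),
}


def evaluate(metric: str, baseline: float, canary: float) -> str:
    """Apply the absolute-OR-relative rule for one metric."""
    t = THRESHOLDS[metric]
    delta = canary - baseline
    if baseline > 0:
        rel = delta / baseline * 100.0
    else:
        rel = float("inf") if delta > 0 else 0.0  # any increase from zero breaches
    return t.on_breach if (delta > t.abs_limit or rel > t.rel_limit_pct) else "PASS"


def evaluate_business(baseline: float, canary: float, max_drop_pct: float = 5.0) -> str:
    """Business metrics warn on a relative decrease instead of an increase."""
    drop = (baseline - canary) / baseline * 100.0
    return "WARNING" if drop > max_drop_pct else "PASS"


# Made-up inputs
print(evaluate("error_rate_pct", 0.50, 0.53))  # small absolute and relative change
print(evaluate("error_rate_pct", 0.05, 0.06))  # tiny absolute change, large relative change
print(evaluate("p95_latency_ms", 200, 240))
print(evaluate_business(1000, 940))
```

Because the rule is an OR, a low baseline makes the relative limit the binding one: a rise of 0.01 percentage points is already a 20% relative increase when the baseline is 0.05%.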

Step 3: Log Analysis

# Compare error log patterns
# --tail=-1 is needed because kubectl returns only the last 10 lines per pod when a label selector is used
# Canary errors
kubectl logs -l deployment=canary --since=1h --tail=-1 2>/dev/null | grep -iE "error|exception|panic|fatal" | \
  sort | uniq -c | sort -rn | head -20

# Baseline errors
kubectl logs -l deployment=stable --since=1h --tail=-1 2>/dev/null | grep -iE "error|exception|panic|fatal" | \
  sort | uniq -c | sort -rn | head -20

Look for:

New error types in canary that don't appear in baseline (strongest signal)
Error rate spike in existing error types
Timeout patterns or connection refused (infrastructure issues vs code issues)
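
Raw log lines rarely group well under `sort | uniq -c` because timestamps and IDs make each line unique. The sketch below shows one possible way to normalize lines and surface patterns that appear only in the canary. The regexes, function names, and sample lines are illustrative assumptions, not part of the skill.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

ERROR_RE = re.compile(r"error|exception|panic|fatal", re.IGNORECASE)
TS_RE = re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:?\d{2})?")
UUID_RE = re.compile(r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b", re.I)


def normalize(line: str) -> str:
    """Collapse volatile tokens so that repeated errors share one pattern."""
    line = TS_RE.sub("<ts>", line)
    line = UUID_RE.sub("<uuid>", line)
    line = re.sub(r"\d+", "<n>", line)
    return line.strip()


def error_patterns(lines: Iterable[str]) -> Counter:
    """Count normalized patterns among lines that look like errors."""
    return Counter(normalize(l) for l in lines if ERROR_RE.search(l))


def new_in_canary(canary: Counter, baseline: Counter) -> List[Tuple[str, int]]:
    """Patterns seen in the canary but never in the baseline, most frequent first."""
    return [(p, c) for p, c in canary.most_common() if p not in baseline]


# Hypothetical log lines
canary_lines = [
    "2024-05-01T10:00:01Z ERROR NullPointerException in UserService.getProfile user=123",
    "2024-05-01T10:00:02Z ERROR NullPointerException in UserService.getProfile user=456",
    "2024-05-01T10:00:03Z ERROR upstream timeout after 3000ms",
    "2024-05-01T10:00:04Z INFO request ok",
]
baseline_lines = ["2024-05-01T09:59:00Z ERROR upstream timeout after 3001ms"]

for pattern, count in new_in_canary(error_patterns(canary_lines), error_patterns(baseline_lines)):
    print(count, pattern)
```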

Step 4: Generate Verdict

# Canary Analysis Report

## Verdict: PROMOTE / ROLLBACK / HOLD

## Metrics Comparison (last 30 min)
| Metric | Baseline | Canary | Delta | Status |
|--------|----------|--------|-------|--------|
| Error rate | 0.12% | 0.14% | +0.02 pp (+16.7%) | ❌ Fail |
| p50 latency | 45ms | 48ms | +3ms (+6.7%) | ✅ Pass |
| p95 latency | 180ms | 210ms | +30ms (+16.7%) | ⚠️ Warning |
| p99 latency | 450ms | 620ms | +170ms (+37.8%) | ❌ Fail |
| Throughput | 1200 rps | 1180 rps | -1.7% | ✅ Pass |

## New Errors in Canary
- `NullPointerException in UserService.getProfile` (23 occurrences)
  → Not present in baseline, likely a regression

## Traffic Split
- Canary: 5% (60 rps)
- Baseline: 95% (1140 rps)
- Observation window: 30 min (sufficient for 5% traffic)

## Recommendation
[ROLLBACK] Error rate (+16.7% relative) and p99 latency (+37.8% relative) exceed their relative FAIL thresholds, and p95 latency (+16.7% relative) is in the warning range.
Fix the NullPointerException in UserService.getProfile, which is absent from baseline, before retrying the canary.

2. `thresholds` — Configure Promotion Criteria

Help define canary promotion thresholds based on SLOs:

If team has SLOs → derive thresholds from error budget remaining
If no SLOs → suggest industry defaults (99.9% availability = 0.1% error budget)
Generate a config file for Argo Rollouts, Flagger, or custom canary controller
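
As a sketch of the first bullet, the arithmetic below shows one possible way to turn an SLO and its remaining error budget into a canary error-rate ceiling. It assumes traffic is roughly uniform over the SLO window, and every function and parameter name is hypothetical. The idea is that a canary stage should burn at most a chosen fraction of the budget that is still unspent.

```python
def error_budget(slo: float) -> float:
    """Fraction of requests allowed to fail, e.g. an SLO of 0.999 gives about 0.001."""
    return 1.0 - slo


def budget_remaining(slo: float, total_requests: int, failed_requests: int) -> float:
    """Share of the error budget still unspent in the SLO window (negative if overspent)."""
    allowed_failures = error_budget(slo) * total_requests
    return 1.0 - failed_requests / allowed_failures


def max_canary_error_rate(
    slo: float,
    remaining: float,       # output of budget_remaining
    spend_fraction: float,  # share of the remaining budget this stage may burn
    canary_share: float,    # fraction of traffic sent to the canary
    stage_minutes: float,
    window_minutes: float,  # length of the SLO window
) -> float:
    """Highest canary error rate that keeps the stage within its budget allowance.

    Derivation (uniform traffic assumed): the stage consumes
    rate * canary_share * stage_minutes / (error_budget * window_minutes) of the budget.
    """
    if remaining <= 0:
        return 0.0  # budget exhausted: no headroom for a risky stage
    rate = spend_fraction * remaining * error_budget(slo) * window_minutes / (canary_share * stage_minutes)
    return min(rate, 1.0)


remaining = budget_remaining(slo=0.999, total_requests=1_000_000, failed_requests=500)
ceiling = max_canary_error_rate(0.999, remaining, spend_fraction=0.01,
                                canary_share=0.05, stage_minutes=30, window_minutes=30 * 24 * 60)
print(f"budget remaining: {remaining:.0%}, canary error-rate ceiling: {ceiling:.2%}")
```

A ceiling derived this way is usually far looser than the relative thresholds in Step 2, so it works better as an upper bound than as the only gate.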

3. `progressive` — Design Progressive Delivery Strategy

Given a service profile (traffic volume, criticality, deployment frequency), recommend:

Traffic split stages (1% → 5% → 25% → 50% → 100%)
Observation window per stage
Automated vs manual promotion gates
Rollback trigger conditions
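
The observation window per stage largely depends on how long the canary needs to accumulate enough requests for the comparison to be meaningful. A minimal sketch follows, assuming a hypothetical target of 100,000 canary requests per stage and a 10-minute floor; both numbers and the function name are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def observation_plan(
    total_rps: float,
    stages_pct: Sequence[int] = (1, 5, 25, 50, 100),
    min_canary_requests: int = 100_000,  # assumed sample-size target per stage
    min_minutes: int = 10,               # assumed floor so slow-burning issues can surface
) -> List[Tuple[int, int]]:
    """Return (traffic percent, minimum observation minutes) for each stage."""
    plan = []
    for pct in stages_pct:
        canary_rps = total_rps * pct / 100
        needed = math.ceil(min_canary_requests / (canary_rps * 60))
        plan.append((pct, max(needed, min_minutes)))
    return plan


for pct, minutes in observation_plan(total_rps=1200):
    print(f"{pct:>3}% traffic: observe for at least {minutes} min")
```

Low-traffic services end up with long early stages under this rule, which is one reason to start at 5% instead of 1% when volume is small.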

Safe Usage Recommendations

This skill appears to do what its name says (compare canary vs baseline), but the instructions rely on cluster/monitoring credentials and CLIs that the manifest does not declare. Before installing or using it:

1) Confirm which credentials it will need (Prometheus URL, Datadog API key, AWS creds, kubeconfig) and provide least-privilege, read-only access (e.g., read-only AWS IAM role, limited Datadog key, Kubernetes service account limited to log access).
2) Run the skill in a controlled environment (staging account or isolated cluster) first so you don't expose production secrets.
3) Consider modifying the skill to explicitly declare required env vars and binaries in metadata so you can audit what will be accessed.
4) If you share agent access with others, avoid storing long-lived credentials in the agent environment; prefer short-lived tokens or scoped service accounts.
5) If author/source is unknown/untrusted, treat credential provision as risky: request provenance or a signed manifest before granting access.

Functional Analysis

Type: OpenClaw Skill
Name: canary-deployment-analyzer
Version: 1.0.0

The canary-deployment-analyzer skill is designed to compare metrics and logs between canary and baseline deployments. It uses standard DevOps tools and APIs (Prometheus, Datadog, CloudWatch, and kubectl) to perform its analysis. The commands and instructions in SKILL.md are well-aligned with the stated purpose and do not exhibit signs of malicious intent, data exfiltration, or unauthorized execution.

Capability Tags

requires-sensitive-credentials

Capability Assessment

⚠ Purpose & Capability

The declared purpose—analyzing canary vs baseline metrics—is consistent with the commands shown (Prometheus/Datadog/CloudWatch queries, latency/error comparisons, log analysis). However, the skill metadata declares no required environment variables or binaries while the instructions explicitly rely on PROMETHEUS_URL, DD_API_KEY, aws CLI with AWS credentials, kubectl, and python3. That mismatch (the skill asking for access at runtime but not declaring it up-front) is a red flag for incoherence and operational surprise.

⚠ Instruction Scope

The SKILL.md instructs the agent to run network calls and cluster-level commands: curl against monitoring APIs, use Datadog API key, call aws cloudwatch, and run kubectl logs on deployments. Those actions are within the declared purpose, but they give the agent broad read access to monitoring/cluster data and require sensitive credentials. The instructions also implicitly allow the agent to choose which data sources to query (Prometheus vs Datadog vs CloudWatch vs logs), which is open-ended and increases blast radius if credentials are provided inadvertently.

✓ Install Mechanism

No install spec and no code files — the skill is instruction-only, so nothing is written to disk or fetched automatically. This lowers supply-chain risk. However, runtime commands will invoke local CLIs and network calls, so the lack of install does not eliminate operational risk.

⚠ Credentials

The SKILL.md expects environment variables and credentials (PROMETHEUS_URL, DD_API_KEY, AWS credentials for aws CLI, possibly Kubernetes kubeconfig or cluster auth for kubectl) but the skill metadata declares none. Requiring these secrets at runtime without declaring them undermines transparency and could lead to accidental credential exposure when a user attempts to use the skill. The skill also references python3 and kubectl without declaring them as required binaries.

✓ Persistence & Privilege

The skill is not marked 'always: true' and is user-invocable only; it does not request persistent or elevated platform privileges in the manifest. As an instruction-only skill it does not modify other skills or agent-wide config. Autonomous invocation is allowed by default but is not combined with other manifest-level privileges here.

Version History

v1.0.0

Initial release of canary-deployment-analyzer.
- Analyze canary deployments by comparing error rates, latency percentiles, logs, and business metrics between canary and baseline.
- Provides data-driven recommendations to promote, rollback, or hold based on configurable thresholds.
- Supports metric collection from Prometheus, Datadog, CloudWatch, and application logs.
- Commands include canary analysis, threshold configuration, and progressive delivery strategy recommendations.
- Outputs clear verdict reports with detailed metrics tables and log anomaly highlights.

Metadata

Slug canary-deployment-analyzer

Version 1.0.0

License MIT-0

Total installs 0

Current installs 0

Number of versions 1

FAQ