Description

Detect and resolve backpressure issues in data pipelines, message queues, and streaming systems. Identify bottleneck stages, measure queue depths and process...

Usage Instructions (SKILL.md)

Backpressure Analyzer

Name: backpressure-analyzer
Author: charlie-morrison

Find where your pipeline is backing up. Measure processing rates at each stage, identify the bottleneck, detect growing queues, and recommend flow control strategies — bounded buffers, rate limiting, load shedding, or autoscaling.

Use when: "pipeline is slow", "queue keeps growing", "messages backing up", "consumer can't keep up", "producer faster than consumer", "backpressure", "flow control", "bottleneck analysis", or when processing delays increase over time.

Commands

1. `detect` — Find Backpressure Points

Step 1: Measure Queue Depths

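# Kafka consumer lag
# Assumes the column layout GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG
# ($1 = group, $4 = current offset, $6 = lag); the numeric test skips headers and "-" rows
kafka-consumer-groups --bootstrap-server "$KAFKA_BROKER" --describe --all-groups 2>/dev/null | \
  awk '$6 ~ /^[0-9]+$/ && $6>0 {printf "%-30s %-20s lag=%s\n", $1, $4, $6}' | sort -t= -k2 -rn | head -20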

# RabbitMQ queue depths (the numeric test skips banner and header lines)
rabbitmqctl list_queues name messages consumers 2>/dev/null | \
  awk '$2 ~ /^[0-9]+$/ && $2>0 {print $2 "\t" $1 "\tconsumers=" $3}' | sort -rn | head -20

# AWS SQS
for queue_url in $(aws sqs list-queues --query 'QueueUrls[]' --output text); do
  attrs=$(aws sqs get-queue-attributes --queue-url "$queue_url" \
    --attribute-names ApproximateNumberOfMessages ApproximateNumberOfMessagesNotVisible \
    --output json 2>/dev/null)
  visible=$(echo "$attrs" | python3 -c "import json,sys;print(json.load(sys.stdin)['Attributes'].get('ApproximateNumberOfMessages','0'))")
  inflight=$(echo "$attrs" | python3 -c "import json,sys;print(json.load(sys.stdin)['Attributes'].get('ApproximateNumberOfMessagesNotVisible','0'))")
  if [ "$visible" -gt 0 ] 2>/dev/null; then
    echo "Queue: $(basename "$queue_url"): pending=$visible, in-flight=$inflight"
  fi
done

# Redis Streams
redis-cli XINFO STREAM mystream 2>/dev/null | grep -E "length|groups"
redis-cli XINFO GROUPS mystream 2>/dev/null

Step 2: Measure Processing Rates

# Measure throughput at each pipeline stage
# Take two snapshots 60s apart and calculate rate

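# Kafka: snapshot of total lag and summed committed offsets for one group
# Run twice, 60s apart: (offset2 - offset1) / 60 is the consume rate, (lag2 - lag1) / 60 is the lag growth rate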
kafka-consumer-groups --bootstrap-server $KAFKA_BROKER --describe --group mygroup 2>/dev/null | \
  awk 'NR>1 {lag+=$6; offset+=$4} END {print "Total lag:", lag, "Current offset:", offset}'

# Process-level: messages processed per second
# Check application metrics endpoint
curl -s http://localhost:9090/metrics | grep -E "messages_processed_total|items_processed_total"
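
The two-snapshot approach can be sketched in Python as follows. The `Snapshot` type and `rates_between` function are hypothetical names for illustration; the inputs are the "Total lag" and "Current offset" values printed by the Kafka command above. The produce rate is derived rather than measured: lag is the log end offset minus the committed offset, so lag growth plus consume rate equals the produce rate.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    timestamp: float  # seconds, e.g. from time.time()
    lag: int          # total lag across partitions
    offset: int       # summed committed offsets across partitions


def rates_between(first: Snapshot, second: Snapshot) -> dict:
    """Return per-second consume, lag growth, and derived produce rates."""
    elapsed = second.timestamp - first.timestamp
    if elapsed <= 0:
        raise ValueError("second snapshot must be taken after the first")
    consume_rate = (second.offset - first.offset) / elapsed
    lag_growth = (second.lag - first.lag) / elapsed
    return {
        "consume_rate": consume_rate,
        "lag_growth": lag_growth,
        # produced = consumed + whatever piled up in the queue
        "produce_rate": consume_rate + lag_growth,
    }
```

A positive `lag_growth` sustained over several intervals is the signal that the consumer group is falling behind.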

Step 3: Identify Bottleneck

Map the pipeline stages and their rates:

Producer (1000 msg/s) → Queue A (depth: 5) → Stage 1 (800 msg/s) → Queue B (depth: 50000) → Stage 2 (200 msg/s) → Queue C (depth: 2) → Stage 3 (500 msg/s)
                                                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                                                                        BOTTLENECK: Stage 2 can't keep up

The bottleneck is the stage with:

- Growing queue depth (input queue getting deeper over time)
- Lowest throughput relative to its input rate
- Highest resource utilization (CPU, memory, I/O at capacity)
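
A minimal sketch of the first two criteria, assuming each stage's input rate and processing rate have already been measured. `Stage` and `find_bottleneck` are illustrative names, and resource utilization is not modeled here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Stage:
    name: str
    input_rate: float    # msg/s arriving at the stage
    process_rate: float  # msg/s the stage completes


def find_bottleneck(stages: list[Stage]) -> Optional[Stage]:
    """Pick the stage that falls furthest behind its input, or None if all keep up."""
    # A stage whose input exceeds its processing rate has a growing input queue.
    falling_behind = [s for s in stages if s.input_rate > s.process_rate]
    if not falling_behind:
        return None
    # Lowest throughput relative to input rate.
    return min(falling_behind, key=lambda s: s.process_rate / s.input_rate)
```

With the rates from the example pipeline above (Stage 1: 1000 in, 800 out; Stage 2: 800 in, 200 out; Stage 3: 200 in, 500 out), Stage 2 has the lowest ratio at 0.25 and is selected.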

Step 4: Generate Report

# Backpressure Analysis Report

## Pipeline: Order Processing

## Flow Map

API (1000 req/s) → order-events (Kafka, lag: 45,000 ⚠️, growing +200/min) → order-validator (3 pods, 350 msg/s each = 1050 total) → validated-orders (Kafka, lag: 200, stable ✅) → payment-processor (2 pods, 150 msg/s each = 300 total) → payment-results (Kafka, lag: 85,000 🔴, growing +700/min) → notification-sender (1 pod, 500 msg/s)


## Bottleneck: payment-processor
- **Input rate:** 1050 msg/s (from validator)
- **Processing rate:** 300 msg/s (2 pods × 150 msg/s)
- **Deficit:** 750 msg/s accumulating in queue
- **Current backlog:** 85,000 messages (~4.7 minutes of work at 300 msg/s, but it cannot drain while input exceeds the processing rate)
- **Resource utilization:** CPU 95%, memory 60%, network 20%
- **Root cause:** CPU-bound — payment validation is computationally expensive

## Recommendations (in order)
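1. **Scale out:** Increase payment-processor to at least 8 pods (8 × 150 = 1200 msg/s); 7 pods (1050 msg/s) would only match the input rate and leave the backlog in place
   - Cost: +6 pods × $X/month
   - Time to drain backlog: ~9.4 minutes after scaling (85,000 messages ÷ 150 msg/s surplus)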

2. **Optimize processing:** Profile payment validation for optimization
   - Current: 6.7ms per message
   - Target: 1ms per message (would need only 2 pods)

3. **Add backpressure signal:** Have payment-processor signal order-validator to slow down
   - Reactive Streams-style demand signaling
   - Or: consumer pause when lag > threshold

4. **Load shedding (last resort):** Drop low-priority messages when queue > 100K
   - Only for non-critical notifications, never for payments
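
The scaling arithmetic behind a report like this can be checked with a short sketch. The function names are hypothetical, and the model assumes throughput scales linearly with pod count and that the input rate stays constant, which real systems only approximate.

```python
import math


def drain_seconds(backlog: float, input_rate: float, pods: int, per_pod_rate: float) -> float:
    """Seconds to clear the backlog; infinite if capacity does not exceed input."""
    surplus = pods * per_pod_rate - input_rate
    if surplus <= 0:
        return math.inf
    return backlog / surplus


def pods_to_drain_within(backlog: float, input_rate: float,
                         per_pod_rate: float, target_seconds: float) -> int:
    """Smallest pod count that absorbs the input and clears the backlog in time."""
    required_rate = input_rate + backlog / target_seconds
    return math.ceil(required_rate / per_pod_rate)
```

With the sample report's numbers (85,000 backlog, 1050 msg/s input, 150 msg/s per pod), 7 pods only match the input rate, so `drain_seconds` returns infinity; 8 pods leave a 150 msg/s surplus and drain the backlog in about 567 seconds.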

2. `strategies` — Recommend Flow Control

Based on the pipeline characteristics, recommend:

- Bounded buffers: Set max queue size, block producer when full
- Rate limiting: Limit producer rate to match slowest consumer
- Autoscaling: Scale consumers based on queue depth
- Load shedding: Drop low-priority messages under pressure
- Batch processing: Accumulate and process in batches for efficiency
- Circuit breaker: Stop sending to overwhelmed downstream
- Priority queues: Process critical messages first when backed up
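
As one illustration of the first strategy, here is a minimal in-process sketch of a bounded buffer using Python's standard `queue.Queue`. The function `run_bounded_pipeline` is a hypothetical name; a real pipeline would apply the same idea through broker or client settings rather than threads in a single process.

```python
import queue
import threading


def run_bounded_pipeline(items, max_buffer: int):
    """Pass items from a producer to a consumer through a bounded buffer.

    Returns the consumed items and the deepest buffer level the consumer observed.
    """
    buf = queue.Queue(maxsize=max_buffer)
    consumed = []
    max_depth = 0
    sentinel = object()  # marks end of input

    def producer():
        for item in items:
            buf.put(item)  # blocks while the buffer is full: this is the backpressure
        buf.put(sentinel)

    def consumer():
        nonlocal max_depth
        while True:
            max_depth = max(max_depth, buf.qsize())
            item = buf.get()
            if item is sentinel:
                break
            consumed.append(item)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return consumed, max_depth
```

Because `put` blocks when the buffer is full, the producer is slowed to the consumer's pace and memory use stays bounded by `max_buffer`, at the cost of stalling whatever feeds the producer.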

3. `monitor` — Set Up Backpressure Alerts

Generate alerting rules:

# Prometheus alert rules
groups:
  - name: backpressure
    rules:
      - alert: KafkaConsumerLagHigh
        expr: kafka_consumergroup_lag_sum > 10000
        for: 5m
        labels:
          severity: warning
      - alert: KafkaConsumerLagCritical
        expr: kafka_consumergroup_lag_sum > 100000
        for: 5m
        labels:
          severity: critical
      - alert: QueueDepthGrowing
        expr: deriv(kafka_consumergroup_lag_sum[5m]) > 0
        for: 15m
        labels:
          severity: warning

Security Recommendations

This skill's instructions are plausible for diagnosing backpressure, but they call cloud and local CLIs (AWS, Kafka, RabbitMQ, Redis) and reference $KAFKA_BROKER without declaring credentials. Before installing:

1) Ask the publisher for a source/homepage and an explicit list of required env vars and permissions.
2) If you run it, avoid giving it high-privilege AWS credentials; use a read-only, least-privilege IAM role scoped to just the queues/topics needed.
3) Run first in an isolated/test environment where those CLIs exist and credentials are safe.
4) Inspect and, if needed, constrain the exact commands the agent will run (limit which queues or clusters it can enumerate).
5) Prefer skills that declare required env vars and document required CLIs and permissions; absence of those declarations is the main red flag here.

Additional information that would raise confidence: a trusted source/homepage, explicit env var/credential declarations, or code/tests demonstrating exactly what will be queried and why.

Functional Analysis

Type: OpenClaw Skill
Name: backpressure-analyzer
Version: 1.0.0

The backpressure-analyzer skill is designed to help SREs and developers identify bottlenecks in data pipelines. It provides standard diagnostic commands for Kafka, RabbitMQ, AWS SQS, and Redis, along with templates for reporting and Prometheus alerting. The code (SKILL.md) uses legitimate administrative tools and local metrics endpoints (e.g., localhost:9090) without any evidence of data exfiltration, unauthorized remote access, or malicious prompt injection.

Capability Tags

crypto, can-make-purchases

Capability Assessment

ℹ Purpose & Capability

The described functionality (Kafka, RabbitMQ, SQS, Redis Streams, Prometheus) aligns with backpressure analysis. However the SKILL.md uses $KAFKA_BROKER and AWS CLI commands even though the skill metadata declares no required env vars or credentials — a mismatch between what the skill does and what it asks for.

⚠ Instruction Scope

Runtime instructions tell the agent to run many system/cloud commands (kafka-consumer-groups, rabbitmqctl, aws sqs, redis-cli, curl against localhost metrics, etc.). These commands will access local services and cloud accounts and can reveal sensitive operational data. The instructions assume access to credentials and network endpoints that are not declared or constrained.

ℹ Install Mechanism

No install spec or code files (instruction-only) — low install risk. But the skill implicitly requires many CLIs (aws, kafka-consumer-groups, rabbitmqctl, redis-cli, curl, python3) to be present; absence of an install step or validation means it may fail or behave inconsistently depending on the environment.

⚠ Credentials

The skill declares no required env vars/credentials, yet the instructions use $KAFKA_BROKER and call AWS CLI (which uses AWS credentials). This is disproportionate: the skill should explicitly declare the exact credentials and permissions it needs. As written, it could cause an agent to access cloud account credentials unexpectedly.

✓ Persistence & Privilege

The skill is not 'always: true' and has no install or persistent components, so it does not request permanent presence or system-wide changes. Note: the agent may still invoke it autonomously (default), which combined with the environment/credential concerns increases risk.

Version History

v1.0.0

Initial release of the backpressure-analyzer skill:

- Detects and pinpoints backpressure issues in data pipelines, queues, and streaming systems.
- Measures queue depths and processing rates at each stage to identify bottlenecks.
- Generates detailed backpressure analysis reports with recommendations.
- Suggests flow control and scaling strategies, including bounding buffers, rate limiting, load shedding, and autoscaling.
- Provides example monitoring and alerting rules for proactive detection.

Metadata

Slug backpressure-analyzer

Version 1.0.0

License MIT-0

Total installs 0

Current installs 0

Number of versions 1

Frequently Asked Questions