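Description

DP data processing platform operations advisor. Activates when the user mentions operations needs such as checking the platform, job failures, job status, throughput analysis, fault diagnosis, or operations reports.

Usage (SKILL.md)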

dp-ops-advisor

Name: AI-powered DP Platform Operations Advisor
Author: hxp365

Purpose

Monitor, diagnose, and advise on DP Data Processing Platform job operations. This skill provides intelligent operational support: checking job health, interpreting metrics, diagnosing failures, suggesting fixes, and generating incident reports.

Environment Configuration

The DP platform runs at ${DP_SERVER_URL}.

Required environment variables:

DP_SERVER_URL=${DP_SERVER_URL}   # REQUIRED: DP platform base URL
DP_API_KEY=${DP_API_KEY}         # REQUIRED: obtain from the platform's "API Key 管理" (API Key Management) page

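ALL curl commands MUST send the header as -H "X-DP-API-Key: ${DP_API_KEY}" (double quotes, so the shell expands the variable; single quotes would send the literal text ${DP_API_KEY}). No other authentication method is supported.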
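For callers working in Python rather than curl, the same header can be attached with the standard library's `urllib.request`. This is a minimal sketch, not part of the published skill: `build_request` is a hypothetical helper name, and it only builds the request so the header can be inspected before anything is sent. Note that `urllib` stores header names capitalized (`X-dp-api-key`); HTTP header names are case-insensitive, so the server sees the same header.

```python
import os
import urllib.request
from typing import Optional


def build_request(path: str, base_url: Optional[str] = None,
                  api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a GET request carrying the X-DP-API-Key header.

    build_request is a hypothetical helper; values default to the
    DP_SERVER_URL and DP_API_KEY environment variables.
    """
    base_url = base_url or os.environ["DP_SERVER_URL"]
    api_key = api_key or os.environ["DP_API_KEY"]
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    # The API key header is the only authentication method the skill allows.
    return urllib.request.Request(url, headers={"X-DP-API-Key": api_key}, method="GET")
```

Sending it is then `urllib.request.urlopen(build_request("/job/status"), timeout=10)`, with `/job/status` taken from Mode 1.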

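First-Use Setup

# Validate DP_API_KEY; abort if it is not configured
if [ -z "${DP_API_KEY}" ]; then
  echo "======================================"
  echo "  DP Platform: API Key required"
  echo "======================================"
  echo "Error: DP_API_KEY was not found; cannot continue."
  echo ""
  echo "To configure it:"
  echo "1. Open the DP platform console: ${DP_SERVER_URL}"
  echo "2. Register an account (contact your administrator if an invitation code is required)"
  echo "3. Go to \"API Key 管理\" (API Key Management) → \"申请新 Key\" (Request New Key)"
  echo "4. Set the generated key as the DP_API_KEY environment variable"
  echo ""
  echo "Free tier: 100 calls/month. Upgrade your subscription plan once the quota is exceeded."
  echo "======================================"
  exit 1
fi
# Only presence is checked here; the key is validated by the server on the first request
echo "API Key detected: ${DP_API_KEY:0:8}****"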

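Quota

Free tier: 100 API calls per month
When the quota is exceeded, responses include the field quota_exceeded: true
The upgrade_url field in the response points to the subscription upgrade page
Paid plans raise the quota: BASIC (1,000 calls), PRO (10,000 calls), ENTERPRISE (unlimited)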
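Because a quota response can arrive from any endpoint, it helps to check for it before parsing job data. A minimal sketch, assuming the quota fields appear at the top level of a JSON object (the exact response shape is not documented here); `quota_message` is a hypothetical name. Non-object bodies, such as the job list returned by `/job/status`, are treated as "no quota problem".

```python
from typing import Any, Optional


def quota_message(response: Any) -> Optional[str]:
    """Return an upgrade hint if the parsed response reports an exhausted quota, else None."""
    # quota_exceeded and upgrade_url are the field names described in the Quota section.
    if isinstance(response, dict) and response.get("quota_exceeded") is True:
        url = response.get("upgrade_url") or "(no upgrade_url in response)"
        return f"Monthly API quota exceeded; upgrade at: {url}"
    return None
```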

Capabilities

Real-time health check of all jobs (running/failed/stalled)
Per-operator throughput and latency analysis
Failure root-cause analysis from error logs
Auto-restart policy evaluation and execution
Stall detection (job running but no data flowing)
Historical run trend analysis
Generate incident report summaries
Recommend configuration tuning based on observed metrics

Context Files

File	Purpose
`dp-api-reference.md`	REST API endpoints for status, logs, progress
`dp-operator-catalog.json`	Operator descriptions to contextualize metrics

Prerequisites Check

# Verify DP Server connectivity (-f treats HTTP errors as failures; -o /dev/null discards the page body)
curl -sf -o /dev/null --connect-timeout 3 "${DP_SERVER_URL}/homepage" && echo "DP Server OK" || echo "DP Server NOT reachable"

# Auth: API Key is REQUIRED; no session fallback allowed
if [ -z "${DP_API_KEY}" ]; then
  echo "ERROR: DP_API_KEY is not set. Please configure the DP_API_KEY environment variable."
  exit 1
fi
# A bash array keeps the header as a single argument; use it as: curl "${AUTH[@]}" ...
AUTH=(-H "X-DP-API-Key: ${DP_API_KEY}")
echo "Auth: API Key mode (${DP_API_KEY:0:8}****)"
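The same connectivity probe can be sketched in Python with only the standard library. This version treats HTTP error statuses as "not reachable", which is stricter than a plain `curl -s` (that succeeds on any HTTP response). `dp_server_reachable` is a hypothetical helper name; the `/homepage` path and the 3-second timeout come from the curl probe.

```python
import http.client
import urllib.request


def dp_server_reachable(base_url: str, timeout: float = 3.0) -> bool:
    """Return True if GET <base_url>/homepage answers with a non-error status in time."""
    url = base_url.rstrip("/") + "/homepage"
    try:
        # urlopen follows redirects and raises HTTPError for 4xx/5xx responses
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers URLError/HTTPError, refused connections and timeouts;
        # ValueError is raised for malformed URLs.
        return False
```

A caller would then print "DP Server OK" or "DP Server NOT reachable" based on `dp_server_reachable(os.environ["DP_SERVER_URL"])`.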

Workflow

Mode 1: Platform-Wide Health Check

Triggered when the user says: "检查平台状态" (check platform status) / "哪些作业有问题" (which jobs have problems) / "平台健不健康" (is the platform healthy)

# Get all job statuses
ALL_STATUS=$(curl -s -H "X-DP-API-Key: ${DP_API_KEY}" "${DP_SERVER_URL}/job/status")

# Parse and categorize. The Python code is single-quoted for the shell, so it uses only double quotes.
echo "$ALL_STATUS" | python3 -c '
import sys, json
jobs = json.load(sys.stdin)

def by_state(state):
    return [j for j in jobs if j.get("state") == state]

running  = by_state("RUNNING")
failed   = by_state("FAILED")
stoping  = by_state("STOPING")   # state strings kept verbatim; matching is case-sensitive
waiting  = by_state("Waiting")
finished = by_state("FINISHED")
idle     = [j for j in jobs if not j.get("state")]

print("=== DP Platform Health Report ===")
print(f"Total jobs: {len(jobs)}")
print(f"  RUNNING  : {len(running)}")
print(f"  FAILED   : {len(failed)}")
print(f"  STOPING  : {len(stoping)}")
print(f"  FINISHED : {len(finished)}")
print(f"  WAITING  : {len(waiting)}")
print(f"  IDLE     : {len(idle)}")
print()

if failed:
    print("!! FAILED JOBS (need attention):")
    for j in failed:
        print("   - {} | {}".format(j.get("jobID", "?"), j.get("state")))
    print()

if running:
    print("OK RUNNING JOBS:")
    for j in running:
        print("   - {}".format(j.get("jobID", "?")))
'

After health check, proactively:

For each FAILED job: offer to diagnose (Mode 3)
For each RUNNING job: offer to check throughput (Mode 2)
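The follow-up step can be sketched as a small function that turns the parsed `/job/status` list into offers. `follow_up_offers` is a hypothetical name; the `jobID` and `state` fields are the ones the health-check script reads, and other states produce no offer.

```python
from typing import Any, Dict, List


def follow_up_offers(jobs: List[Dict[str, Any]]) -> List[str]:
    """Map health-check results to the follow-up offers for FAILED and RUNNING jobs."""
    offers = []
    for job in jobs:
        job_id = job.get("jobID", "?")
        state = job.get("state")
        if state == "FAILED":
            offers.append(f"{job_id}: FAILED, offer failure diagnosis (Mode 3)")
        elif state == "RUNNING":
            offers.append(f"{job_id}: RUNNING, offer throughput check (Mode 2)")
    return offers
```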

Mode 2: Job Throughput Analysis

Triggered when the user says: "看一下 [job] 的运行情况" (show me how [job] is running) / "数据有没有在流动" (is data flowing) / "作业跑的快不快" (is the job running fast)

JOB_ID="$1"  # job ID from the user or from the health check

# Get per-operator metrics
PROGRESS=$(curl -s -H "X-DP-API-Key: ${DP_API_KEY}" "${DP_SERVER_URL}/job/progress?id=${JOB_ID}")

echo "$PROGRESS" | python3 -c '
import sys, json
data = json.load(sys.stdin)
print("Job: {} (ID: {})".format(data.get("name", "?"), data.get("id", "?")))
print("Path: {}".format(data.get("path", "")))
print()
row = "{:<30} {:<12} {:<15} {:<15} {:<15} {:<12}{}"
print(row.format("Operator", "Status", "Records In", "Records Out", "Speed(rec/s)", "ByteSpeed", ""))
print("-" * 100)
for op in data.get("data", []):
    # str() so numbers and missing values (None) format the same way
    status = str(op.get("status", "-"))
    r_in   = str(op.get("recordsRead", "-"))
    r_out  = str(op.get("recordsWritten", "-"))
    speed  = str(op.get("speed", "-"))
    bspeed = str(op.get("byteSpeed", "-"))
    name   = str(op.get("name", op.get("id", "?")))[:28]

    # Flag potential stall: status RUNNING but speed 0 (speed may be a number or a string)
    flag = " !! STALLED?" if status == "RUNNING" and speed in ("0", "0.0") else ""
    print(row.format(name, status, r_in, r_out, speed, bspeed, flag))
'

Stall Detection Logic:

A job is stalled if any of its operators has:

status = RUNNING and speed = 0, AND recordsRead unchanged across 2 consecutive checks
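The rule can be written as a pure function over two readings of the same operator. This is a minimal sketch, assuming each operator entry carries `status`, `speed` and `recordsRead` as in the Mode 2 output; `is_stalled` and `_num` are hypothetical names. Treating missing or non-numeric values (such as `"-"`) as "not stalled" is a design choice to avoid false alarms, not documented platform behavior.

```python
from typing import Any, Mapping, Optional


def _num(value: Any) -> Optional[float]:
    """Parse a metric that may be an int, float or numeric string; None if not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def is_stalled(before: Mapping[str, Any], after: Mapping[str, Any]) -> bool:
    """Stall rule for one operator: RUNNING, speed 0, recordsRead unchanged between checks."""
    if after.get("status") != "RUNNING":
        return False
    speed = _num(after.get("speed"))
    r1 = _num(before.get("recordsRead"))
    r2 = _num(after.get("recordsRead"))
    if speed is None or r1 is None or r2 is None:
        # Missing or non-numeric metrics: not enough evidence for a stall
        return False
    return speed == 0 and r1 == r2
```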

When a stall is suspected:

# Take two readings 30 seconds apart and compare recordsRead
PROG1=$(curl -s -H "X-DP-API-Key: ${DP_API_KEY}" "${DP_SERVER_URL}/job/progress?id=${JOB_ID}")
sleep 30
PROG2=$(curl -s -H "X-DP-API-Key: ${DP_API_KEY}" "${DP_SERVER_URL}/job/progress?id=${JOB_ID}")

# Pass the JSON through environment variables instead of pasting it into Python source
PROG1="$PROG1" PROG2="$PROG2" python3 -c '
import os, json
p1 = json.loads(os.environ["PROG1"])
p2 = json.loads(os.environ["PROG2"])
ops1 = {op["id"]: op for op in p1.get("data", [])}
ops2 = {op["id"]: op for op in p2.get("data", [])}
print("Stall detection (30s interval):")
for pid, op2 in ops2.items():
    op1 = ops1.get(pid, {})
    r1 = str(op1.get("recordsRead", "0"))
    r2 = str(op2.get("recordsRead", "0"))
    name = op2.get("name", pid)
    if r1 == r2 and op2.get("status") == "RUNNING":
        print(f"  STALLED: {name} - records unchanged at {r2}")
    elif r1.isdigit() and r2.isdigit():
        print(f"  OK: {name} - processed {int(r2) - int(r1):+d} records")
    else:
        print(f"  OK: {name} - record delta unknown (read {r1} -> {r2})")
'
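The two-reading comparison can also report an average rate per operator, which makes "slow" distinguishable from "stalled". A minimal sketch, assuming the `/job/progress` shape used above (a `data` list whose entries have `id` and `recordsRead`); `record_rates` is a hypothetical name. The rate is approximate because the real gap between the two curl calls is slightly longer than the 30-second sleep.

```python
from typing import Any, Dict, Mapping, Optional


def _num(value: Any) -> Optional[float]:
    """Parse a metric that may be an int, float or numeric string; None if not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def record_rates(p1: Mapping[str, Any], p2: Mapping[str, Any],
                 interval_s: float = 30.0) -> Dict[str, Optional[float]]:
    """Average records/second per operator between two progress responses.

    None means the rate could not be computed (operator absent from the first
    reading, or a non-numeric recordsRead such as "-").
    """
    ops1 = {op["id"]: op for op in p1.get("data", [])}
    rates: Dict[str, Optional[float]] = {}
    for op in p2.get("data", []):
        before = _num(ops1.get(op["id"], {}).get("recordsRead"))
        after = _num(op.get("recordsRead"))
        rates[op["id"]] = None if before is None or after is None else (after - before) / interval_s
    return rates
```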

Response Format

Always structure responses with:

[Health]: Overall platform/job status

[Findings]: Specific issues identified

[Diagnosis]: Root cause analysis

[Actions]: What was done or should be done

[Status]: Current state after any actions
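When answers are generated by a script rather than written by hand, the five sections can be rendered consistently with a small formatter. A minimal sketch; `format_response` is a hypothetical name, and showing list sections as indented bullets (with "none" for empty lists) is a presentation choice, not part of the skill's specification.

```python
from typing import Sequence


def format_response(health: str, findings: Sequence[str], diagnosis: str,
                    actions: Sequence[str], status: str) -> str:
    """Render an answer in the Health / Findings / Diagnosis / Actions / Status layout."""
    def bullets(items: Sequence[str]) -> str:
        # One indented bullet per item; an explicit "none" keeps empty sections visible
        return "\n".join(f"  - {item}" for item in items) if items else "  - none"

    return "\n".join([
        f"[Health]: {health}",
        "[Findings]:",
        bullets(findings),
        f"[Diagnosis]: {diagnosis}",
        "[Actions]:",
        bullets(actions),
        f"[Status]: {status}",
    ])
```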

Limitations & Guardrails

Never auto-restart without user confirmation in production scenarios.
Never modify job configurations — only report and advise (use dp-pipeline-designer skill to make changes).
Do not expose raw stack traces to non-technical users — summarize the error in plain language.
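If DP Server is unreachable, guide the user to check and start the services, and provide the startup command.
Authentication errors: If the API returns "Authentication required", do not fall back to a session login (only API key authentication is supported); ask the user to check that DP_API_KEY is set and valid before retrying.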
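The production-restart guardrail can be made explicit in any automation wrapped around this skill. A minimal sketch, assuming the caller already knows the environment name (the skill does not say how production is identified); `restart_allowed` is hypothetical and only gates the decision. It does not call any restart API.

```python
def restart_allowed(environment: str, user_confirmed: bool) -> bool:
    """Guardrail: in production, a restart requires explicit user confirmation."""
    # The environment label is an assumption supplied by the caller.
    if environment.strip().lower() == "production":
        return user_confirmed
    # The guardrail only constrains production; outside it the decision is left to policy.
    return True
```

A stricter deployment could simply return `user_confirmed` in every environment.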

Security Recommendations

This skill appears to actually require DP_SERVER_URL and DP_API_KEY even though the registry metadata omitted them — confirm that before installing. Verify the skill's origin (source is unknown) and ask the publisher for missing context files (dp-api-reference.md and dp-operator-catalog.json). Only provide an API key that has minimal permissions and rate limits (prefer a read-only or scoped key). Be aware the skill will send that key in request headers to whatever DP_SERVER_URL you configure and will print the first 8 characters of the key to stdout (may appear in logs/agent UI). If you cannot verify the publisher or the endpoints, do not install or use production/high-privilege keys. If you proceed, audit network requests to the DP_SERVER_URL and prefer creating a dedicated service account/key limited to monitoring read actions.
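If echoing `${DP_API_KEY:0:8}` to logs is a concern, one option (an assumption, not part of the published skill) is a fixed-width mask that reveals only a short suffix and never the key's length. `mask_secret` is a hypothetical name; short keys are masked entirely.

```python
def mask_secret(secret: str, visible: int = 4) -> str:
    """Mask a secret for logs, revealing at most `visible` trailing characters."""
    if not secret:
        return "(unset)"
    if len(secret) <= visible * 2:
        # Too short to reveal any part safely
        return "*" * len(secret)
    # Fixed-width prefix so the output does not leak the key length
    return "*" * 8 + secret[-visible:]
```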

Functional Analysis

Type: OpenClaw Skill
Name: dp-ops-advisor
Version: 2.0.1

The dp-ops-advisor skill is a monitoring and diagnostic tool for a data processing platform. It uses standard REST API calls via curl to fetch job statuses and metrics from a user-configured server URL using an API key. The logic in SKILL.md and skill.json is consistent with its stated purpose of operational support, including health checks and stall detection, and contains no evidence of data exfiltration, malicious execution, or prompt injection.

Capability Assessment

ℹ Purpose & Capability

The described purpose (monitoring/diagnosing a DP platform) legitimately requires DP_SERVER_URL and DP_API_KEY, and the SKILL.md uses those vars exclusively. However, the top-level registry metadata in the submission claimed no required env vars while skill.json and SKILL.md declare two required env vars — this mismatch is an incoherence in the package metadata. SKILL.md also lists context files (dp-api-reference.md, dp-operator-catalog.json) that are not present in the bundle, which reduces reproducibility and is unexpected.

✓ Instruction Scope

Instructions direct the agent to call DP_SERVER_URL endpoints with the DP_API_KEY header, parse responses locally, and optionally take corrective actions — all consistent with the stated purpose. The script prints the first 8 characters of the API key to stdout (partial secret exposure) and uses curl/python inline, but it does not instruct reading unrelated system files or contacting external endpoints beyond the DP server.

✓ Install Mechanism

No install spec or code is included (instruction-only). This is low-risk from an install perspective since no remote code is downloaded or written to disk by an installer.

⚠ Credentials

The skill requires only DP_SERVER_URL and DP_API_KEY which are proportionate to the functionality. The concern is the metadata mismatch: registry-level fields reported 'none' while skill.json and SKILL.md require credentials. Also, the instructions echo a substring of DP_API_KEY to logs/UI which risks secret exposure; users should ensure the key has least-privilege and rate limits. No other unrelated credentials are requested.

✓ Persistence & Privilege

The skill does not request persistent/always-on privileges (always:false) and does not attempt to modify other skills or system-wide settings. Autonomous invocation is allowed (platform default) but is not combined with other high-risk flags here.

Version History

v2.0.1

- No changes detected in files for version 2.0.1.
- Skill description updated to "Test Skill" with a brief statement.
- All previous documentation and detailed instructions have been replaced.

v2.0.0

dp-ops-advisor 2.0.0 introduces major operational and authentication enhancements.
- Enforces mandatory API Key authentication; forbids session-based login for increased security.
- Adds comprehensive onboarding instructions for first-time users, including API Key setup steps.
- Defines strict API usage quotas (free: 100 calls/month, upgrade options available).
- Expands capabilities: real-time job health checks, throughput and latency analysis, auto-restart, incident reporting, configuration recommendations.
- Updates workflow documentation for all key operational support scenarios.

Metadata

Slug dp-ops-advisor

Version 2.0.1

License MIT-0

Total installs 0

Current installs 0

Version count 2

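FAQ

What is AI-powered DP Platform Operations Advisor?

It is a DP data processing platform operations advisor that activates when the user mentions operations needs such as checking the platform, job failures, job status, throughput analysis, fault diagnosis, or operations reports. It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 109 times to date.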

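How do I install AI-powered DP Platform Operations Advisor?

Run the command "/install dp-ops-advisor" in an OpenClaw or Claude Code chat to install it in one step. Before use, the DP_SERVER_URL and DP_API_KEY environment variables must be set.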

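Is AI-powered DP Platform Operations Advisor free?

The skill itself is free: it is released under the MIT-0 license and can be freely downloaded, installed, and used. The DP platform API it calls has a free tier of 100 calls per month; higher quotas require a paid plan.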

Which platforms does AI-powered DP Platform Operations Advisor support?

It is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed AI-powered DP Platform Operations Advisor?

It is developed and maintained by Hxp365 (@hxp365); the current version is v2.0.1.
