ljw-git-dw

dataworks-diagnoser

by ljw-git-dw · GitHub · v1.0.0 · MIT-0
cross-platform · ⚠ suspicious
72 Downloads · 0 Stars · 0 Active Installs · 1 Version
Install in OpenClaw: /install dataworks-diagnoser
Description
Fetch and analyze Alibaba Cloud DataWorks task instance logs to diagnose failures and get actionable recommendations using your instance ID and credentials.
README (SKILL.md)

DataWorks Task Instance Diagnostician

Fetches task instance logs from the Alibaba Cloud DataWorks API and provides intelligent diagnostic recommendations.

Quick Start

Diagnose a failed task:

python3 scripts/dataworks_diagnose.py <instance_id>

Example:

python3 scripts/dataworks_diagnose.py 123456789

When to Use

USE this skill when:

  • A DataWorks task instance failed and you need to know why
  • You have an instance ID and need to fetch its error logs
  • You want automated diagnosis and solutions for task failures
  • You are troubleshooting ODPS SQL, Data Integration, Shell, or Python nodes
  • You need to analyze error patterns across multiple failures
  • You are preparing incident reports for failed tasks

When NOT to Use

DON'T use this skill when:

  • You need real-time task monitoring (use DataWorks console)
  • You want to modify task configurations (use console or API directly)
  • You need historical analytics across many tasks (use DataWorks reports)
  • The task is still running (wait for completion first)
  • You don't have Alibaba Cloud credentials (need AccessKey)

Prerequisites

1. Alibaba Cloud Credentials

One of the following is required:

Option A: Environment Variables (Recommended)

export ALIBABA_CLOUD_ACCESS_KEY_ID=your_access_key
export ALIBABA_CLOUD_ACCESS_KEY_SECRET=your_access_secret

Option B: Config File

Create ~/.alibabacloud/credentials:

{
  "access_key_id": "your_access_key",
  "access_key_secret": "your_access_secret"
}

Option C: Aliyun CLI Config

If you have the Aliyun CLI configured, credentials will be loaded automatically.
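The lookup order can be sketched as follows. This is a minimal illustration, not the skill's actual loader. It assumes environment variables win, then JSON files shaped like the Option B example. The second path, ./credentials.json, comes from the Capability Assessment further down. The helper names `load_credentials` and `default_credential_files` are hypothetical, and the sketch does not cover the Aliyun CLI fallback (Option C).

```python
import json
import os
from pathlib import Path
from typing import Iterable, List, Optional, Tuple


def default_credential_files() -> List[Path]:
    # Hypothetical search list: paths assumed from this README and the
    # Capability Assessment; the real script may differ.
    return [
        Path.home() / ".alibabacloud" / "credentials",
        Path.cwd() / "credentials.json",
    ]


def load_credentials(files: Optional[Iterable[Path]] = None) -> Optional[Tuple[str, str]]:
    """Return (access_key_id, access_key_secret), or None if nothing is found."""
    # 1. Environment variables take priority (Option A).
    key_id = os.environ.get("ALIBABA_CLOUD_ACCESS_KEY_ID")
    secret = os.environ.get("ALIBABA_CLOUD_ACCESS_KEY_SECRET")
    if key_id and secret:
        return key_id, secret

    # 2. Fall back to JSON files shaped like the Option B example.
    for path in files if files is not None else default_credential_files():
        try:
            data = json.loads(Path(path).read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # missing, unreadable, or not JSON: try the next file
        if isinstance(data, dict) and data.get("access_key_id") and data.get("access_key_secret"):
            return data["access_key_id"], data["access_key_secret"]
    return None
```

Putting environment variables first matches the "Recommended" label on Option A. It also means a stray credentials.json in the working directory is only used when no variables are set.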

2. Required Permissions

The AccessKey needs these permissions:

  • dataworks:GetInstanceLog - Fetch task instance logs
  • dataworks:QueryTask - Query task information

3. Network Access

  • Access to Alibaba Cloud API endpoints
  • If using VPC, ensure proper network configuration

Core Workflows

1. Quick Diagnosis (Recommended)

Fetch log and get diagnosis in one command:

python3 scripts/dataworks_diagnose.py <instance_id>

Example:

python3 scripts/dataworks_diagnose.py 123456789

Output:

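🔍 Starting diagnosis of DataWorks task instance: 123456789
📍 Region: cn-hangzhou
------------------------------------------------------------

📥 Step 1/2: Fetching task log...
✅ Log fetched successfully

🔬 Step 2/2: Analyzing...
✅ Diagnosis complete

============================================================
📋 Diagnostic Report
============================================================
🔍 DataWorks Task Instance Diagnostic Report
============================================================
Instance ID: 123456789
Issues found: 2

----------------------------------------------------------------------
🔴 Issue 1: Insufficient resource quota
   Type: resource_quota
   Severity: HIGH

   Related log lines:
     > ERROR: quota exceeded for resource group 'default'
     > No available slots in queue

   Suggested solutions:
     1. Check current usage of the resource group and release idle resources
     2. Contact your administrator to raise the resource quota
     3. Optimize the task configuration to reduce resource consumption
     4. Consider staggered scheduling to avoid peak resource usage

   Reference: https://help.aliyun.com/.../resource-group.html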
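Each issue in the report has the same parts: a title, a type, a severity, the matching log lines, and numbered suggestions. The sketch below shows one way to hold and render that structure. The `Issue` dataclass and `format_issue` function are hypothetical, and the layout only approximates the sample above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Issue:
    # Hypothetical container for one finding in the diagnostic report.
    title: str
    error_type: str
    severity: str
    log_lines: List[str] = field(default_factory=list)
    suggestions: List[str] = field(default_factory=list)


def format_issue(index: int, issue: Issue) -> str:
    """Render one issue roughly in the layout of the sample report."""
    icon = "🔴" if issue.severity.upper() == "HIGH" else "🟡"
    lines = [
        f"{icon} Issue {index}: {issue.title}",
        f"   Type: {issue.error_type}",
        f"   Severity: {issue.severity.upper()}",
        "",
        "   Related log lines:",
    ]
    lines += [f"     > {line}" for line in issue.log_lines]
    lines += ["", "   Suggested solutions:"]
    # Number the suggestions starting from 1, as in the sample.
    lines += [f"     {i}. {s}" for i, s in enumerate(issue.suggestions, start=1)]
    return "\n".join(lines)
```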

2. Fetch Log Only

python3 scripts/fetch_instance_log.py <instance_id> [options]

Options:

# Specify region
python3 scripts/fetch_instance_log.py 123456789 --region cn-shanghai

# Output as JSON
python3 scripts/fetch_instance_log.py 123456789 --json

# Show full log (default: last 50 lines)
python3 scripts/fetch_instance_log.py 123456789 --verbose

# Save to file
python3 scripts/fetch_instance_log.py 123456789 > log.txt

3. Diagnose Existing Log

python3 scripts/diagnose_log.py <log_file>

Examples:

# From file
python3 scripts/diagnose_log.py error.log

# From stdin
cat log.txt | python3 scripts/diagnose_log.py

# With instance ID
python3 scripts/diagnose_log.py error.log --instance-id 123456789

# JSON output
python3 scripts/diagnose_log.py error.log --json

# Summary only
python3 scripts/diagnose_log.py error.log --summary

Scripts

This skill includes three scripts:

dataworks_diagnose.py - All-in-One Tool

Fetches log and provides diagnosis automatically.

Usage:

python3 scripts/dataworks_diagnose.py <instance_id> [options]

Options:

  • --region, -r - Alibaba Cloud region (default: cn-hangzhou)
  • --json, -j - Output as JSON
  • --verbose, -v - Show full log
  • --save-log FILE - Save raw log to file
  • --save-report FILE - Save diagnostic report to file

fetch_instance_log.py - Log Fetcher

Fetches task instance log from DataWorks API.

Usage:

python3 scripts/fetch_instance_log.py <instance_id> [options]

Options:

  • --region, -r - Region (default: cn-hangzhou)
  • --access-key - Access Key ID
  • --access-secret - Access Key Secret
  • --json, -j - JSON output
  • --verbose, -v - Full log
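By default the fetcher prints only the last 50 lines of the log. `--verbose` prints the full log, and `--json` switches to machine-readable output. The sketch below shows that logic under stated assumptions. The function `render_log` and its JSON keys (`instance_id`, `total_lines`, `truncated`, `log`) are hypothetical, and the real `--json` output may look different.

```python
import json

DEFAULT_TAIL_LINES = 50  # "default: last 50 lines", per the examples above


def render_log(instance_id: str, log: str, verbose: bool = False,
               as_json: bool = False) -> str:
    """Trim the log unless verbose, then return plain text or JSON."""
    all_lines = log.splitlines()
    shown = all_lines if verbose else all_lines[-DEFAULT_TAIL_LINES:]
    text = "\n".join(shown)
    if not as_json:
        return text
    # Hypothetical JSON shape; the real --json output may use other keys.
    return json.dumps(
        {
            "instance_id": instance_id,
            "total_lines": len(all_lines),
            "truncated": len(shown) < len(all_lines),
            "log": text,
        },
        ensure_ascii=False,  # keep Chinese log text readable
    )
```

Keeping the tail rather than the head is deliberate, because DataWorks failures usually surface near the end of the log.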

diagnose_log.py - Log Analyzer

Analyzes log content and provides diagnostic recommendations.

Usage:

python3 scripts/diagnose_log.py <log_file_or_stdin> [options]

Options:

  • --instance-id - Task instance ID
  • --json, -j - JSON output
  • --summary, -s - Summary only
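diagnose_log.py accepts either a file path or piped stdin, as in `cat log.txt | python3 scripts/diagnose_log.py`. A common way to support both is an optional positional argument that defaults to `-`, meaning stdin. The sketch below assumes that convention. `build_parser` and `read_log_text` are hypothetical names, not the script's code.

```python
import argparse
import sys


def build_parser() -> argparse.ArgumentParser:
    # Sketch of the documented options; "-" or no file means "read stdin".
    parser = argparse.ArgumentParser(prog="diagnose_log.py")
    parser.add_argument("log_file", nargs="?", default="-",
                        help="Log file to analyze (omit or use '-' for stdin)")
    parser.add_argument("--instance-id", help="Task instance ID")
    parser.add_argument("--json", "-j", action="store_true", help="JSON output")
    parser.add_argument("--summary", "-s", action="store_true", help="Summary only")
    return parser


def read_log_text(log_file: str) -> str:
    """Read the whole log from a file path, or from stdin for '-'."""
    if log_file == "-":
        return sys.stdin.read()
    # errors="replace" keeps going if a log contains stray non-UTF-8 bytes.
    with open(log_file, encoding="utf-8", errors="replace") as fh:
        return fh.read()
```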

Detected Error Patterns

The diagnostician recognizes these error types:

| Error Type | Severity | Example patterns |
|---|---|---|
| 🔴 resource_quota | High | "quota exceeded", "资源不足" (insufficient resources) |
| 🔴 resource_expired | High | "expired", "独享资源组已过期" (exclusive resource group has expired), "bill exception" |
| 🔴 connection_timeout | High | "connection timeout", "network unreachable" |
| 🔴 permission_denied | High | "permission denied", "access denied" |
| 🟡 syntax_error | Medium | "syntax error", "parse error" |
| 🟡 table_not_found | Medium | "table not found", "doesn't exist" |
| 🟡 data_quality | Medium | "quality check failed" |
| 🔴 memory_overflow | High | "out of memory", "heap space" |
| 🔴 disk_full | High | "disk full", "no space left" |
| 🟡 dependency_failed | Medium | "dependency failed", "upstream failed" |
| 🟡 api_rate_limit | Medium | "rate limit exceeded" |

See references/error_codes.md for detailed error patterns and solutions.
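The table suggests a keyword-based classifier. The sketch below uses case-insensitive substring matching: each log line is checked against every pattern, and matching lines are grouped under their error type. Only the example strings from the table are included. The real rule set in the scripts and references/error_codes.md is probably larger and may use regular expressions instead. `ERROR_PATTERNS` and `classify` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Example strings copied from the table above; treat this as an illustrative subset.
ERROR_PATTERNS: List[Tuple[str, str, List[str]]] = [
    ("resource_quota", "HIGH", ["quota exceeded", "资源不足"]),
    ("resource_expired", "HIGH", ["expired", "独享资源组已过期", "bill exception"]),
    ("connection_timeout", "HIGH", ["connection timeout", "network unreachable"]),
    ("permission_denied", "HIGH", ["permission denied", "access denied"]),
    ("syntax_error", "MEDIUM", ["syntax error", "parse error"]),
    ("table_not_found", "MEDIUM", ["table not found", "doesn't exist"]),
    ("data_quality", "MEDIUM", ["quality check failed"]),
    ("memory_overflow", "HIGH", ["out of memory", "heap space"]),
    ("disk_full", "HIGH", ["disk full", "no space left"]),
    ("dependency_failed", "MEDIUM", ["dependency failed", "upstream failed"]),
    ("api_rate_limit", "MEDIUM", ["rate limit exceeded"]),
]


def classify(log: str) -> Dict[str, dict]:
    """Map each matched error type to its severity and the log lines that hit it."""
    found: Dict[str, dict] = {}
    for line in log.splitlines():
        lowered = line.lower()  # case-insensitive; no effect on Chinese text
        for error_type, severity, needles in ERROR_PATTERNS:
            if any(n.lower() in lowered for n in needles):
                entry = found.setdefault(error_type, {"severity": severity, "lines": []})
                entry["lines"].append(line.strip())
    return found
```

Broad patterns such as "expired" can produce false positives, so a real rule set would likely need more specific phrases.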

Common Regions

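| Region | Code |
|---|---|
| East China 1 (Hangzhou) | cn-hangzhou |
| East China 2 (Shanghai) | cn-shanghai |
| North China 1 (Qingdao) | cn-qingdao |
| North China 2 (Beijing) | cn-beijing |
| South China 1 (Shenzhen) | cn-shenzhen |
| Hong Kong | cn-hongkong |
| Singapore | ap-southeast-1 |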

API Reference

API: GetTaskInstanceLog
Version: 2024-05-18
Endpoint: https://dataworks-public.{region}.aliyuncs.com/

Request Parameters:

  • InstanceId (required) - Task instance ID
  • RegionId (required) - Region ID
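The two required parameters and the endpoint template combine as shown in this sketch. It copies the endpoint template exactly as documented in this section, without independent verification. Request signing and Alibaba Cloud's common request parameters are left to the official SDK and are not shown. `build_request` is a hypothetical helper.

```python
from typing import Dict, Tuple

# Endpoint template taken as documented in this section.
ENDPOINT_TEMPLATE = "https://dataworks-public.{region}.aliyuncs.com/"


def build_request(instance_id, region: str = "cn-hangzhou") -> Tuple[str, Dict[str, str]]:
    """Return (endpoint URL, business parameters) for one log request.

    Signing and common parameters are omitted; the official SDK handles them.
    """
    instance_id = str(instance_id).strip()
    if not instance_id:
        raise ValueError("instance ID is required")
    if not region:
        raise ValueError("region is required")
    url = ENDPOINT_TEMPLATE.format(region=region)
    params = {"InstanceId": instance_id, "RegionId": region}
    return url, params
```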

Response:

{
  "Data": {
    "LogContent": "...",
    "InstanceStatus": "FAILED",
    "CycleTime": "2024-01-15 10:30:00"
  },
  "Code": "200"
}
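A response shaped like the example above can be reduced to the fields the diagnoser needs. This sketch assumes that any `Code` other than `"200"` signals an error. It is taken from the sample response only, so check the linked API documentation for the actual error format. `parse_log_response` is a hypothetical name.

```python
import json
from typing import Dict


def parse_log_response(body: str) -> Dict[str, str]:
    """Extract log fields from a response body shaped like the example above."""
    payload = json.loads(body)
    code = str(payload.get("Code", ""))
    if code != "200":
        # Assumption: any other Code indicates an error; check the API docs.
        raise RuntimeError(f"GetTaskInstanceLog failed with Code={code!r}")
    data = payload.get("Data") or {}
    return {
        "log": data.get("LogContent", ""),
        "status": data.get("InstanceStatus", "UNKNOWN"),
        "cycle_time": data.get("CycleTime", ""),
    }
```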

Documentation: https://api.aliyun.com/api/dataworks-public/2024-05-18/GetTaskInstanceLog

Examples

Example 1: Quick Diagnosis

python3 scripts/dataworks_diagnose.py 123456789

Example 2: Save Report

python3 scripts/dataworks_diagnose.py 123456789 --save-report diagnosis.txt

Example 3: Different Region

python3 scripts/dataworks_diagnose.py 123456789 --region cn-shanghai

Example 4: Analyze Saved Log

python3 scripts/diagnose_log.py saved_log.txt --instance-id 123456789

Example 5: Batch Analysis

for id in 123 456 789; do
  python3 scripts/diagnose_log.py --instance-id "$id" < "log_${id}.txt"
done
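The same batch can also be driven from Python, which makes it easier to collect results. This sketch mirrors the shell loop: one `diagnose_log.py` run per instance ID, with `log_<id>.txt` fed on stdin. It assumes you run from the skill's root directory and have already saved the logs, for example with `--save-log`. The meaning of the script's exit codes is not documented, so the sketch records them without interpreting them. `batch_commands` and `run_batch` are hypothetical helpers.

```python
import subprocess
import sys
from pathlib import Path
from typing import Dict, Iterable, List, Tuple

SCRIPT = "scripts/diagnose_log.py"  # assumes the skill's root is the working directory


def batch_commands(instance_ids: Iterable, log_dir: str = ".") -> List[Tuple[List[str], Path]]:
    """Build (argv, log file) pairs mirroring the shell loop above."""
    return [
        ([sys.executable, SCRIPT, "--instance-id", str(i)], Path(log_dir) / f"log_{i}.txt")
        for i in instance_ids
    ]


def run_batch(instance_ids: Iterable, log_dir: str = ".") -> Dict[str, Tuple[int, str]]:
    """Run diagnose_log.py once per saved log, feeding the file on stdin."""
    results: Dict[str, Tuple[int, str]] = {}
    for argv, log_path in batch_commands(instance_ids, log_dir):
        with open(log_path, encoding="utf-8") as fh:
            proc = subprocess.run(argv, stdin=fh, capture_output=True, text=True)
        # Keep the exit code alongside stdout; its meaning is script-specific.
        results[argv[-1]] = (proc.returncode, proc.stdout)
    return results
```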

Troubleshooting

"Credentials not found"

# Set environment variables
export ALIBABA_CLOUD_ACCESS_KEY_ID=your_key
export ALIBABA_CLOUD_ACCESS_KEY_SECRET=your_secret

"Instance not found"

  • Verify the instance ID is correct
  • Check if the instance exists in DataWorks console
  • Ensure you're using the correct region

"Permission denied"

  • Verify AccessKey has required permissions
  • Check RAM role configuration
  • Contact administrator for access

"Request timeout"

  • Check network connectivity
  • Try increasing the request timeout in the script
  • Verify the API endpoint is accessible

Tips

💡 Pro tips:

  1. Save logs for failed tasks - Use --save-log to keep records
  2. Generate reports - Use --save-report for documentation
  3. Batch processing - Run the scripts in a shell loop to handle several instance IDs (see Example 5)
  4. JSON output - Use --json for programmatic processing
  5. Region matters - Always use the correct region for your workspace

Security

⚠️ Important:

  • Never commit AccessKeys to version control
  • Use RAM users or RAM roles instead of root (main) account keys
  • Rotate keys regularly
  • Use environment variables or secure config files
  • Restrict key permissions to minimum required

References

Usage Guidance
This package is mostly coherent: it fetches DataWorks logs, analyzes them, and requires your Alibaba Cloud AccessKey (ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET). Before installing or running:

  1. Confirm the skill's source and provenance (author repo or signed release).
  2. Use a RAM subaccount with minimal permissions (only GetTaskInstanceLog / QueryTask) rather than root account keys.
  3. Don't keep credentials in the project directory (credentials.json) unless you intend to; prefer environment variables or a secured config file.
  4. Run in an isolated environment (virtualenv or container) and review the pip packages you will install (alibabacloud_* and the Aliyun SDK).
  5. Consider rotating or revoking the AccessKey after use, and inspect network traffic if you need to ensure the keys are only used against Alibaba Cloud endpoints.

The main concrete issue is the registry metadata omission: ask the publisher to update the metadata to declare the required environment variables and dependencies before trusting automated installation.
Capability Analysis
Type: OpenClaw Skill · Name: dataworks-diagnoser · Version: 1.0.0

The DataWorks Diagnoser skill bundle is a legitimate tool designed to fetch and analyze Alibaba Cloud DataWorks task logs. It utilizes official Alibaba Cloud SDKs (alibabacloud_dataworks_public20240518) and follows standard practices for credential management (environment variables and local config files). The code logic in scripts like dataworks_diagnose.py and analyze_error.py is focused on parsing error patterns and providing diagnostic recommendations, with no evidence of data exfiltration, malicious execution, or prompt injection.
Capability Assessment
Purpose & Capability
Functionality (fetching DataWorks logs and analyzing errors) matches the code: scripts call Alibaba Cloud DataWorks APIs and perform local log analysis. Asking for ALIBABA_CLOUD_ACCESS_KEY_ID / SECRET is appropriate for this purpose. However, the registry metadata claims no required env vars or primary credential, while SKILL.md and the scripts clearly require credentials and Python SDKs. This is an inconsistency in declared requirements.
Instruction Scope
Runtime instructions and scripts limit operations to: loading credentials (env, ~/.alibabacloud/credentials or ./credentials.json), calling DataWorks APIs, parsing logs, and printing/saving reports. The SKILL.md does not instruct unrelated file reads or external endpoints beyond Alibaba Cloud. The only notable behavior is the scripts will search for local credential files (including credentials.json in working dir), which is expected but worth noting.
Install Mechanism
There is no platform-level install spec, and the code relies on standard pip packages (Alibaba Cloud SDKs) listed in requirements.txt — reasonable for the task. SKILL.md embeds a small install metadata snippet (suggesting brew curl) and README instructs pip installs. No downloads from arbitrary URLs or obfuscated installers were found. The minor inconsistency between 'no install spec' in registry metadata and install hints inside SKILL.md is worth correcting.
Credentials
The scripts legitimately need an Alibaba Cloud AccessKey ID/Secret and network access to DataWorks endpoints. However, the registry metadata omits these required environment variables and the primary credential declaration. The skill also tries multiple locations for credentials (env, ~/.alibabacloud/credentials, ./credentials.json). This is convenient but increases the chance of accidentally using unintended local credentials. Ensure you supply a least-privilege RAM subaccount key and avoid leaving long-lived secrets in the working directory.
Persistence & Privilege
The skill does not request permanent 'always' presence, does not modify other skills or system-wide config, and only writes files if the user instructs (save-log / save-report). It runs helper scripts as subprocesses locally; no autonomous elevation or hidden persistence was observed.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install dataworks-diagnoser
  3. After installation, invoke the skill by name or use /dataworks-diagnoser
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
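Initial release - DataWorks task diagnosis tool. Features:
  • Automatic task log retrieval
  • AI-powered error analysis
  • Supports ODPS, DataX, and Java exceptions
  • Concise, clear diagnostic reports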
Metadata
Slug dataworks-diagnoser
Version 1.0.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is dataworks-diagnoser?

Fetch and analyze Alibaba Cloud DataWorks task instance logs to diagnose failures and get actionable recommendations using your instance ID and credentials. It is an AI Agent Skill for Claude Code / OpenClaw, with 72 downloads so far.

How do I install dataworks-diagnoser?

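Run "/install dataworks-diagnoser" in the OpenClaw or Claude Code chat to install it in one step. You still need to provide Alibaba Cloud AccessKey credentials (see Prerequisites) and install the Python SDK dependencies listed in requirements.txt.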

Is dataworks-diagnoser free?

Yes, dataworks-diagnoser is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does dataworks-diagnoser support?

dataworks-diagnoser is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created dataworks-diagnoser?

It is built and maintained by ljw-git-dw (@ljw-git-dw); the current version is v1.0.0.
