dataworks-diagnoser
/install dataworks-diagnoser
DataWorks Task Instance Diagnostician
Fetches task instance logs from the Alibaba Cloud DataWorks API and produces diagnostic recommendations based on the error patterns found in them.
Quick Start
Diagnose a failed task:
python3 scripts/dataworks_diagnose.py <instance_id>
Example:
python3 scripts/dataworks_diagnose.py 123456789
When to Use
✅ USE this skill when:
- A DataWorks task instance failed and you need to know why
- You have an instance ID and need to fetch error logs
- You want automated diagnosis and solutions for task failures
- You are troubleshooting ODPS SQL, Data Integration, Shell, or Python nodes
- You need to analyze error patterns across multiple failures
- You are preparing incident reports for failed tasks
When NOT to Use
❌ DON'T use this skill when:
- You need real-time task monitoring (use DataWorks console)
- You want to modify task configurations (use console or API directly)
- You need historical analytics across many tasks (use DataWorks reports)
- The task is still running (wait for completion first)
- You don't have Alibaba Cloud credentials (an AccessKey is required)
Prerequisites
1. Alibaba Cloud Credentials
One of the following is required:
Option A: Environment Variables (Recommended)
export ALIBABA_CLOUD_ACCESS_KEY_ID=your_access_key
export ALIBABA_CLOUD_ACCESS_KEY_SECRET=your_access_secret
Option B: Config File
Create ~/.alibabacloud/credentials:
{
  "access_key_id": "your_access_key",
  "access_key_secret": "your_access_secret"
}
Option C: Aliyun CLI Config
If you have the Aliyun CLI configured, credentials will be loaded automatically.
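The order in which the scripts try these options is not stated here. As a minimal sketch, assuming environment variables take precedence over the config file and that the file is the flat JSON shown in Option B, credential resolution might look like the following. `load_credentials` is a hypothetical helper for illustration, not a function taken from the skill, and it does not cover the Aliyun CLI option.

```python
import json
import os
from pathlib import Path

ENV_ID = "ALIBABA_CLOUD_ACCESS_KEY_ID"
ENV_SECRET = "ALIBABA_CLOUD_ACCESS_KEY_SECRET"


def load_credentials(env=None, config_path=None):
    """Return (access_key_id, access_key_secret) or raise LookupError.

    Assumed order (not confirmed by the skill): environment variables
    first, then the JSON config file.
    """
    env = os.environ if env is None else env
    if env.get(ENV_ID) and env.get(ENV_SECRET):
        return env[ENV_ID], env[ENV_SECRET]

    # Fall back to the config file described in Option B.
    path = Path(config_path) if config_path else Path.home() / ".alibabacloud" / "credentials"
    if path.is_file():
        data = json.loads(path.read_text(encoding="utf-8"))
        key_id = data.get("access_key_id")
        secret = data.get("access_key_secret")
        if key_id and secret:
            return key_id, secret

    raise LookupError("Credentials not found")
```

Passing `env` and `config_path` explicitly keeps the helper easy to exercise without touching real credentials.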
2. Required Permissions
The AccessKey needs these permissions:
- dataworks:GetInstanceLog - Fetch task instance logs
- dataworks:QueryTask - Query task information
3. Network Access
- Access to Alibaba Cloud API endpoints
- If using VPC, ensure proper network configuration
Core Workflows
1. Quick Diagnosis (Recommended)
Fetch log and get diagnosis in one command:
python3 scripts/dataworks_diagnose.py <instance_id>
Example:
python3 scripts/dataworks_diagnose.py 123456789
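Output (translated from the tool's Chinese messages):
🔍 Starting diagnosis of DataWorks task instance: 123456789
📍 Region: cn-hangzhou
------------------------------------------------------------
📥 Step 1/2: Fetching task log...
✅ Log fetched successfully
🔬 Step 2/2: Analyzing...
✅ Diagnosis complete
============================================================
📋 Diagnostic Report
============================================================
🔍 DataWorks Task Instance Diagnostic Report
============================================================
Instance ID: 123456789
Issues found: 2
----------------------------------------------------------------------
🔴 Issue 1: Insufficient resource quota
Type: resource_quota
Severity: HIGH
Related log lines:
> ERROR: quota exceeded for resource group 'default'
> No available slots in queue
Suggested solutions:
1. Check the current resource group's usage and release idle resources
2. Contact an administrator to raise the resource quota
3. Optimize the task configuration to reduce resource consumption
4. Consider off-peak scheduling to avoid resource usage peaks
Reference: https://help.aliyun.com/.../resource-group.html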
2. Fetch Log Only
python3 scripts/fetch_instance_log.py <instance_id> [options]
Options:
# Specify region
python3 scripts/fetch_instance_log.py 123456789 --region cn-shanghai
# Output as JSON
python3 scripts/fetch_instance_log.py 123456789 --json
# Show full log (default: last 50 lines)
python3 scripts/fetch_instance_log.py 123456789 --verbose
# Save to file
python3 scripts/fetch_instance_log.py 123456789 > log.txt
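The options above note that only the last 50 lines are shown unless `--verbose` is given. A minimal sketch of that truncation, assuming the log arrives as a single string; `tail_lines` is a hypothetical name, not taken from the script:

```python
def tail_lines(log_text, limit=50, verbose=False):
    """Return the whole log when verbose, otherwise only its last `limit` lines."""
    lines = log_text.splitlines()
    if verbose or len(lines) <= limit:
        return "\n".join(lines)
    # Keep only the tail, where a failing task usually reports its error.
    return "\n".join(lines[-limit:])
```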
3. Diagnose Existing Log
python3 scripts/diagnose_log.py <log_file>
Examples:
# From file
python3 scripts/diagnose_log.py error.log
# From stdin
cat log.txt | python3 scripts/diagnose_log.py
# With instance ID
python3 scripts/diagnose_log.py error.log --instance-id 123456789
# JSON output
python3 scripts/diagnose_log.py error.log --json
# Summary only
python3 scripts/diagnose_log.py error.log --summary
Scripts
This skill includes three scripts:
dataworks_diagnose.py - All-in-One Tool
Fetches log and provides diagnosis automatically.
Usage:
python3 scripts/dataworks_diagnose.py <instance_id> [options]
Options:
- --region, -r - Alibaba Cloud region (default: cn-hangzhou)
- --json, -j - Output as JSON
- --verbose, -v - Show full log
- --save-log FILE - Save raw log to file
- --save-report FILE - Save diagnostic report to file
fetch_instance_log.py - Log Fetcher
Fetches task instance log from DataWorks API.
Usage:
python3 scripts/fetch_instance_log.py <instance_id> [options]
Options:
- --region, -r - Region (default: cn-hangzhou)
- --access-key - Access Key ID
- --access-secret - Access Key Secret
- --json, -j - JSON output
- --verbose, -v - Full log
diagnose_log.py - Log Analyzer
Analyzes log content and provides diagnostic recommendations.
Usage:
python3 scripts/diagnose_log.py <log_file_or_stdin> [options]
Options:
- --instance-id - Task instance ID
- --json, -j - JSON output
- --summary, -s - Summary only
Detected Error Patterns
The diagnostician recognizes these error types:
| Error Type | Severity | Examples |
|---|---|---|
| 🔴 resource_quota | High | "quota exceeded", "资源不足" |
| 🔴 resource_expired | High | "expired", "独享资源组已过期", "bill exception" |
| 🔴 connection_timeout | High | "connection timeout", "network unreachable" |
| 🔴 permission_denied | High | "permission denied", "access denied" |
| 🟡 syntax_error | Medium | "syntax error", "parse error" |
| 🟡 table_not_found | Medium | "table not found", "doesn't exist" |
| 🟡 data_quality | Medium | "quality check failed" |
| 🔴 memory_overflow | High | "out of memory", "heap space" |
| 🔴 disk_full | High | "disk full", "no space left" |
| 🟡 dependency_failed | Medium | "dependency failed", "upstream failed" |
| 🟡 api_rate_limit | Medium | "rate limit exceeded" |
See references/error_codes.md for detailed error patterns and solutions.
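To illustrate how a table like the one above can drive diagnosis, here is a minimal sketch of phrase-based classification. It assumes plain case-insensitive substring matching and uses only the example phrases listed in the table; the matching logic and full pattern list in `diagnose_log.py` may differ, and `PATTERNS` and `classify` are hypothetical names.

```python
# Severity and example phrases copied from the table above; the skill's
# real pattern list (see references/error_codes.md) may be larger.
PATTERNS = {
    "resource_quota": ("HIGH", ["quota exceeded", "资源不足"]),
    "resource_expired": ("HIGH", ["expired", "独享资源组已过期", "bill exception"]),
    "connection_timeout": ("HIGH", ["connection timeout", "network unreachable"]),
    "permission_denied": ("HIGH", ["permission denied", "access denied"]),
    "syntax_error": ("MEDIUM", ["syntax error", "parse error"]),
    "table_not_found": ("MEDIUM", ["table not found", "doesn't exist"]),
    "data_quality": ("MEDIUM", ["quality check failed"]),
    "memory_overflow": ("HIGH", ["out of memory", "heap space"]),
    "disk_full": ("HIGH", ["disk full", "no space left"]),
    "dependency_failed": ("MEDIUM", ["dependency failed", "upstream failed"]),
    "api_rate_limit": ("MEDIUM", ["rate limit exceeded"]),
}


def classify(log_text):
    """Return one finding per error type whose phrase appears in the log."""
    findings = []
    lines = log_text.splitlines()
    for error_type, (severity, phrases) in PATTERNS.items():
        # Collect every log line that contains any phrase for this type.
        matched = [
            line for line in lines
            if any(phrase in line.lower() for phrase in phrases)
        ]
        if matched:
            findings.append({"type": error_type, "severity": severity, "lines": matched})
    return findings
```

Substring matching is deliberately loose: a broad phrase such as "expired" can match unrelated lines, so the matched lines are returned alongside each finding for a human to confirm.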
Common Regions
| Region | Code |
|---|---|
| China East 1 (Hangzhou) | cn-hangzhou |
| China East 2 (Shanghai) | cn-shanghai |
| China North 1 (Qingdao) | cn-qingdao |
| China North 2 (Beijing) | cn-beijing |
| China South 1 (Shenzhen) | cn-shenzhen |
| Hong Kong | cn-hongkong |
| Singapore | ap-southeast-1 |
API Reference
API: GetTaskInstanceLog
Version: 2024-05-18
Endpoint: https://dataworks-public.{region}.aliyuncs.com/
Request Parameters:
- InstanceId (required) - Task instance ID
- RegionId (required) - Region ID
Response:
{
  "Data": {
    "LogContent": "...",
    "InstanceStatus": "FAILED",
    "CycleTime": "2024-01-15 10:30:00"
  },
  "Code": "200"
}
Documentation: https://api.aliyun.com/api/dataworks-public/2024-05-18/GetTaskInstanceLog
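As a rough sketch of how the endpoint template and response shape above fit together, the helpers below build the regional endpoint and pull the status and log text out of a response body. Request signing and the HTTP call itself are omitted. Treating `"Code": "200"` as the success marker is an assumption based only on the sample response, and `endpoint_for` and `extract_log` are hypothetical names.

```python
import json

ENDPOINT_TEMPLATE = "https://dataworks-public.{region}.aliyuncs.com/"


def endpoint_for(region="cn-hangzhou"):
    """Fill the region code into the endpoint template."""
    return ENDPOINT_TEMPLATE.format(region=region)


def extract_log(response_text):
    """Parse a response shaped like the sample and return (status, log)."""
    payload = json.loads(response_text)
    # Assumption: any code other than "200" signals a failed call.
    if str(payload.get("Code")) != "200":
        raise RuntimeError(f"API returned code {payload.get('Code')}")
    data = payload.get("Data") or {}
    return data.get("InstanceStatus"), data.get("LogContent", "")
```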
Examples
Example 1: Quick Diagnosis
python3 scripts/dataworks_diagnose.py 123456789
Example 2: Save Report
python3 scripts/dataworks_diagnose.py 123456789 --save-report diagnosis.txt
Example 3: Different Region
python3 scripts/dataworks_diagnose.py 123456789 --region cn-shanghai
Example 4: Analyze Saved Log
python3 scripts/diagnose_log.py saved_log.txt --instance-id 123456789
Example 5: Batch Analysis
for id in 123 456 789; do
  python3 scripts/diagnose_log.py --instance-id $id < log_$id.txt
done
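The same batch run can be driven from Python when the exit codes need to be collected. This is a sketch under two assumptions: each log was saved as `log_<id>.txt` as in the loop above, and the log file is passed as an argument rather than on stdin. `build_command` and `run_batch` are hypothetical helpers, not part of the skill.

```python
import subprocess
import sys


def build_command(instance_id, script="scripts/diagnose_log.py"):
    """Return the argv for diagnosing log_<id>.txt, mirroring the shell loop."""
    return [sys.executable, script, f"log_{instance_id}.txt",
            "--instance-id", str(instance_id)]


def run_batch(instance_ids, runner=subprocess.run):
    """Run the analyzer once per instance ID and map each ID to its exit code."""
    results = {}
    for instance_id in instance_ids:
        # check=False: a failing diagnosis should not abort the rest of the batch.
        completed = runner(build_command(instance_id), check=False)
        results[instance_id] = completed.returncode
    return results
```

The `runner` parameter exists so the loop can be tried with a stand-in before pointing it at the real script.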
Troubleshooting
"Credentials not found"
# Set environment variables
export ALIBABA_CLOUD_ACCESS_KEY_ID=your_key
export ALIBABA_CLOUD_ACCESS_KEY_SECRET=your_secret
"Instance not found"
- Verify the instance ID is correct
- Check if the instance exists in DataWorks console
- Ensure you're using the correct region
"Permission denied"
- Verify AccessKey has required permissions
- Check RAM role configuration
- Contact administrator for access
"Request timeout"
- Check network connectivity
- Try increasing timeout in script
- Verify API endpoint is accessible
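Where a longer timeout alone does not help, transient timeouts can also be absorbed by retrying. The sketch below is a generic retry with exponential backoff, a different technique from raising the timeout and not something the skill's scripts are known to do. `with_retries` is a hypothetical helper, and it assumes the wrapped call raises `TimeoutError` on a timeout.

```python
import time


def with_retries(call, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `call()` up to `attempts` times, doubling the delay after each TimeoutError."""
    for attempt in range(attempts):
        try:
            return call()
        except TimeoutError:
            # Re-raise once the final attempt has failed.
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```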
Tips
💡 Pro tips:
- Save logs for failed tasks - Use --save-log to keep records
- Generate reports - Use --save-report for documentation
- Batch processing - Script supports multiple instance IDs
- JSON output - Use --json for programmatic processing
- Region matters - Always use the correct region for your workspace
Security
⚠️ Important:
- Never commit AccessKeys to version control
- Use RAM roles instead of main account keys
- Rotate keys regularly
- Use environment variables or secure config files
- Restrict key permissions to minimum required
References
- references/error_codes.md - Complete error code reference
- DataWorks Documentation
- API Reference
- Error Codes
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat: /install dataworks-diagnoser
- After installation, invoke the skill by name or use /dataworks-diagnoser
- Provide required inputs per the skill's parameter spec and get structured output
What is dataworks-diagnoser?
Fetch and analyze Alibaba Cloud DataWorks task instance logs to diagnose failures and get actionable recommendations using your instance ID and credentials. It is an AI Agent Skill for Claude Code / OpenClaw, with 72 downloads so far.
How do I install dataworks-diagnoser?
Run "/install dataworks-diagnoser" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is dataworks-diagnoser free?
Yes, dataworks-diagnoser is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does dataworks-diagnoser support?
dataworks-diagnoser is cross-platform and runs anywhere OpenClaw / Claude Code is available.
Who created dataworks-diagnoser?
It is built and maintained by ljw-git-dw (@ljw-git-dw); the current version is v1.0.0.