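Description

An analysis and diagnostics tool for open-source big data applications on Linux. It checks service status, retrieves and tunes configuration parameters, analyzes job errors, and suggests optimizations. Supported services: YARN, Hive, HDFS, Spark, Impala, Trino, Tez, StarRocks, HBase, Kafka, ZooKeeper, Ranger, OpenLDAP, Hue, Flink.

Usage guide (SKILL.md)

Big Data Application Analysis and Diagnostics Tool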

Name: EMR Analyzer
Author: qinyafei123

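This skill provides analysis and diagnostics for open-source big data applications on Linux: confirming service status, retrieving and tuning configuration parameters, and analyzing job errors with optimization suggestions.

Supported services

Service	Process name	Default port	Config file path	Log path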
YARN	ResourceManager, NodeManager	8088, 8042	/etc/taihao-apps/hadoop-conf/yarn-site.xml	/var/log/hadoop-yarn/
HDFS	NameNode, DataNode	9870, 9864	/etc/taihao-apps/hadoop-conf/hdfs-site.xml	/var/log/hadoop-hdfs/
Hive	HiveServer2, Metastore	10000, 9083	/etc/taihao-apps/hive-conf/hive-site.xml	/var/log/hive/
Spark	Spark Master, Worker	8080, 8081	/etc/taihao-apps/spark-conf/spark-env.sh	/var/log/spark/
Impala	impalad, catalogd, statestored	21050, 25010, 25020	/etc/taihao-apps/impala-conf/impala-site.xml	/var/log/impala/
Trino	trino-server	8080	config.properties	/var/log/trino/
Tez	(runs on YARN)	-	tez-site.xml	/var/log/hadoop-yarn/
StarRocks	fe, be	8030, 9030	/opt/starrocks/fe/conf/fe.conf	/opt/starrocks/log/
HBase	HMaster, RegionServer	16010, 16030	/etc/taihao-apps/hbase-conf/hbase-site.xml	/var/log/hbase/
Kafka	kafka.Kafka	9092	/etc/taihao-apps/kafka-conf/server.properties	/var/log/kafka/
ZooKeeper	QuorumPeerMain	2181	/etc/taihao-apps/zookeeper-conf/zoo.cfg	/var/log/zookeeper/
Ranger	ranger-admin, ranger-usersync	6080, 6180	ranger-admin-site.xml	/var/log/ranger/
OpenLDAP	slapd	389, 636	slapd.conf	/var/log/slapd.log
Hue	runserver	8888	hue.ini	/var/log/hue/
Flink	StandaloneClusterEntrypoint, TaskManager	8081	/etc/taihao-apps/flink-conf/flink-conf.yaml	/var/log/flink/

Config file path note: /etc/taihao-apps/<service>-conf/ is searched first; if it does not exist, the tool falls back to /etc/ecm/<service>-conf/.
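The fallback rule above can be sketched in a few lines of Python. This is a minimal illustration, not the code shipped in the skill: the function name `resolve_conf_dir` and the `roots` parameter are hypothetical, and the sketch assumes the directory is named exactly `<service>-conf` (the table shows that YARN and HDFS share `hadoop-conf`, so a real implementation would need a service-to-directory mapping).

```python
from pathlib import Path
from typing import Optional, Sequence

# Search order taken from the path note; everything else here is illustrative.
CONF_ROOTS = ("/etc/taihao-apps", "/etc/ecm")


def resolve_conf_dir(service: str, roots: Sequence[str] = CONF_ROOTS) -> Optional[Path]:
    """Return the first existing <root>/<service>-conf directory, or None."""
    for root in roots:
        candidate = Path(root) / f"{service}-conf"
        if candidate.is_dir():
            return candidate
    return None


if __name__ == "__main__":
    # Prints the resolved directory, or None on a host without either path.
    print(resolve_conf_dir("hive"))
```

Returning `None` rather than raising lets the caller report a missing config directory as a diagnostic finding instead of a crash.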

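Usage

1. Check overall cluster status

User input: "Help me analyze the overall state of the services on this cluster"

Execution flow:

Call check_service_status.py all to get the status of every service
Summarize the RUNNING/NOT_RUNNING states
Output a service health report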

python3 {baseDir}/scripts/check_service_status.py all

2. Check a single service

User input: "Is the YARN service healthy?" / "Check HDFS status"

python3 {baseDir}/scripts/check_service_status.py <service_name>

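3. Analyze error logs

User input:

"Help me analyze this error: [error log content]"
"My Hive query failed, take a look at the logs"

Execution flow:

Identify which service the error belongs to
Call analyze_logs.py <service> to analyze the logs
Match error patterns and provide fix suggestions

python3 {baseDir}/scripts/analyze_logs.py <service_name> [lines]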

4. Retrieve configuration parameters

User input:

"Show Spark's configuration parameters"
"What is YARN's memory configuration?"

python3 {baseDir}/scripts/check_config.py <service_name>

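5. Get tuning suggestions

User input:

"How do I optimize Hive query performance?"
"How should I tune the Kafka configuration?"

Execution flow:

Call check_config.py to get the current configuration
Generate suggestions from KEY_PARAMS and the tuning rules
Output the optimization suggestions

Scripts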

check_service_status.py

Checks service processes, listening ports, and systemd service status.

Example output:

{
  "name": "yarn",
  "status": "RUNNING",
  "processes": [
    {"name": "ResourceManager", "running": true, "details": "..."},
    {"name": "NodeManager", "running": true, "details": "..."}
  ],
  "ports": [
    {"port": 8088, "listening": true, "details": "..."}
  ],
  "systemd": "active",
  "version": "Hadoop 3.3.6"
}
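Reports shaped like the example above are straightforward to roll up into a RUNNING/NOT_RUNNING summary. The sketch below assumes that the `all` mode yields a list of such per-service objects, which the source does not confirm; the `summarize` helper and the sample data are hypothetical.

```python
import json


def summarize(reports):
    """Group service names by the value of their "status" field."""
    summary = {}
    for report in reports:
        status = report.get("status", "UNKNOWN")
        summary.setdefault(status, []).append(report["name"])
    return summary


# Hypothetical sample in the same shape as the documented single-service output.
sample = json.loads(
    '[{"name": "yarn", "status": "RUNNING"},'
    ' {"name": "hive", "status": "NOT_RUNNING"}]'
)

if __name__ == "__main__":
    print(summarize(sample))
```

Services listed under NOT_RUNNING are the natural candidates for a follow-up log analysis.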

analyze_logs.py

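Analyzes service logs, identifies error patterns, and provides fix suggestions.

Recognized error types:

Memory problems (OOM, GC overhead)
Connection problems (Connection refused/timeout)
Permission problems (Permission denied)
Disk problems (No space left)
Timeout problems (TimeoutException)
Service-specific errors (DataNode unavailable, Container killed, etc.)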
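One common way to recognize categories like these is a table of regular expressions applied line by line. The following is a sketch under that assumption; the actual patterns used by analyze_logs.py are not shown in this document, so `ERROR_PATTERNS` and `count_errors` are illustrative names and the regexes are guesses at typical log messages.

```python
import re
from collections import Counter

# Illustrative patterns only; the script's real pattern table is not documented here.
ERROR_PATTERNS = {
    "memory": re.compile(r"OutOfMemoryError|GC overhead limit exceeded"),
    "connection": re.compile(r"Connection refused|Connection timed out"),
    "permission": re.compile(r"Permission denied"),
    "disk": re.compile(r"No space left on device"),
    "timeout": re.compile(r"TimeoutException"),
}


def count_errors(lines):
    """Count how many log lines match each error category."""
    counts = Counter()
    for line in lines:
        for category, pattern in ERROR_PATTERNS.items():
            if pattern.search(line):
                counts[category] += 1
    return counts


if __name__ == "__main__":
    demo = [
        "java.lang.OutOfMemoryError: Java heap space",
        "java.net.ConnectException: Connection refused",
        "INFO  nothing wrong here",
    ]
    print(dict(count_errors(demo)))
```

A line may match more than one category, in which case it is counted once per category.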

Example output:

{
  "service": "hive",
  "status": "ANALYZED",
  "error_summary": {
    "OOM - 内存不足": 5,
    "连接超时": 2
  },
  "suggestions": {
    "OOM - 内存不足": ["增加 JVM 堆内存", "检查内存泄漏", ...]
  },
  "recommendations": ["优先处理内存问题 - 检查并调整 JVM 堆内存配置"]
}

check_config.py

Retrieves service configuration parameters and provides tuning suggestions.

Example output:

{
  "service": "spark",
  "config_file": "/etc/ecm/spark-conf/spark-env.sh",
  "config_exists": true,
  "params": {
    "spark.executor.memory": "4g",
    "spark.sql.shuffle.partitions": "200"
  },
  "suggestions": [
    {
      "param": "spark.executor.memory",
      "current": "4g",
      "suggestion": "根据数据量调整，一般 4g-8g 起步",
      "reason": "Executor 内存影响任务处理能力"
    }
  ]
}
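A rule that produces a suggestion like the one in the example could look roughly as follows. This is a sketch, not the script's KEY_PARAMS logic: `parse_size_gb`, `suggest`, and the 4 GB floor are assumptions, the last one loosely based on the "4g-8g" guidance in the sample output.

```python
import re


def parse_size_gb(value):
    """Parse sizes such as '4g' or '512m' into gigabytes; return None if unparseable."""
    match = re.fullmatch(r"(\d+)([gGmM])", value.strip())
    if not match:
        return None
    number = int(match.group(1))
    return float(number) if match.group(2).lower() == "g" else number / 1024


def suggest(params, min_executor_gb=4):
    """Return a list of suggestion dicts for parameters below the assumed floor."""
    suggestions = []
    raw = params.get("spark.executor.memory")
    size = parse_size_gb(raw) if raw else None
    if size is not None and size < min_executor_gb:
        suggestions.append({
            "param": "spark.executor.memory",
            "current": raw,
            "suggestion": f"consider at least {min_executor_gb}g",
        })
    return suggestions


if __name__ == "__main__":
    print(suggest({"spark.executor.memory": "2g"}))
```

Values that cannot be parsed are skipped rather than flagged, so malformed settings never generate misleading advice.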

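Workflows

Scenario 1: Cluster health check

User: Help me analyze the overall state of the services on this cluster
  ↓
1. Run check_service_status.py all
  ↓
2. Summarize each service's status (RUNNING/NOT_RUNNING)
  ↓
3. Identify unhealthy services
  ↓
4. Run analyze_logs.py on the unhealthy services
  ↓
5. Output a health report + problem diagnosis + fix suggestions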

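Scenario 2: Error analysis

User: [pasted error log]
  ↓
1. Identify which service the error belongs to (keyword matching)
  ↓
2. Run analyze_logs.py <service>
  ↓
3. Match error patterns
  ↓
4. Look up fix suggestions in FIX_SUGGESTIONS
  ↓
5. Output the error analysis + solution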
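Step 1 of this scenario, identifying the service by keyword matching, might be sketched as below. The keyword table is an assumption built from the process names in the supported-services table, and `guess_service` is a hypothetical helper rather than part of the skill.

```python
# Hypothetical keyword table derived from the documented process names.
SERVICE_KEYWORDS = {
    "yarn": ("ResourceManager", "NodeManager"),
    "hdfs": ("NameNode", "DataNode"),
    "hive": ("HiveServer2", "Metastore"),
    "kafka": ("kafka.", "NotLeaderForPartition"),
}


def guess_service(text):
    """Return the service whose keywords occur most often in text, or None."""
    lowered = text.lower()
    scores = {
        service: sum(lowered.count(keyword.lower()) for keyword in keywords)
        for service, keywords in SERVICE_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    print(guess_service("ERROR DataNode unavailable, NameNode reports 0 live nodes"))
```

Returning `None` when nothing matches gives the agent a clear signal to ask the user which service produced the log.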

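Scenario 3: Performance tuning

User: How do I optimize Spark query performance?
  ↓
1. Run check_config.py spark
  ↓
2. Get the current key configuration parameters
  ↓
3. Generate suggestions from the tuning rules
  ↓
4. Output the current configuration + optimization suggestions + expected effect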

Log path reference

Config file paths

Preferred: /etc/taihao-apps/<service>-conf/
Fallback: /etc/ecm/<service>-conf/ (if taihao-apps does not exist)

Log paths

Hadoop ecosystem: /var/log/hadoop-<service>/
Hive: /var/log/hive/
Spark: /var/log/spark/
Other services: see references/services.md

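Common errors and solutions

YARN

Error	Cause	Solution
Container killed by AM	Timeout/OOM	Increase executor memory; check job duration
NodeManager unavailable	Process died/network issue	Check the NM process; review the NM logs
Queue full capacity	Queue resources exhausted	Wait for jobs to finish or increase queue capacity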

HDFS

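Error	Cause	Solution
SafeMode	Disk full/corrupt blocks	Free up space; run fsck
DataNode unavailable	DN lost contact	Check the DN process and network
Missing blocks	Data blocks lost	Restore from backup; check the replication factor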

Spark

Error	Cause	Solution
ExecutorLostFailure	Executor died	Check for OOM; increase memory
OutOfMemoryError	Insufficient memory	Raise -Xmx; reduce data volume
Task not serializable	Serialization problem	Check closure variables

Hive

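Error	Cause	Solution
Metastore connection failed	DB connection problem	Check the MySQL service and connection string
OutOfMemoryError	Insufficient memory	Increase container size
Permission denied	Permission problem	Check HDFS/Ranger permissions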

Kafka

Error	Cause	Solution
NotLeaderForPartition	Leader election	Wait for the election to complete
TimeoutException	Timeout	Increase the timeout settings
OutOfMemoryError	Insufficient memory	Increase heap size

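External documentation

Notes

Permissions: SSH access to the target server is required; sudo privileges are recommended
Log rotation: older logs may be compressed and need zcat/gunzip to read
Config changes: services typically must be restarted before modified configuration takes effect
Production: validate tuning changes in a test environment first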
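For the log rotation note, compressed logs can also be read directly from Python with the standard `gzip` module instead of shelling out to zcat. The helper below is a minimal sketch; the name `read_log_lines` is hypothetical, and it assumes rotated files carry a `.gz` suffix.

```python
import gzip
from pathlib import Path


def read_log_lines(path):
    """Yield lines from a plain or gzip-compressed log file, without newlines."""
    path = Path(path)
    # Assumption: rotated, compressed logs end in ".gz"; everything else is plain text.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            yield line.rstrip("\n")
```

Using `errors="replace"` keeps the reader from failing on stray non-UTF-8 bytes, which are common in long-lived service logs.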

Big Data Application Analysis and Diagnostics Tool - making cluster operations more efficient 🔍

Security recommendations

This skill appears to do what its description says (reading logs, checking services, and suggesting tuning). Before installing: 1) Review the three included Python scripts yourself — they execute many shell commands and interpolate arguments into shell strings (shell=True), which can enable command injection if untrusted inputs are passed. 2) Only run this skill in a trusted environment (or a sandbox) with limited attacker-exposed inputs. 3) If you plan to use it in production, consider hardening: sanitize/validate service_name and numeric parameters, avoid shell=True (use list args), and audit any commands that connect to local services (e.g., MySQL). 4) Confirm you are comfortable with the skill reading system config files and /var/log/* on the host. If you cannot review or harden the scripts, treat the skill as risky and avoid installing it on sensitive infrastructure.
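The hardening advice above (validate the service name and numeric parameters, pass list arguments instead of using shell=True) can be sketched as follows. This is an illustration of the technique, not a patch to the bundled scripts: `build_tail_command`, `run_tail`, and the `MAX_LINES` limit are hypothetical, and the allow-list simply mirrors the services named in this document.

```python
import subprocess

# Allow-list mirroring the services named in this document.
ALLOWED_SERVICES = frozenset({
    "yarn", "hive", "hdfs", "spark", "impala", "trino", "tez", "starrocks",
    "hbase", "kafka", "zookeeper", "ranger", "openldap", "hue", "flink",
})
MAX_LINES = 100_000  # arbitrary upper bound chosen for this sketch


def build_tail_command(service, log_file, lines=200):
    """Validate inputs and return an argv list that needs no shell."""
    if service not in ALLOWED_SERVICES:
        raise ValueError(f"unsupported service: {service!r}")
    count = int(lines)  # raises ValueError for non-numeric input
    if not 1 <= count <= MAX_LINES:
        raise ValueError(f"lines out of range: {count}")
    # "--" stops option parsing, so a path starting with "-" is not read as a flag.
    return ["tail", "-n", str(count), "--", str(log_file)]


def run_tail(service, log_file, lines=200):
    """Run tail with list arguments; no string ever reaches a shell."""
    command = build_tail_command(service, log_file, lines)
    return subprocess.run(command, capture_output=True, text=True, check=False).stdout
```

Because the arguments are passed as a list, an input such as `hive; rm -rf /` is rejected by the allow-list and would never be interpreted by a shell even if it got through.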

Functional analysis

Type: OpenClaw Skill
Name: emr-analyer
Version: 1.0.0

The skill bundle provides diagnostic capabilities for big data clusters but contains significant security vulnerabilities. The scripts `analyze_logs.py`, `check_config.py`, and `check_service_status.py` extensively use `subprocess.run(shell=True)` to execute system commands (e.g., grep, tail, ps) with arguments derived from the AI agent's input. While some scripts implement basic whitelisting for service names, the reliance on shell execution for processing logs and configurations presents a high risk of command injection if the agent is manipulated via prompt injection. No evidence of intentional malice, such as data exfiltration or backdoors, was detected.

Capability assessment

✓ Purpose & Capability

Name/description claim a local Linux EMR/BigData diagnostics tool. The included scripts read logs, check processes/ports, and parse configs for many Hadoop-ecosystem services — this is coherent with the stated purpose. Config/log paths under /etc/taihao-apps and /etc/ecm are nonstandard but reasonable for a packaged environment.

ℹ Instruction Scope

SKILL.md instructs running the three included Python scripts to inspect service status, config, and logs. Those scripts perform local file reads and execute system commands (ps/ss/netstat/ls/grep/tail/awk/systemctl/version commands) which are expected for diagnostics. However the scripts accept user-supplied service names/line counts and interpolate them into shell commands executed via subprocess.run(..., shell=True) without explicit sanitization, creating a possibility for shell injection or unintended command execution if untrusted inputs are passed.

✓ Install Mechanism

No install spec or external downloads — the skill is instruction-only with bundled scripts. Nothing is fetched from external URLs or installed automatically, which reduces supply-chain risk. The included code will be written to disk when the skill is installed/executed by the agent.

ℹ Credentials

The skill requests no credentials or environment variables and does not call external endpoints. It does, however, read system config files and logs (e.g., /etc/... and /var/log/...) and attempts local service connections (e.g., invoking mysql -h 127.0.0.1 -P 8030 -u root -e ...) — these are consistent with a local diagnostics tool but grant the skill broad read access to system logs/configs. No network exfiltration endpoints are declared.

✓ Persistence & Privilege

always is false and the skill does not request persistent elevated platform privileges or attempt to modify other skills. It runs on demand via the included scripts; autonomous invocation is allowed by default but not unusual for skills.

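Version history

v1.0.0

emr-analyer 1.0.0, the initial release, with one-stop analysis and diagnostics for big data services:
- Analysis for mainstream components including YARN, Hive, HDFS, Spark, Impala, Trino, Tez, StarRocks, HBase, Kafka, ZooKeeper, Ranger, OpenLDAP, Hue, and Flink
- Service status checks, configuration retrieval and tuning, log error analysis, and automatic optimization suggestions
- One-step cluster health reports, single-service self-diagnosis, and fix suggestions for common failures
- Scripted workflows with standardized JSON output for easy integration and automated operations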

Metadata

Slug emr-analyer

Version 1.0.0

License MIT-0

Total installs 0

Current installs 0

Number of versions 1

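FAQ

What is EMR Analyzer?

An analysis and diagnostics tool for open-source big data applications on Linux. It checks service status, retrieves and tunes configuration parameters, analyzes job errors, and suggests optimizations. Supported services: YARN, Hive, HDFS, Spark, Impala, Trino, Tez, StarRocks, HBase, Kafka, ZooKeeper, Ranger, OpenLDAP, Hue, Flink. It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 73 times so far.

How do I install EMR Analyzer?

Run the command "/install emr-analyer" in the OpenClaw or Claude Code chat box for one-step installation; no extra configuration is needed.

Is EMR Analyzer free?

Yes. EMR Analyzer is completely free under the MIT-0 license and may be freely downloaded, installed, and used.

Which platforms does EMR Analyzer support?

EMR Analyzer is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed EMR Analyzer?

Developed and maintained by qinyafei (@qinyafei123); the current version is v1.0.0.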
