qinyafei123

EMR Analyzer

Author: qinyafei · GitHub · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
73 total downloads · 0 favorites · 0 current installs · 1 version
Install in OpenClaw
/install emr-analyer
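Description
An analysis and diagnostics tool for open-source big data applications on Linux. It checks service status, retrieves and tunes configuration parameters, and analyzes job errors with optimization suggestions. Supported: YARN, Hive, HDFS, Spark, Impala, Trino, Tez, StarRocks, HBase, Kafka, ZooKeeper, Ranger, OpenLDAP, Hue, Flink.
Usage instructions (SKILL.md)

Big Data Application Analysis and Diagnostics Tool

This skill provides analysis and diagnostics for open-source big data applications on Linux. It covers service status checks, configuration retrieval and tuning, and job error analysis with optimization suggestions.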


Supported services

Service | Processes | Default ports | Config file path | Log path
YARN | ResourceManager, NodeManager | 8088, 8042 | /etc/taihao-apps/hadoop-conf/yarn-site.xml | /var/log/hadoop-yarn/
HDFS | NameNode, DataNode | 9870, 9864 | /etc/taihao-apps/hadoop-conf/hdfs-site.xml | /var/log/hadoop-hdfs/
Hive | HiveServer2, Metastore | 10000, 9083 | /etc/taihao-apps/hive-conf/hive-site.xml | /var/log/hive/
Spark | Spark Master, Worker | 8080, 8081 | /etc/taihao-apps/spark-conf/spark-env.sh | /var/log/spark/
Impala | impalad, catalogd, statestored | 21050, 25010, 25020 | /etc/taihao-apps/impala-conf/impala-site.xml | /var/log/impala/
Trino | trino-server | 8080 | config.properties | /var/log/trino/
Tez | (runs on YARN) | - | tez-site.xml | /var/log/hadoop-yarn/
StarRocks | fe, be | 8030, 9030 | /opt/starrocks/fe/conf/fe.conf | /opt/starrocks/log/
HBase | HMaster, RegionServer | 16010, 16030 | /etc/taihao-apps/hbase-conf/hbase-site.xml | /var/log/hbase/
Kafka | kafka.Kafka | 9092 | /etc/taihao-apps/kafka-conf/server.properties | /var/log/kafka/
ZooKeeper | QuorumPeerMain | 2181 | /etc/taihao-apps/zookeeper-conf/zoo.cfg | /var/log/zookeeper/
Ranger | ranger-admin, ranger-usersync | 6080, 6180 | ranger-admin-site.xml | /var/log/ranger/
OpenLDAP | slapd | 389, 636 | slapd.conf | /var/log/slapd.log
Hue | runserver | 8888 | hue.ini | /var/log/hue/
Flink | StandaloneClusterEntrypoint, TaskManager | 8081 | /etc/taihao-apps/flink-conf/flink-conf.yaml | /var/log/flink/

Config path note: the scripts look in /etc/taihao-apps/<service>-conf/ first. If that directory does not exist, they fall back to /etc/ecm/<service>-conf/.
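The lookup order above can be written as a small Python helper. This is a sketch of the rule as described, not code taken from the skill's scripts. The name `resolve_config_dir` is hypothetical. Here `service` means the directory prefix, which for YARN and HDFS is `hadoop`, per the table.

```python
import os
from typing import Optional, Sequence

# Search order from the note above: taihao-apps first, then ecm.
DEFAULT_ROOTS = ("/etc/taihao-apps", "/etc/ecm")


def resolve_config_dir(service: str, roots: Sequence[str] = DEFAULT_ROOTS) -> Optional[str]:
    """Return the first existing <root>/<service>-conf directory, or None.

    Hypothetical helper; the real scripts may resolve paths differently.
    """
    for root in roots:
        candidate = os.path.join(root, f"{service}-conf")
        if os.path.isdir(candidate):
            return candidate
    return None  # neither location exists on this host
```

On a host where /etc/taihao-apps/hadoop-conf exists, `resolve_config_dir("hadoop")` returns that path. Otherwise it tries /etc/ecm/hadoop-conf.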


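Usage

1. Check overall cluster status

User input: "Analyze the overall state of the services in this cluster"

Workflow:

  1. Run check_service_status.py all to get the status of every service
  2. Summarize the RUNNING/NOT_RUNNING states
  3. Output a service health report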
python3 {baseDir}/scripts/check_service_status.py all
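Step 2 can be sketched as follows. This assumes the `all` mode prints a JSON array of per-service objects shaped like the single-service example under check_service_status.py. The format of the `all` output is not documented here, so treat that as an assumption. `summarize_status` is a hypothetical helper.

```python
import json
from collections import defaultdict
from typing import Dict, List


def summarize_status(raw_json: str) -> Dict[str, List[str]]:
    """Group service names by their "status" field (e.g. RUNNING / NOT_RUNNING).

    Assumes, without confirmation from the skill docs, that the `all` mode
    prints a JSON array of objects with "name" and "status" keys.
    """
    services = json.loads(raw_json)
    groups: Dict[str, List[str]] = defaultdict(list)
    for svc in services:
        groups[svc.get("status", "UNKNOWN")].append(svc["name"])
    return dict(groups)
```

On the host, the input would be the stdout of the command above. One way to get it without a shell is `subprocess.run(["python3", script_path, "all"], capture_output=True, text=True).stdout`, where `script_path` points at check_service_status.py.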

2. Check a single service

User input: "Is the YARN service healthy?" / "Check HDFS status"

python3 {baseDir}/scripts/check_service_status.py <service_name>

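3. Analyze error logs

User input:

  • "Analyze this error for me: [error log content]"
  • "A Hive query failed; take a look at the logs"

Workflow:

  1. Identify which service the error belongs to
  2. Run analyze_logs.py <service> to analyze its logs
  3. Match error patterns and provide fix suggestions
python3 {baseDir}/scripts/analyze_logs.py <service_name> [lines]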

4. Get configuration parameters

User input:

  • "Show me Spark's configuration parameters"
  • "What is YARN's memory configuration?"
python3 {baseDir}/scripts/check_config.py <service_name>

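5. Get tuning suggestions

User input:

  • "How can I improve Hive query performance?"
  • "How should I tune the Kafka configuration?"

Workflow:

  1. Run check_config.py to get the current configuration
  2. Generate suggestions from KEY_PARAMS and the tuning rules
  3. Output optimization suggestions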
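Step 2 might look like the threshold-style rule check below. Everything in it is hypothetical. The real KEY_PARAMS structure in check_config.py is not documented. The 4096 MiB threshold simply mirrors the "4g-8g" advice in the check_config.py example output. The returned entries use the same keys as that example: `param`, `current`, `suggestion` and `reason`.

```python
import re
from typing import Dict, List, Optional

# Hypothetical rule table; the real KEY_PARAMS in check_config.py may differ.
RULES = [
    {
        "param": "spark.executor.memory",
        "min_mib": 4096,  # assumed threshold, mirrors the "4g-8g" advice
        "suggestion": "Adjust to the data volume; 4g-8g is a typical starting point",
        "reason": "Executor memory affects task processing capacity",
    },
]

# Multipliers to convert JVM-style size suffixes into MiB.
_UNITS = {"k": 1 / 1024, "m": 1, "g": 1024, "t": 1024 * 1024}


def size_to_mib(value: str) -> Optional[float]:
    """Parse sizes such as '512m' or '4g' into MiB; return None if unparseable."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([kmgt])b?\s*", value.lower())
    if not match:
        return None
    number, unit = match.groups()
    return float(number) * _UNITS[unit]


def suggest(params: Dict[str, str]) -> List[dict]:
    """Return suggestion entries for parameters below their assumed minimum."""
    results = []
    for rule in RULES:
        current = params.get(rule["param"])
        if current is None:
            continue  # parameter not set; a real tool might flag this as well
        mib = size_to_mib(current)
        if mib is not None and mib < rule["min_mib"]:
            results.append({
                "param": rule["param"],
                "current": current,
                "suggestion": rule["suggestion"],
                "reason": rule["reason"],
            })
    return results
```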

Script reference

check_service_status.py

Checks service processes, listening ports, and systemd service status.

Example output:

{
  "name": "yarn",
  "status": "RUNNING",
  "processes": [
    {"name": "ResourceManager", "running": true, "details": "..."},
    {"name": "NodeManager", "running": true, "details": "..."}
  ],
  "ports": [
    {"port": 8088, "listening": true, "details": "..."}
  ],
  "systemd": "active",
  "version": "Hadoop 3.3.6"
}
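One way to produce the `ports` entries is a TCP connect probe. This swaps in a different technique from the `ss`/`netstat` checks the scripts are described as using. It is a sketch, and the names `is_port_listening` and `port_entry` are hypothetical. A connect probe only detects sockets reachable from the given host. A service bound only to a non-loopback address would need its real IP passed as `host`.

```python
import socket


def is_port_listening(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, unreachable, ...
        return False


def port_entry(port: int) -> dict:
    """Build a dict shaped like the "ports" items in the example output above."""
    listening = is_port_listening(port)
    return {
        "port": port,
        "listening": listening,
        "details": "connect ok" if listening else "connect failed",
    }
```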

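analyze_logs.py

Analyzes service logs, identifies error patterns, and provides fix suggestions.

Recognized error types:

  • Memory problems (OOM, GC overhead)
  • Connection problems (Connection refused/timeout)
  • Permission problems (Permission denied)
  • Disk problems (No space left)
  • Timeouts (TimeoutException)
  • Service-specific errors (DataNode unavailable, Container killed, etc.)

Example output:

{
  "service": "hive",
  "status": "ANALYZED",
  "error_summary": {
    "OOM - out of memory": 5,
    "Connection timeout": 2
  },
  "suggestions": {
    "OOM - out of memory": ["Increase JVM heap memory", "Check for memory leaks", ...]
  },
  "recommendations": ["Address memory issues first - check and adjust the JVM heap configuration"]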
}
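The pattern-matching step can be sketched with regular expressions that count matching log lines per category. This produces a dictionary like `error_summary`. The category names and regexes are assumptions modeled on the error types listed above, not the patterns used by analyze_logs.py.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Hypothetical pattern table modeled on the error types listed above.
ERROR_PATTERNS = {
    "OOM": re.compile(r"OutOfMemoryError|GC overhead limit exceeded"),
    "Connection": re.compile(r"Connection (?:refused|timed out)"),
    "Permission": re.compile(r"Permission denied"),
    "Disk": re.compile(r"No space left on device"),
    "Timeout": re.compile(r"TimeoutException"),
}


def summarize_errors(lines: Iterable[str]) -> Dict[str, int]:
    """Count log lines per error category; each line counts once (first match wins)."""
    counts: Counter = Counter()
    for line in lines:
        for category, pattern in ERROR_PATTERNS.items():
            if pattern.search(line):
                counts[category] += 1
                break
    return dict(counts)
```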

check_config.py

Retrieves service configuration parameters and provides tuning suggestions.

Example output:

{
  "service": "spark",
  "config_file": "/etc/ecm/spark-conf/spark-env.sh",
  "config_exists": true,
  "params": {
    "spark.executor.memory": "4g",
    "spark.sql.shuffle.partitions": "200"
  },
  "suggestions": [
    {
      "param": "spark.executor.memory",
      "current": "4g",
      "suggestion": "Adjust to the data volume; 4g-8g is a typical starting point",
      "reason": "Executor memory affects task processing capacity"
    }
  ]
}
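The Hadoop-family XML files in the services table (yarn-site.xml, hdfs-site.xml, hive-site.xml, hbase-site.xml) store parameters in `<property>` elements with `<name>` and `<value>` children. A minimal parser can be built with the standard library's `xml.etree.ElementTree`. `parse_hadoop_xml` is a hypothetical name. The .properties, .conf, .yaml and .sh files in the table would need different handling.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def parse_hadoop_xml(xml_text: str) -> Dict[str, str]:
    """Parse a Hadoop-style *-site.xml document into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    params: Dict[str, str] = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            # A <property> without <value> is recorded as an empty string.
            params[name.strip()] = (prop.findtext("value") or "").strip()
    return params
```

To read a file from disk, `ET.parse(path).getroot()` gives the same root element that `ET.fromstring` returns for a string.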

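Workflows

Scenario 1: Cluster health check

User: Analyze the overall state of the services in this cluster
  ↓
1. Run check_service_status.py all
  ↓
2. Summarize each service's status (RUNNING/NOT_RUNNING)
  ↓
3. Identify abnormal services
  ↓
4. Run analyze_logs.py on the abnormal services
  ↓
5. Output health report + problem diagnosis + fix suggestions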
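Step 4 could be driven by building argument lists for analyze_logs.py from the status results. This also keeps the calls free of shell interpolation. The sketch assumes status objects with `name` and `status` keys, as in the check_service_status.py example. `followup_commands` is hypothetical, and its 200-line default is an arbitrary choice.

```python
import sys
from typing import Iterable, List


def followup_commands(statuses: Iterable[dict], base_dir: str, lines: int = 200) -> List[List[str]]:
    """Build argv lists that run analyze_logs.py for every service not RUNNING."""
    commands = []
    for svc in statuses:
        if svc.get("status") != "RUNNING":
            commands.append([
                sys.executable,
                f"{base_dir}/scripts/analyze_logs.py",
                svc["name"],
                str(lines),
            ])
    return commands
```

Each list can be passed directly to `subprocess.run(cmd, capture_output=True, text=True)` with no shell involved.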

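Scenario 2: Error analysis

User: [pastes an error log]
  ↓
1. Identify the service the error belongs to (keyword matching)
  ↓
2. Run analyze_logs.py <service>
  ↓
3. Match error patterns
  ↓
4. Look up fix suggestions in FIX_SUGGESTIONS
  ↓
5. Output error analysis + solution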
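Step 1 (keyword matching) can be sketched as an ordered keyword table checked case-insensitively. The table is hypothetical. Order matters: Spark and Hive stack traces often contain HDFS or YARN classes, so the more specific services are tried first.

```python
from typing import Optional

# Hypothetical keyword table; more specific services come first.
SERVICE_KEYWORDS = [
    ("hive", ("HiveServer2", "hive.metastore", "org.apache.hadoop.hive")),
    ("spark", ("org.apache.spark", "ExecutorLostFailure", "SparkContext")),
    ("hbase", ("org.apache.hadoop.hbase", "RegionServer", "HMaster")),
    ("kafka", ("org.apache.kafka", "NotLeaderForPartition")),
    ("hdfs", ("org.apache.hadoop.hdfs", "DataNode", "NameNode", "SafeMode")),
    ("yarn", ("org.apache.hadoop.yarn", "ResourceManager", "NodeManager", "Container killed")),
]


def detect_service(error_text: str) -> Optional[str]:
    """Return the first service whose keywords appear in the text, else None."""
    lowered = error_text.lower()
    for service, keywords in SERVICE_KEYWORDS:
        if any(k.lower() in lowered for k in keywords):
            return service
    return None
```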

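Scenario 3: Performance tuning

User: How can I improve Spark query performance?
  ↓
1. Run check_config.py spark
  ↓
2. Get the current key configuration parameters
  ↓
3. Generate suggestions from the tuning rules
  ↓
4. Output current configuration + optimization suggestions + expected impact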

Config and log path reference

Config file paths

  • Primary: /etc/taihao-apps/<service>-conf/
  • Fallback: /etc/ecm/<service>-conf/ (if the taihao-apps directory does not exist)

Log paths

  • Hadoop ecosystem: /var/log/hadoop-<service>/
  • Hive: /var/log/hive/
  • Spark: /var/log/spark/
  • Other services: see references/services.md

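Common errors and fixes

YARN

Error | Cause | Fix
Container killed by AM | Timeout/OOM | Increase executor memory; check task duration
NodeManager unavailable | Process crashed/network issue | Check the NM process and the NM logs
Queue full capacity | Queue resources exhausted | Wait for running jobs to finish or increase queue capacity

HDFS

Error | Cause | Fix
SafeMode | Disk full/corrupt blocks | Free up space; run fsck
DataNode unavailable | DN lost contact | Check the DN process and the network
Missing blocks | Data blocks lost | Restore from backup; check the replication factor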

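Spark

Error | Cause | Fix
ExecutorLostFailure | Executor died | Check for OOM; increase memory
OutOfMemoryError | Insufficient memory | Increase spark.executor.memory / spark.driver.memory (these control the JVM -Xmx); reduce the data volume
Task not serializable | Serialization problem | Check variables captured in closures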

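Hive

Error | Cause | Fix
Metastore connection failed | Database connection problem | Check the MySQL service and the connection string
OutOfMemoryError | Insufficient memory | Increase the container size
Permission denied | Permission problem | Check HDFS/Ranger permissions

Kafka

Error | Cause | Fix
NotLeaderForPartition | Leader election in progress | Wait for the election to complete
TimeoutException | Timeout | Increase the relevant timeout (e.g. request.timeout.ms)
OutOfMemoryError | Insufficient memory | Increase the heap size (KAFKA_HEAP_OPTS)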



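Notes

  1. Permissions: SSH access to the target server is required; sudo privileges are recommended
  2. Log rotation: older logs may be compressed and need zcat/gunzip to read
  3. Configuration changes: a service must be restarted before configuration changes take effect
  4. Production: validate tuning changes in a test environment first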
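For note 2, rotated `.gz` logs can also be read from Python without calling zcat. The sketch below uses a hypothetical `tail_log` helper. It opens `.gz` files with the standard `gzip` module and keeps only the last N lines in a bounded `deque`, roughly like `tail -n`, or `zcat file.gz | tail -n` for compressed files.

```python
import gzip
from collections import deque
from pathlib import Path
from typing import List


def tail_log(path: str, lines: int = 100) -> List[str]:
    """Return the last `lines` lines of a log file, decompressing .gz transparently."""
    p = Path(path)
    opener = gzip.open if p.suffix == ".gz" else open
    # errors="replace" keeps going past stray non-UTF-8 bytes in old logs.
    with opener(p, "rt", encoding="utf-8", errors="replace") as fh:
        return [line.rstrip("\n") for line in deque(fh, maxlen=lines)]
```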

Big Data Application Analysis and Diagnostics Tool: making cluster operations more efficient 🔍

Security recommendations
This skill appears to do what its description says (reading logs, checking services, and suggesting tuning). Before installing: 1) Review the three included Python scripts yourself — they execute many shell commands and interpolate arguments into shell strings (shell=True), which can enable command injection if untrusted inputs are passed. 2) Only run this skill in a trusted environment (or a sandbox) with limited attacker-exposed inputs. 3) If you plan to use it in production, consider hardening: sanitize/validate service_name and numeric parameters, avoid shell=True (use list args), and audit any commands that connect to local services (e.g., MySQL). 4) Confirm you are comfortable with the skill reading system config files and /var/log/* on the host. If you cannot review or harden the scripts, treat the skill as risky and avoid installing it on sensitive infrastructure.
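The hardening steps in point 3 can be sketched in three parts. First, validate the service name against an allowlist. Second, validate the line count as a bounded integer. Third, call external tools with an argument list instead of `shell=True`. The allowlist mirrors the services this skill documents. The names `validate_args`, `tail_command` and `run_tail`, and the 10,000-line cap, are assumptions for illustration. The `all` argument of check_service_status.py would need its own allowance.

```python
import re
import subprocess
from typing import List, Tuple

# Allowlist mirroring the services this skill documents.
ALLOWED_SERVICES = {
    "yarn", "hdfs", "hive", "spark", "impala", "trino", "tez", "starrocks",
    "hbase", "kafka", "zookeeper", "ranger", "openldap", "hue", "flink",
}
MAX_LINES = 10_000  # arbitrary upper bound chosen for this sketch


def validate_args(service: str, lines: str) -> Tuple[str, int]:
    """Reject anything that is not a known service name or a bounded integer."""
    service = service.strip().lower()
    if service not in ALLOWED_SERVICES:
        raise ValueError(f"unsupported service: {service!r}")
    # ASCII digits only; the int() range check runs only if the regex matched.
    if not re.fullmatch(r"[0-9]{1,5}", lines) or not 1 <= int(lines) <= MAX_LINES:
        raise ValueError(f"invalid line count: {lines!r}")
    return service, int(lines)


def tail_command(log_file: str, lines: int) -> List[str]:
    """Build an argv list for `tail`; "--" stops option parsing before the path."""
    return ["tail", "-n", str(lines), "--", log_file]


def run_tail(log_file: str, lines: int) -> str:
    """Run tail without a shell, so ';' or '$(...)' in log_file is never interpreted."""
    result = subprocess.run(tail_command(log_file, lines),
                            capture_output=True, text=True, check=False)
    return result.stdout
```

With an argument list, `subprocess.run` hands `log_file` to `tail` as a single argv entry. No shell parses the string, so injected metacharacters are inert.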
Analysis
Type: OpenClaw Skill · Name: emr-analyer · Version: 1.0.0
The skill bundle provides diagnostic capabilities for big data clusters but contains significant security vulnerabilities. The scripts `analyze_logs.py`, `check_config.py`, and `check_service_status.py` extensively use `subprocess.run(shell=True)` to execute system commands (e.g., grep, tail, ps) with arguments derived from the AI agent's input. While some scripts implement basic whitelisting for service names, the reliance on shell execution for processing logs and configurations presents a high risk of command injection if the agent is manipulated via prompt injection. No evidence of intentional malice, such as data exfiltration or backdoors, was detected.
Capability assessment
Purpose & Capability
Name/description claim a local Linux EMR/BigData diagnostics tool. The included scripts read logs, check processes/ports, and parse configs for many Hadoop-ecosystem services — this is coherent with the stated purpose. Config/log paths under /etc/taihao-apps and /etc/ecm are nonstandard but reasonable for a packaged environment.
Instruction Scope
SKILL.md instructs running the three included Python scripts to inspect service status, config, and logs. Those scripts perform local file reads and execute system commands (ps/ss/netstat/ls/grep/tail/awk/systemctl/version commands) which are expected for diagnostics. However the scripts accept user-supplied service names/line counts and interpolate them into shell commands executed via subprocess.run(..., shell=True) without explicit sanitization, creating a possibility for shell injection or unintended command execution if untrusted inputs are passed.
Install Mechanism
No install spec or external downloads — the skill is instruction-only with bundled scripts. Nothing is fetched from external URLs or installed automatically, which reduces supply-chain risk. The included code will be written to disk when the skill is installed/executed by the agent.
Credentials
The skill requests no credentials or environment variables and does not call external endpoints. It does, however, read system config files and logs (e.g., /etc/... and /var/log/...) and attempts local service connections (e.g., invoking mysql -h 127.0.0.1 -P 8030 -u root -e ...) — these are consistent with a local diagnostics tool but grant the skill broad read access to system logs/configs. No network exfiltration endpoints are declared.
Persistence & Privilege
always is false and the skill does not request persistent elevated platform privileges or attempt to modify other skills. It runs on demand via the included scripts; autonomous invocation is allowed by default but not unusual for skills.
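How to use
  1. Make sure OpenClaw is installed (local or Docker deployment)
  2. Enter the install command in the chat box: /install emr-analyer
  3. After installation, invoke the skill by name or trigger it with /emr-analyer
  4. Provide the inputs described in the skill's parameter notes to get structured output
Version history
v1.0.0
emr-analyer 1.0.0, the initial release, provides one-stop analysis and diagnostics for big data services:
  • Analysis of mainstream components including YARN, Hive, HDFS, Spark, Impala, Trino, Tez, StarRocks, HBase, Kafka, ZooKeeper, Ranger, OpenLDAP, Hue, and Flink
  • Service status checks, configuration retrieval and tuning, and log error analysis with automatically generated optimization suggestions
  • One-command cluster health reports, single-service self-diagnosis, and fix suggestions for common failures
  • Scripted workflows with normalized JSON output for quick integration and automated operations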
Metadata
Slug: emr-analyer
Version: 1.0.0
License: MIT-0
Total installs: 0
Current installs: 0
Versions: 1
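FAQ

What is EMR Analyzer?

An analysis and diagnostics tool for open-source big data applications on Linux. It checks service status, retrieves and tunes configuration parameters, and analyzes job errors with optimization suggestions. Supported: YARN, Hive, HDFS, Spark, Impala, Trino, Tez, StarRocks, HBase, Kafka, ZooKeeper, Ranger, OpenLDAP, Hue, Flink. It is an AI agent skill plugin for Claude Code / OpenClaw and has been downloaded 73 times so far.

How do I install EMR Analyzer?

Run the command "/install emr-analyer" in the OpenClaw or Claude Code chat box to install it in one step. No additional configuration is needed.

Is EMR Analyzer free?

Yes. EMR Analyzer is free and released under the MIT-0 license, so you can download, install, and use it freely.

Which platforms does EMR Analyzer support?

The skill itself is listed as cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed. The diagnostic scripts, however, target Linux hosts, since they rely on tools such as ps, ss, and systemctl.

Who developed EMR Analyzer?

It is developed and maintained by qinyafei (@qinyafei123). The current version is v1.0.0.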
