HLE Benchmark Evolver
/install hle-benchmark-evolver
This skill operationalizes HLE score-driven evolution for OpenClaw.
When to Use
- User asks to improve HLE score (for example target >= 60%).
- User provides question-level benchmark output and wants it converted to a reward.
- User wants an easy-first curriculum queue and next-focus questions.
- User asks for an immediate benchmark result snapshot.
Inputs
- Benchmark report JSON path (--report=/abs/path/report.json)
- Optional benchmark id (cais/hle default)
Workflow
- Validate the report JSON exists and is parseable.
- Ingest report into capability-evolver benchmark reward state.
- Generate curriculum signals:
  - benchmark_*
  - curriculum_stage:*
  - focus_subject:*
  - focus_modality:*
  - question_focus:*
- Return a compact result summary for this run.
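The validation step and the signal naming above can be sketched as follows. This is a minimal illustration, not the skill's actual code: `loadReport` and `buildSignals` are hypothetical helpers, the argument shape of `buildSignals` is assumed, and values such as "easy" or "math" are placeholders. Only the signal prefixes come from the list above; `benchmark_*` is left out because its expansion is not documented here.

```javascript
const fs = require("node:fs");

// Hypothetical helper: checks that the report exists and parses as JSON.
// Returns { ok: true, report } or { ok: false, error }.
function loadReport(reportPath) {
  if (!fs.existsSync(reportPath)) {
    return { ok: false, error: `report not found: ${reportPath}` };
  }
  try {
    const report = JSON.parse(fs.readFileSync(reportPath, "utf8"));
    if (report === null || typeof report !== "object") {
      return { ok: false, error: "report must be a JSON object or array" };
    }
    return { ok: true, report };
  } catch (err) {
    return { ok: false, error: `report is not valid JSON: ${err.message}` };
  }
}

// Hypothetical helper: formats curriculum signals using the documented prefixes.
// The input shape (stage, subjects, modalities, questionIds) is an assumption.
function buildSignals({ stage, subjects, modalities, questionIds }) {
  return [
    `curriculum_stage:${stage}`,
    ...subjects.map((s) => `focus_subject:${s}`),
    ...modalities.map((m) => `focus_modality:${m}`),
    ...questionIds.map((q) => `question_focus:${q}`),
  ];
}
```

Returning an `{ ok, error }` object instead of throwing lets a caller print a structured error in place of a stack trace when the report path is wrong.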
Run
node skills/hle-benchmark-evolver/run_result.js --report=/absolute/path/hle_report.json
Full automatic loop (starts evolution cycle):
node skills/hle-benchmark-evolver/run_pipeline.js --report=/absolute/path/hle_report.json --cycles=1
If your evaluator can be called from the shell, let the pipeline generate the report each cycle:
node skills/hle-benchmark-evolver/run_pipeline.js \
--report=/absolute/path/hle_report.json \
--eval_cmd="python /path/to/eval_hle.py --out {{report}}" \
--cycles=3 --interval_ms=2000
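As a rough sketch of how the `{{report}}` placeholder and the cycle options could fit together, assuming plain string substitution and a fixed pause between cycles. `renderEvalCmd` and `runCycles` are hypothetical names, and the real `run_pipeline.js` may behave differently, for example in how it quotes paths or handles evaluator failures.

```javascript
const { execSync } = require("node:child_process");

// Hypothetical helper: replaces every {{report}} occurrence with the report path.
function renderEvalCmd(template, reportPath) {
  return template.split("{{report}}").join(reportPath);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical loop: runs the evaluator once per cycle and waits intervalMs
// between cycles. `exec` is injectable so the loop can be dry-run without a shell.
async function runCycles({
  evalCmd,
  report,
  cycles,
  intervalMs,
  exec = (cmd) => execSync(cmd, { stdio: "inherit" }),
}) {
  const executed = [];
  for (let i = 0; i < cycles; i++) {
    const cmd = renderEvalCmd(evalCmd, report);
    exec(cmd); // the evaluator is expected to write the report file
    executed.push(cmd);
    if (i < cycles - 1) await sleep(intervalMs);
  }
  return executed;
}
```

Passing a no-op `exec` (for example `() => {}`) shows which commands would run without invoking the evaluator.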
If no --report is provided, it defaults to:
skills/capability-evolver/assets/gep/hle_report.template.json
Output Contract
Always print JSON with these fields:
- benchmark_id
- run_id
- accuracy
- reward
- trend
- curriculum_stage
- queue_size
- focus_subjects
- focus_modalities
- next_questions
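A caller that consumes this output may want to check that the contract holds before using the values. The sketch below is one possible check, assuming the script prints a single JSON object on stdout. `missingFields` is a hypothetical helper, and the field types are not verified because the source does not specify them.

```javascript
// Field names taken from the output contract above.
const REQUIRED_FIELDS = [
  "benchmark_id",
  "run_id",
  "accuracy",
  "reward",
  "trend",
  "curriculum_stage",
  "queue_size",
  "focus_subjects",
  "focus_modalities",
  "next_questions",
];

// Hypothetical helper: returns the contract fields absent from the printed JSON.
// Throws a SyntaxError if stdoutText is not valid JSON.
function missingFields(stdoutText) {
  const result = JSON.parse(stdoutText);
  if (result === null || typeof result !== "object" || Array.isArray(result)) {
    return [...REQUIRED_FIELDS]; // not a JSON object, so every field is missing
  }
  return REQUIRED_FIELDS.filter((field) => !(field in result));
}
```

An empty array means every documented field is present.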
Notes
- This skill handles reward/curriculum ingestion. It does not directly solve HLE questions.
- run_pipeline.js links ingestion, evolve, and solidify into one executable loop.
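- Make sure OpenClaw is installed (locally or via Docker).
- Enter the install command in the chat box: /install hle-benchmark-evolver
- Once installation completes, call the Skill by name or trigger it with /hle-benchmark-evolver
- Provide the required inputs as described in the Skill's parameter notes to get structured output.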
What is Hle Benchmark Evolver?
Runs HLE-oriented benchmark reward ingestion and curriculum generation for capability-evolver. Use when the user asks to optimize Humanity's Last Exam score,... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 735 cumulative downloads to date.
How do I install Hle Benchmark Evolver?
Run the command "/install hle-benchmark-evolver" in the OpenClaw or Claude Code chat box for a one-step install; no extra configuration is needed.
Is Hle Benchmark Evolver free?
Yes. Hle Benchmark Evolver is completely free (free and open source) and can be downloaded, installed, and used freely.
Which platforms does Hle Benchmark Evolver support?
Hle Benchmark Evolver is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.
Who developed Hle Benchmark Evolver?
It is developed and maintained by WANGJUNJIE (@wanng-ide); the current version is v1.0.0.