olearycrew

PinchBench

Author: olearycrew · GitHub ↗ · v1.0.0
cross-platform ⚠ suspicious
827 total downloads · 0 favorites · 3 current installs · 1 version
Install in OpenClaw
/install pinchbench
Description
Run PinchBench benchmarks to evaluate OpenClaw agent performance across real-world tasks. Use when testing model capabilities, comparing models, submitting b...
Security recommendations
What to check before installing or using this skill:
- Inspect lib_upload.py (and the --register/upload code path) to see exactly which fields are sent to pinchbench.com; do not upload runs containing private data unless you understand what is transmitted. Use --no-upload for local-only testing.
- Review the grading code behavior: tasks may contain embedded Python automated checks, which the grading engine executes with exec(). Only run tasks from trusted sources, or review the task_*.md files before running.
- Be aware that the skill reads OpenClaw files in your home directory (~/.openclaw/agents/*) and can create new agents and workspaces via the openclaw CLI. This may expose agent transcripts or sensitive tool outputs to local processing and, if you upload, to the leaderboard.
- The pyproject lists fabric/paramiko (SSH capabilities) that are not mentioned in the docs. If you install dependencies, consider doing so in an isolated virtualenv or sandbox, and review why SSH libraries are needed.
- If you want to be cautious: run the scripts in an isolated environment (container or VM), run with --no-upload first, and audit any results JSON for sensitive content before sharing.
- If you plan to allow uploading, confirm the upload endpoint and privacy policy on pinchbench.com and inspect where the registration flow stores tokens.
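The advice to audit any results JSON before sharing can be partly automated. The following is a minimal sketch, not part of PinchBench: the results schema is not documented on this page, so the code walks arbitrary parsed JSON, and the pattern names and regular expressions in `SENSITIVE_PATTERNS` are illustrative assumptions that you would extend for your own environment.

```python
import re
from typing import Any, Iterator, List, Tuple

# Hypothetical patterns (assumptions, not taken from PinchBench).
SENSITIVE_PATTERNS = {
    "home_path": re.compile(r"(/home/|/Users/)[^/\s\"']+"),
    "api_key_like": re.compile(r"\b(sk|ghp|xox[bp])[-_][A-Za-z0-9_-]{16,}"),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def walk_strings(node: Any, path: str = "$") -> Iterator[Tuple[str, str]]:
    """Yield (json_path, string_value) for every string in a parsed JSON tree."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from walk_strings(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from walk_strings(value, f"{path}[{index}]")
    elif isinstance(node, str):
        yield path, node


def audit_results(data: Any) -> List[Tuple[str, str]]:
    """Return (json_path, pattern_name) pairs for strings that look sensitive."""
    findings = []
    for path, text in walk_strings(data):
        for name, pattern in SENSITIVE_PATTERNS.items():
            if pattern.search(text):
                findings.append((path, name))
    return findings
```

To use it, load a results file with `json.load` and pass the parsed object to `audit_results`. An empty list only means none of these patterns matched; it does not show that the file is safe to upload.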
Analysis
Type: OpenClaw Skill · Name: pinchbench · Version: 1.0.0
The skill is classified as suspicious primarily because of a critical Remote Code Execution (RCE) vulnerability. The `scripts/lib_grading.py` file uses `exec()` to run Python code extracted directly from the `Automated Checks` sections of task definition markdown files (`tasks/*.md`). This design allows any malicious task submitted to the benchmarking system to execute arbitrary Python code on the host. Additionally, the skill collects system metadata (OS, CPU, Python version, hostname hash) and benchmark results, then uploads them to `api.pinchbench.com`. Although the stated purpose is a public leaderboard, this constitutes data exfiltration to an external endpoint. The `pyproject.toml` also lists `fabric` and `paramiko` (SSH libraries) as dependencies; these are powerful capabilities that the current code neither explicitly uses nor justifies.
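Because the grading code is executed with `exec()`, one way to review a task file first is to pull out its code blocks and inspect them statically, without running anything. The sketch below is an assumption-laden illustration rather than PinchBench code: it assumes the checks appear as fenced Python blocks in the markdown (the exact task format is not shown on this page), and the `RISKY_MODULES` and `RISKY_CALLS` sets are hypothetical starting points.

```python
import ast
import re
from typing import List, Set

FENCE = "`" * 3  # built at runtime so this example can sit inside a fence
FENCE_RE = re.compile(FENCE + r"(?:python|py)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)

# Hypothetical watch lists (assumptions); tune them to your own risk tolerance.
RISKY_MODULES = {"os", "subprocess", "socket", "shutil", "urllib", "paramiko", "fabric"}
RISKY_CALLS = {"exec", "eval", "compile", "__import__", "open"}


def extract_code_blocks(markdown: str) -> List[str]:
    """Return the body of every fenced code block found in the markdown text."""
    return FENCE_RE.findall(markdown)


def flag_risky(code: str) -> Set[str]:
    """Parse code (without executing it) and report risky imports and calls."""
    flags = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                root = alias.name.split(".")[0]
                if root in RISKY_MODULES:
                    flags.add(f"import:{root}")
        elif isinstance(node, ast.ImportFrom):
            root = (node.module or "").split(".")[0]
            if root in RISKY_MODULES:
                flags.add(f"import:{root}")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in RISKY_CALLS:
                flags.add(f"call:{node.func.id}")
    return flags
```

A static scan like this is only a triage aid. Obfuscated code can evade name-based checks, and `ast.parse` raises `SyntaxError` on blocks that are not valid Python, so flagged or unparseable tasks still need a manual read before they are run.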
Capability assessment
Purpose & Capability
The skill's name/description align with the included code: it loads tasks, creates/runs OpenClaw agents, grades runs, and can upload results to a leaderboard. Minor mismatch: pyproject.toml lists dependencies like fabric and paramiko (SSH-related) that are not mentioned in SKILL.md; their presence may be legitimate for some tasks but is not explained in the README or SKILL.md.
Instruction Scope
Runtime code reads OpenClaw agent configuration and session transcripts from the user's home (~/.openclaw/agents/*) and will prepare agent workspaces (possibly writing files into agent workspaces). The grading engine executes automated grading code via exec() extracted from task markdown — this executes arbitrary Python from task files. The skill also supports registering/uploading results to pinchbench.com, which could transmit transcripts or workspace contents to an external server.
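Before running the skill, it can help to see what actually sits under the directory it reads. The following is a small sketch under stated assumptions: only the `~/.openclaw/agents` location comes from the review above, the layout inside it is unknown, and `list_readable_files` is a hypothetical helper that lists every file rather than modelling what the skill selects.

```python
from pathlib import Path
from typing import List, Tuple


def list_readable_files(root: Path) -> List[Tuple[str, int]]:
    """Return (relative_path, size_in_bytes) for every file under root."""
    if not root.is_dir():
        return []  # nothing to report if the directory does not exist
    files = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            files.append((path.relative_to(root).as_posix(), path.stat().st_size))
    return files
```

Calling `list_readable_files(Path.home() / ".openclaw" / "agents")` gives an inventory of transcripts and configuration files that local processing could touch, which is a reasonable starting point for deciding whether an upload is acceptable.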
Install Mechanism
There is no install spec (instruction-only), so nothing downloads or runs during installation. Code files are included in the skill bundle. The project metadata (pyproject.toml) lists third-party dependencies (pyyaml, fabric, paramiko) that may need to be installed if the user runs the scripts; SKILL.md only mentions 'uv' and Python. No remote download URLs or extract steps were found.
Credentials
The skill declares no required environment variables but reads local OpenClaw state (workspaces, sessions) and may create agents via the openclaw CLI. It can register an API token and upload results to a public leaderboard (pinchbench.com). Requesting no env vars while accessing local agent data and offering an upload path is proportionate to benchmarking, but the lack of explicit warning about what is uploaded (transcripts, workspace files) is a concern.
Persistence & Privilege
The skill's `always` setting is false, and the skill does not request elevated platform privileges. At runtime it will create OpenClaw agents (via the openclaw CLI), create workspaces (in ~/.openclaw or /tmp), and may write a stored token/config when --register is used. Creating agent entries and writing token/config files is coherent with its purpose, but users should expect persistent artifacts under their OpenClaw config and /tmp.
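How to use
  1. Make sure OpenClaw is installed (locally or via Docker).
  2. Enter the install command in the chat box: /install pinchbench
  3. Once installation finishes, invoke the skill by name or trigger it with /pinchbench
  4. Provide the inputs described in the skill's parameter documentation to get structured output.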
Version history
v1.0.0
Initial release - 23 real-world benchmark tasks for OpenClaw agents
Metadata
Slug pinchbench
Version 1.0.0
License
Total installs 3
Current installs 3
Versions 1
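FAQ

What is PinchBench?

Run PinchBench benchmarks to evaluate OpenClaw agent performance across real-world tasks. Use when testing model capabilities, comparing models, submitting b... It is an AI agent skill plugin for Claude Code / OpenClaw, with 827 total downloads so far.

How do I install PinchBench?

Run the command "/install pinchbench" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is needed.

Is PinchBench free?

Yes. PinchBench is completely free (free and open source) and can be freely downloaded, installed, and used.

Which platforms does PinchBench support?

PinchBench is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops PinchBench?

It is developed and maintained by olearycrew (@olearycrew); the current version is v1.0.0.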
