
PinchBench

Name: PinchBench
Author: olearycrew

by olearycrew · GitHub ↗ · v1.0.0

cross-platform ⚠ suspicious

Downloads: 827

Install in OpenClaw

/install pinchbench

Description

Run PinchBench benchmarks to evaluate OpenClaw agent performance across real-world tasks. Use when testing model capabilities, comparing models, submitting b...

Usage Guidance

What to check before installing or using this skill:
- Inspect lib_upload.py (and the --register/upload code path) to see exactly what fields are sent to pinchbench.com; do not upload runs containing private data unless you understand what is transmitted. Use --no-upload for local-only testing.
- Review the grading code behavior: tasks may contain embedded Python automated checks, which the grading engine executes with exec(). Only run tasks from trusted sources, or review the task_*.md files before running.
- Be aware that the skill reads OpenClaw files in your home directory (~/.openclaw/agents/*) and can create new agents and workspaces via the openclaw CLI. This may expose agent transcripts or sensitive tool outputs to local processing and, if you upload, to the leaderboard.
- The pyproject lists fabric/paramiko (SSH capabilities) that are not mentioned in the docs. If you install dependencies, consider doing so in an isolated virtualenv or sandbox, and review why SSH libraries are needed.
- If you want to be cautious: run the scripts in an isolated environment (container or VM), run with --no-upload first, and audit any results JSON for sensitive content before sharing.
- If you plan to allow uploading, confirm the upload endpoint and privacy policy on pinchbench.com and inspect where tokens are stored by the registration flow.
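
The advice to audit results JSON before sharing can be partly automated. The following is a minimal sketch, not part of PinchBench: the function names and the pattern list are hypothetical, and the layout of the results file is not assumed beyond it being valid JSON. It walks every string value and reports where something that looks like a home-directory path, OpenClaw state path, private key, or bearer token appears.

```python
import json
import re

# Hypothetical patterns; extend them for the secrets that matter in your environment.
SENSITIVE_PATTERNS = {
    "home_path": re.compile(r"(/home/|/Users/)[^/\s\"']+"),
    "openclaw_state": re.compile(r"\.openclaw/"),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "bearer_token": re.compile(r"(?i)bearer\s+[a-z0-9._\-]{16,}"),
}


def find_sensitive_strings(node, path="$"):
    """Walk parsed JSON and return (json_path, pattern_name) for each suspicious string value."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            hits.extend(find_sensitive_strings(value, f"{path}.{key}"))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            hits.extend(find_sensitive_strings(value, f"{path}[{index}]"))
    elif isinstance(node, str):
        for name, pattern in SENSITIVE_PATTERNS.items():
            if pattern.search(node):
                hits.append((path, name))
    return hits


def audit_results_file(filename):
    """Load a results JSON file and report suspicious strings without modifying it."""
    with open(filename, encoding="utf-8") as handle:
        return find_sensitive_strings(json.load(handle))
```

An empty result only means none of these patterns matched; it does not prove the file is safe to publish, so a manual read of transcripts is still worthwhile.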

Capability Analysis

Type: OpenClaw Skill
Name: pinchbench
Version: 1.0.0

The skill is classified as suspicious primarily due to a critical Remote Code Execution (RCE) vulnerability. The `scripts/lib_grading.py` file uses `exec()` to run Python code directly extracted from the `Automated Checks` sections within task definition markdown files (`tasks/*.md`). This design allows any malicious task submitted to the benchmarking system to execute arbitrary Python code on the host.

Additionally, the skill collects system metadata (OS, CPU, Python version, hostname hash) and benchmark results, then uploads them to `api.pinchbench.com`, which, while stated for a public leaderboard, constitutes data exfiltration to an external endpoint.

The `pyproject.toml` also lists `fabric` and `paramiko` (SSH libraries) as dependencies, which are powerful capabilities not explicitly used or justified by the current code.
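
Because the embedded checks run through `exec()`, one way to review a task file before running it is to parse its code blocks without executing them. The sketch below is an illustration under assumptions, not PinchBench code: it assumes the checks live in ordinary fenced code blocks, and `review_task_markdown`, `RISKY_CALLS`, and `RISKY_MODULES` are hypothetical names. It uses the standard `ast` module to list imports and calls that deserve a closer look.

```python
import ast
import re

FENCE = "`" * 3  # built this way so the pattern can be shown inside a fenced block
CODE_BLOCK = re.compile(FENCE + r"(?:python|py)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)

# Hypothetical watch lists; tune them to your own threat model.
RISKY_CALLS = {"exec", "eval", "compile", "__import__", "open"}
RISKY_MODULES = {"os", "subprocess", "socket", "shutil", "paramiko", "fabric"}


def review_task_markdown(text):
    """Return findings for every fenced code block in a task file, without running any of it."""
    findings = []
    for block in CODE_BLOCK.findall(text):
        try:
            tree = ast.parse(block)
        except SyntaxError:
            findings.append("unparseable code block")
            continue
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                modules = [alias.name.split(".")[0] for alias in node.names]
            elif isinstance(node, ast.ImportFrom):
                modules = [(node.module or "").split(".")[0]]
            else:
                modules = []
            findings.extend(f"import {m}" for m in modules if m in RISKY_MODULES)
            # Only direct calls by name are caught here, e.g. eval(...), not obj.eval(...).
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                if node.func.id in RISKY_CALLS:
                    findings.append(f"call {node.func.id}()")
    return findings
```

A static scan like this is easy to evade (for example through `getattr` tricks or encoded strings), so it supports a manual review rather than replacing a container or VM.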

Capability Assessment

ℹ Purpose & Capability

The skill's name/description align with the included code: it loads tasks, creates/runs OpenClaw agents, grades runs, and can upload results to a leaderboard. Minor mismatch: pyproject.toml lists dependencies like fabric and paramiko (SSH-related) that are not mentioned in SKILL.md; their presence may be legitimate for some tasks but is not explained in the README or SKILL.md.

⚠ Instruction Scope

Runtime code reads OpenClaw agent configuration and session transcripts from the user's home (~/.openclaw/agents/*) and will prepare agent workspaces (possibly writing files into agent workspaces). The grading engine executes automated grading code via exec() extracted from task markdown — this executes arbitrary Python from task files. The skill also supports registering/uploading results to pinchbench.com, which could transmit transcripts or workspace contents to an external server.

ℹ Install Mechanism

There is no install spec (instruction-only), so nothing downloads or runs during installation. Code files are included in the skill bundle. The project metadata (pyproject.toml) lists third-party dependencies (pyyaml, fabric, paramiko) that may need to be installed if the user runs the scripts; SKILL.md only mentions 'uv' and Python. No remote download URLs or extract steps were found.

⚠ Credentials

The skill declares no required environment variables but reads local OpenClaw state (workspaces, sessions) and may create agents via the openclaw CLI. It can register an API token and upload results to a public leaderboard (pinchbench.com). Requesting no env vars while accessing local agent data and offering an upload path is proportionate to benchmarking, but the lack of explicit warning about what is uploaded (transcripts, workspace files) is a concern.

ℹ Persistence & Privilege

The skill's `always` setting is false, and it does not request elevated platform privileges. At runtime it will create OpenClaw agents (via the openclaw CLI), create workspaces (in ~/.openclaw or /tmp), and may write a stored token/config when --register is used. Creating agent entries and writing token/config files is consistent with its purpose, but users should expect persistent artifacts under their OpenClaw config and /tmp.

How to Use

Make sure OpenClaw is installed (local or Docker)
Run the install command in chat: /install pinchbench
After installation, invoke the skill by name or use /pinchbench
Provide required inputs per the skill's parameter spec and get structured output

Version History

v1.0.0

Initial release - 23 real-world benchmark tasks for OpenClaw agents

Metadata

Slug pinchbench

Version 1.0.0

License —

All-time Installs 3

Active Installs 3

Total Versions 1

Frequently Asked Questions

What is PinchBench?

PinchBench is an AI agent skill for Claude Code / OpenClaw that runs benchmarks to evaluate OpenClaw agent performance across real-world tasks. Use when testing model capabilities, comparing models, submitting b... It has 827 downloads so far.

How do I install PinchBench?

Run "/install pinchbench" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is PinchBench free?

Yes, PinchBench is completely free (open-source). You can download, install and use it at no cost.

Which platforms does PinchBench support?

PinchBench is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created PinchBench?

It is built and maintained by olearycrew (@olearycrew); the current version is v1.0.0.
