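Description

🦞 GIGO · gigo-lobster-doctor: Environment doctor mode: checks only the gateway, Python dependencies, task-bundle access, and PNG certificate capability, without running the formal tasting run. Triggers: 龙虾体检 / 检查龙虾环境 / lobster doctor / check lobster environment.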

Usage (SKILL.md)

gigo-lobster-doctor

Name: Gigo Lobster Doctor
Author: gigolab

Mission

Environment doctor mode: checks only the gateway, Python/runtime dependencies, task-bundle access, and PNG certificate readiness, without running the full benchmark (the formal tasting run).

Trigger Phrases

Chinese: 龙虾体检 / 检查龙虾环境 / 先体检龙虾 / 龙虾环境检查
English: lobster doctor / check lobster environment / lobster environment check / doctor my lobster

Execution Rules

Use a direct Python command on this skill directory's wrapper file. Never use cd ... && python ...; OpenClaw preflight may reject it.
Prefer python3, then python, then py.
If the user asked in Chinese, append --lang zh. If the user asked in English, append --lang en.
Stream short progress updates while the wrapper is running.
Keep stdout/stderr visible and remind the user that the full log is written to gigo-run.log.
Do not run --help, inspect the whole repo, or switch to main.py once the wrapper command is clear. Start the wrapper directly.
If the wrapper starts a long-running process, do not kill it just because stdout is quiet for a while. A full tasting run often takes 15-25 minutes.
While a long run is in progress, monitor the process and tail the log file under ~/.openclaw/workspace/outputs/gigo-lobster-doctor/gigo-run.log instead of improvising a second execution path.
Only declare failure if the process exits non-zero, the log shows a traceback, or the user explicitly asks to cancel.
Stay attached until the wrapper exits. Do not end the conversation with “I will keep monitoring”; keep polling and only report completion once you have the final score/result files/ref_code (if any).
Prefer process poll plus exec tail -n 50 .../gigo-run.log while monitoring. Do not use a generic full-file read on gigo-run.log, because the log can be large and may break the chat output.
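
The monitoring rules above can be approximated in Python. This is a minimal sketch, not the skill's own code: `tail_lines` and `run_failed` are hypothetical helper names. The log path is the one quoted in the rules, and the 50-line window mirrors `tail -n 50`.

```python
import os
from pathlib import Path
from typing import List, Optional

# Log location as stated in the execution rules.
LOG_PATH = Path.home() / ".openclaw/workspace/outputs/gigo-lobster-doctor/gigo-run.log"


def tail_lines(path: Path, n: int = 50, block_size: int = 4096) -> List[str]:
    """Return the last n lines of a file, reading backwards in blocks
    so that a large log is never loaded in full."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        data = b""
        # Read until more than n newlines are buffered, which guarantees
        # that the first kept line is complete (or the file start is reached).
        while pos > 0 and data.count(b"\n") <= n:
            step = min(block_size, pos)
            pos -= step
            f.seek(pos)
            data = f.read(step) + data
    return data.decode("utf-8", errors="replace").splitlines()[-n:]


def run_failed(exit_code: Optional[int], recent_log: List[str]) -> bool:
    """Apply the failure rule: non-zero exit code or a traceback in the log.
    exit_code is None while the process is still running."""
    if exit_code is not None and exit_code != 0:
        return True
    return any("Traceback (most recent call last)" in line for line in recent_log)
```

A quiet stdout alone never makes `run_failed` return true, which matches the rule that a silent but still-running process must not be killed. A user's explicit cancel request is outside what this helper checks.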

Default Behavior

By default it only runs the environment checks: no full benchmark and no upload.

Recommended Command Shape

python3 /absolute/path/to/run_doctor.py --lang zh

If the user explicitly asks for overrides, append the matching CLI flags:

--lobster-name "..." and --lobster-tags "tag1,tag2" for a custom lobster persona
--output-dir /custom/path for a custom output directory
--require-png-cert when the user refuses the SVG fallback
--skip-upload or --register-only only when the user explicitly asks to change the default upload behavior
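
As an illustration of the command shape and overrides above, the sketch below builds the argument list in Python. `pick_python` and `build_doctor_command` are hypothetical names; only the flags are taken from this document. The upload-related flags are left out because they apply only on explicit user request.

```python
import shutil
from typing import List, Optional, Sequence


def pick_python(candidates: Sequence[str] = ("python3", "python", "py")) -> Optional[str]:
    """Return the first interpreter name found on PATH, in the preferred order."""
    for name in candidates:
        if shutil.which(name):
            return name
    return None


def build_doctor_command(
    wrapper_path: str,
    lang: str,
    interpreter: str = "python3",
    lobster_name: Optional[str] = None,
    lobster_tags: Optional[Sequence[str]] = None,
    output_dir: Optional[str] = None,
    require_png_cert: bool = False,
) -> List[str]:
    """Build a direct argv list for the wrapper (no `cd ... &&` chaining)."""
    if lang not in ("zh", "en"):
        raise ValueError("lang must be 'zh' or 'en'")
    cmd = [interpreter, wrapper_path, "--lang", lang]
    if lobster_name:
        cmd += ["--lobster-name", lobster_name]
    if lobster_tags:
        cmd += ["--lobster-tags", ",".join(lobster_tags)]
    if output_dir:
        cmd += ["--output-dir", output_dir]
    if require_png_cert:
        cmd.append("--require-png-cert")
    return cmd
```

Passing the result as a list to `subprocess.Popen` avoids a shell entirely, so the absolute wrapper path is used as-is and no `cd` step is needed.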

Persona Defaults

Explicit CLI overrides win first: --lobster-name and --lobster-tags
Then read GIGO_LOBSTER_NAME and GIGO_LOBSTER_TAGS
Then read SOUL.md
Finally fall back to the default lobster persona
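
The precedence above amounts to a first-non-empty lookup. The sketch below is an assumption-laden illustration: `resolve_persona_field` is a hypothetical helper, and the value read from SOUL.md is passed in already extracted, because this document does not describe how that file is parsed.

```python
import os
from typing import Mapping, Optional


def resolve_persona_field(
    cli_value: Optional[str],
    env_key: str,
    soul_value: Optional[str],
    default: str,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the first non-empty value in this order:
    CLI override, environment variable, SOUL.md value, default."""
    env = os.environ if env is None else env
    for candidate in (cli_value, env.get(env_key), soul_value):
        if candidate:  # empty strings are treated as unset
            return candidate
    return default
```

For example, the name would be resolved with `env_key="GIGO_LOBSTER_NAME"` and the tags with `env_key="GIGO_LOBSTER_TAGS"`. The default persona value is whatever the package defines and is not specified here.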

Do not stop for interactive questions unless the user explicitly asks for an interactive run.

Safety recommendations

What to consider before installing/running:
1) Manual review: open the wrapper file run_doctor.py and the top-level scripts (run_doctor.py, run_upload.py, main.py, score_uploader.py, gateway_client.py, judge_client.py) and confirm run_doctor.py only executes local checks and does not call the uploader/judge endpoints. The SKILL.md's instruction to "not inspect the repo" is unusual; do inspect it.
2) Network behavior: search the bundle for requests.post / gateway_base / score_uploader to find code that can POST data. If you will run in an environment with any network access, ensure the gateway_base is a trusted local service (e.g., localhost) before running; consider blocking outbound network from the process (firewall) if you want to be safe.
3) Run in isolation: execute the doctor wrapper in a disposable environment (local VM or container) and with minimal privileges. Prefer running with no GIGO_* env vars set and without CLI flags that enable upload (do not pass --output-dir that maps to shared locations if you want isolation).
4) Validate logs and behavior: run with a dry-run or verbose flag if available, and verify outputs only include checks (dependency tests, PNG capability, task-bundle access). Tail the gigo-run.log yourself and confirm no external endpoints are contacted.
5) If uncertain, decline or ask the author for a short explanation of run_doctor.py's exact steps and a statement of whether it ever calls network endpoints or invokes uploader code.
Given the bundle includes full taster/uploader code and the SKILL.md attempts to discourage inspection, treat this package as higher risk until you confirm its doctor mode is truly isolated.
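
The bundle search suggested in step 2 can be done with a short script. This is a rough sketch: `find_network_markers` is a hypothetical helper and the marker strings are the ones named in the advice. A plain substring match only flags lines for manual review and does not prove what the code does at runtime.

```python
from pathlib import Path
from typing import List, Sequence, Tuple

# Strings named in the review advice; extend as needed.
MARKERS = ("requests.post", "gateway_base", "score_uploader")


def find_network_markers(
    bundle_dir: str, markers: Sequence[str] = MARKERS
) -> List[Tuple[str, int, str]]:
    """Scan every .py file under bundle_dir and return
    (file path, line number, marker) for each line containing a marker."""
    hits: List[Tuple[str, int, str]] = []
    for path in sorted(Path(bundle_dir).rglob("*.py")):
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file: skip it and review it by hand
        for lineno, line in enumerate(text.splitlines(), start=1):
            for marker in markers:
                if marker in line:
                    hits.append((str(path), lineno, marker))
    return hits
```

Any hit whose file is imported, directly or indirectly, by run_doctor.py deserves a closer read before the wrapper is run.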

Functional analysis

The package is a benchmarking and diagnostic tool for AI agents (specifically 'Lobster' agents) within the OpenClaw ecosystem. It provides a framework to evaluate agent performance across various dimensions such as coding, reasoning, and safety. Key features include a self-bootstrapping mechanism for runtime dependencies, a remote task-fetching system that uses AES-256-GCM for encrypted bundles, and a 'Shell Shim' security layer designed to intercept and block potentially malicious commands (e.g., root directory deletion, SSH key access) executed by the agents under test. While the tool performs actions like executing subprocesses, downloading remote task payloads, and dynamic code execution of evaluation scripts, these behaviors are integral to its function as a test harness and are implemented with visible security controls.

Capability tags

crypto, requires-sensitive-credentials

Capability assessment

⚠ Purpose & Capability

The SKILL.md and name say this is an environment 'doctor' that only checks gateway, Python deps, task-bundle access, and PNG certificate readiness. The bundle, however, contains the full 'taster' evaluation harness (judge_client, score_uploader, report_generator, run_upload.py, run_resume.py, etc.) capable of running full benchmarks, calling a cloud /judge endpoint, and uploading results. That can be legitimate for a multi-mode package (doctor vs taster), but the presence of full network-capable modules is a broader capability than the one advertised for 'doctor' mode and should be considered when running. Required binaries/envs declared (python) do match the stated purpose.

⚠ Instruction Scope

The SKILL.md instructs the agent to run a wrapper (e.g., python3 /absolute/path/to/run_doctor.py) and explicitly tells the agent not to inspect the repo, not to run --help, and not to switch to main.py once the wrapper command is clear. Those prohibitions limit scrutiny of the bundle and are unusual for a 'doctor' tool that claims not to perform uploads. The instructions also direct specific log tailing under ~/.openclaw/workspace/... and to remain attached until the wrapper exits — operational but prescriptive. Combined with prompt-injection indicators found in SKILL.md, this suggests the instructions may be attempting to steer runtime behavior and avoid manual checks.

ℹ Install Mechanism

No install spec is provided (instruction-only), which avoids arbitrary network downloads at install time. However the skill package itself contains a large code bundle (runner, gateway client, uploader, judge client, etc.). Nothing in the install mechanism pulls external archives, but installing or unpacking the skill will place a substantial codebase on disk which increases local attack surface.

ℹ Credentials

The skill declares no required environment variables or credentials (proportionate for an environment checker). The SKILL.md references optional env vars like GIGO_LOBSTER_NAME, GIGO_UPLOAD_MODE, and GIGO_REQUIRE_PNG_CERT which are non-sensitive. That said, the bundled code includes network-capable components (JudgeClient, score_uploader, gateway_client) that can POST to a /judge endpoint; although no secret or cloud credentials are declared, those network calls could transmit agent outputs if the taster/upload mode is invoked (explicit CLI overrides can enable uploads).

✓ Persistence & Privilege

The skill does not request always:true and does not declare any special system-wide privileges. It runs as a user‑level Python process and its instructions emphasize running a wrapper within the skill directory and monitoring logs. There is no evidence it modifies other skills' configuration or requests persistent privileges beyond typical filesystem I/O for its output directory.

Version history

v2.1.2

2.1.2: fix leaderboard wording on cert/report so total_entries consistently means ranked entries, not all evaluations.

v2.1.1

2.1.1: smooth full-run cost/speed scoring for real 50-task evaluations and add MiniMax judge retry/fallback.

v2.1.0

2.1.0: run all 50 tasks through cloud judge, tighten speed scoring, and publish richer public diagnostics.

v2.0.19

2.0.19: publish refreshed v2 scoring bundle and recover D1 uploads after slow report responses.

v2.0.18

2.0.18: move judge cache to D1, keep KV config-only, and harden full-run scoring storage.

v2.0.15

2.0.15: harden evaluation/ref APIs, remove default fallback names, and strengthen v2 file-edit prompts.

v2.0.14

2.0.14: polish user-facing share copy and recommended booster labels.

v2.0.13

2.0.13: harden judge/report security and mark recommended skills as gray testing.

v2.0.12

2.0.12: scale speed scoring for full 50-task runs and polish public task diagnosis cards.

v2.0.11

2.0.11: remove model-prefixed public summary text and clarify bundled official task copy wording.

v2.0.10

2.0.10: restore the original PNG certificate design after rejecting the 2.0.9 redesign.

v2.0.9

2.0.9: redesign PNG certificate toward the clean reference layout while preserving the existing QR/link flow.

v2.0.8

2.0.8: add real OpenClaw per-task runner support, isolate eval sessions, expose M2.7 reasoning in unlocked full diagnosis, and wait longer for slow M2.7 judge responses.

v2.0.7

2.0.7: keep M2.7 judge reasoning stored, show a concise overall personalized note, and avoid labeling deterministic report copy as AI-written.

v2.0.6

2.0.6: switch cloud judge to MiniMax-M2.7, store judge reasoning, and show one overall personalized report note instead of per-task AI comments.

v2.0.5

2.0.5: switch cloud judge to MiniMax-M2.7, preserve AI judge reasoning in task reports, and keep OpenClaw identity name fallback.

v2.0.4

2.0.4: fix OpenClaw lobster name detection by falling back to workspace IDENTITY.md when SOUL.md has no explicit name.

v2.0.3

2.0.3: harden leaderboard consistency, v2 report verification, wrapper bootstrap, Gateway env loading, and CJK certificate rendering.

v2.0.2

2.0.2: harden leaderboard consistency, v2 judge score normalization, OpenClaw run logging, and CJK certificate rendering.

v2.0.1

2.0.1: harden v2 judge score normalization and OpenClaw run logging.

Metadata

Slug gigo-lobster-doctor

Version 2.1.2

License MIT-0

Total installs 1

Current installs 1

Historical versions 25

FAQ