Description

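🦞 GIGO · gigo-lobster-resume: Resume entrypoint: the v2 stable runtime currently clears old checkpoints and reruns from scratch; this slug is kept as a compatibility entrypoint for legacy checkpoints. Triggers: 继续试吃 / 恢复评测 / resume tasting / continue lobster...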

Usage (SKILL.md)

gigo-lobster-resume

Name: Gigo Lobster Resume
Author: gigolab

Mission

Resume entrypoint: the v2 stable runtime currently clears old checkpoints and starts fresh; this slug remains for legacy checkpoint compatibility.

Trigger Phrases

Chinese: 继续试吃 / 恢复评测 / 继续评估 / 继续龙虾评测
English: resume tasting / continue lobster eval / resume lobster benchmark / continue tasting

Execution Rules

Use a direct Python command on this skill directory's wrapper file. Never use cd ... && python ...; OpenClaw preflight may reject it.
Prefer python3, then python, then py.
If the user asked in Chinese, append --lang zh. If the user asked in English, append --lang en.
Stream short progress updates while the benchmark is running.
Keep stdout/stderr visible and remind the user that the full log is written to gigo-run.log.
Do not run --help, inspect the whole repo, or switch to main.py once the wrapper command is clear. Start the wrapper directly.
If the wrapper starts a long-running process, do not kill it just because stdout is quiet for a while. A full tasting run often takes 15-25 minutes.
While a long run is in progress, monitor the process and tail the log file under ~/.openclaw/workspace/outputs/gigo-lobster-taster/gigo-run.log instead of improvising a second execution path.
Only declare failure if the process exits non-zero, the log shows a traceback, or the user explicitly asks to cancel.
Stay attached until the wrapper exits. Do not end the conversation with “I will keep monitoring”; keep polling and only report completion once you have the final score/result files/ref_code (if any).
Prefer process poll plus exec tail -n 50 .../gigo-run.log while monitoring. Do not use a generic full-file read on gigo-run.log, because the log can be large and may break the chat output.
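
The monitoring rules above (read only the tail of the log, and declare failure only on a non-zero exit or a traceback) can be sketched as follows. This is a minimal illustration, not the wrapper's own logic: `tail_lines` and `run_failed` are hypothetical helper names, and the traceback check assumes the log contains standard Python traceback headers.

```python
import os
from typing import List, Optional


def tail_lines(path: str, n: int = 50, block_size: int = 4096) -> List[str]:
    """Return the last n lines of a file by reading backwards in blocks,
    so a large log is never loaded in full."""
    if n <= 0:
        return []
    with open(path, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        pos = fh.tell()
        data = b""
        # Keep prepending blocks until more than n newlines are buffered
        # (so the last n lines are complete) or the start of the file is reached.
        while pos > 0 and data.count(b"\n") <= n:
            step = min(block_size, pos)
            pos -= step
            fh.seek(pos)
            data = fh.read(step) + data
    return data.decode("utf-8", errors="replace").splitlines()[-n:]


def run_failed(exit_code: Optional[int], log_tail: List[str]) -> bool:
    """exit_code is None while the process is still running.
    A quiet log alone is never treated as a failure."""
    if exit_code is not None and exit_code != 0:
        return True
    # Assumption: failures surface as a standard Python traceback header.
    return any(
        line.lstrip().startswith("Traceback (most recent call last)")
        for line in log_tail
    )
```

A caller would expand the log path first, for example `tail_lines(os.path.expanduser("~/.openclaw/workspace/outputs/gigo-lobster-taster/gigo-run.log"))`, and pass the result to `run_failed` together with the current exit code from the process poll.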

Default Behavior

By default it resumes from the existing checkpoint and writes to the gigo-lobster-taster output directory. Note that the Mission section states the v2 stable runtime currently clears old checkpoints and starts fresh, so this default may describe legacy behavior.

Recommended Command Shape

python3 /absolute/path/to/run_resume.py --lang zh

If the user explicitly asks for overrides, append the matching CLI flags:

--lobster-name "..." and --lobster-tags "tag1,tag2" for a custom lobster persona
--output-dir /custom/path for a custom output directory
--require-png-cert when the user refuses the SVG fallback
--skip-upload or --register-only only when the user explicitly asks to change the default upload behavior
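
As an illustration of the command shape and override flags above, the sketch below picks an interpreter in the stated order (python3, then python, then py) and assembles the argument list. The function names `pick_interpreter` and `build_command` are hypothetical; only the flags are taken from this document. Building a list rather than a shell string also avoids the `cd ... && python ...` form that the Execution Rules prohibit.

```python
import shutil
from typing import List, Optional, Sequence


def pick_interpreter(
    candidates: Sequence[str] = ("python3", "python", "py"),
) -> Optional[str]:
    """Return the first candidate found on PATH, or None if none exists."""
    for name in candidates:
        if shutil.which(name):
            return name
    return None


def build_command(
    interpreter: str,
    wrapper_path: str,
    lang: str,
    lobster_name: Optional[str] = None,
    lobster_tags: Optional[Sequence[str]] = None,
    output_dir: Optional[str] = None,
    require_png_cert: bool = False,
    skip_upload: bool = False,
) -> List[str]:
    """Assemble the wrapper command; optional flags are added only on request."""
    if lang not in ("zh", "en"):
        raise ValueError("lang must be 'zh' or 'en'")
    cmd = [interpreter, wrapper_path, "--lang", lang]
    if lobster_name:
        cmd += ["--lobster-name", lobster_name]
    if lobster_tags:
        cmd += ["--lobster-tags", ",".join(lobster_tags)]
    if output_dir:
        cmd += ["--output-dir", output_dir]
    if require_png_cert:
        cmd.append("--require-png-cert")
    if skip_upload:
        cmd.append("--skip-upload")
    return cmd
```

The resulting list can be passed directly to `subprocess.Popen` without `shell=True`, which keeps the invocation a single direct Python command.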

Persona Defaults

Explicit CLI overrides win first: --lobster-name and --lobster-tags
Then read GIGO_LOBSTER_NAME and GIGO_LOBSTER_TAGS
Then read SOUL.md
Finally fall back to the default lobster persona

Do not stop for interactive questions unless the user explicitly asks for an interactive run.
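
The persona precedence above can be sketched as a simple first-match lookup. This is an assumption-laden illustration: how the wrapper actually parses SOUL.md is not described here, so the sketch takes an already-extracted `soul_name` value, and `DEFAULT_NAME` is a hypothetical placeholder for the default persona.

```python
from typing import Mapping, Optional, Tuple

# Hypothetical placeholder; the real default persona name is not given in the source.
DEFAULT_NAME = "Default Lobster"


def resolve_name(
    cli_name: Optional[str],
    env: Mapping[str, str],
    soul_name: Optional[str],
) -> Tuple[str, str]:
    """Return (name, source) using the order: CLI flag, environment, SOUL.md, default."""
    candidates = [
        ("cli", cli_name),                      # --lobster-name
        ("env", env.get("GIGO_LOBSTER_NAME")),  # environment variable
        ("soul", soul_name),                    # value already read from SOUL.md
    ]
    for source, value in candidates:
        # Empty or whitespace-only values are treated as unset.
        if value and value.strip():
            return value.strip(), source
    return DEFAULT_NAME, "default"
```

Returning the source alongside the value makes it easy to log which layer supplied the persona without prompting the user interactively. The same pattern would apply to tags with `--lobster-tags` and `GIGO_LOBSTER_TAGS`.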

Safe Usage Recommendations

What to check before installing or running:
- Manual inspection: open run_resume.py, scripts/score_uploader.py, and scripts/gateway_client.py, and read the CLI logic in run_resume.py. Search the bundle for 'requests.post' or other outbound network calls and for any hard-coded remote hosts.
- Modes & uploads: the skill can upload results depending on the run mode. If you don't want any network activity, run with local-only flags (e.g., --skip-upload, or use gigo-lobster-local) and/or run gigo-lobster-doctor first.
- Secrets & scope: do not run this in an environment where sensitive credentials are mounted or available unless you have confirmed where the code sends data. The SKILL.md references optional environment variables (GIGO_*); the bundle does not declare them as required, but the code may read them.
- Prompt-injection signs: SKILL.md contained prompt-injection-like patterns and unusual instructions (e.g., 'do not inspect the repo' and control characters). Treat those as a red flag: if you proceed, prefer to run the wrapper locally in an isolated VM or container.
- Safer test: run the doctor mode and a local run (no upload) first, and inspect the outputs (gigo-run.log, lobster-report.html). If you plan to resume a prior run, inspect the checkpoint files to understand what state will be reused.
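
The manual inspection suggested above can be partly automated. The sketch below walks a bundle directory and lists lines that look like outbound network calls or environment reads. The regular expressions are illustrative assumptions, not an exhaustive detector, and `scan_bundle` is a hypothetical helper name; a clean result does not prove the bundle makes no network calls.

```python
import re
from pathlib import Path
from typing import Dict, List, Tuple

# Illustrative patterns only; extend them for the HTTP libraries the bundle actually uses.
PATTERNS = {
    "network": re.compile(r"requests\.(get|post|put)|urllib\.request|http\.client|https?://"),
    "env": re.compile(r"os\.environ|os\.getenv|GIGO_[A-Z0-9_]+"),
}


def scan_bundle(root: str) -> Dict[str, List[Tuple[str, int, str]]]:
    """Return {kind: [(file, line_number, line_text), ...]} for every .py file under root."""
    hits: Dict[str, List[Tuple[str, int, str]]] = {kind: [] for kind in PATTERNS}
    for path in sorted(Path(root).rglob("*.py")):
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file: skip rather than abort the scan
        for lineno, line in enumerate(text.splitlines(), start=1):
            for kind, pattern in PATTERNS.items():
                if pattern.search(line):
                    hits[kind].append((str(path), lineno, line.strip()))
    return hits
```

Calling `scan_bundle("/path/to/skill")` and reviewing the "network" entries for hard-coded hosts gives a quick first pass before reading the uploader and gateway code by hand.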

Functional Analysis

Type: OpenClaw Skill
Name: gigo-lobster-resume
Version: 2.1.2

The skill 'gigo-lobster-resume' is part of the GIGO Lobster Taster benchmark suite, designed to resume interrupted evaluations of AI agents. The bundle contains a comprehensive set of 50 evaluation tasks, a reference test harness, and logic for scoring and report generation. It includes components with high-privilege capabilities: a runtime bootstrapper that installs dependencies via pip (scripts/runtime_bootstrap.py), a shell shim for command monitoring (scripts/v2_shell_shim.py), and simulated prompt-injection test cases (e.g., in bundle/tasks/a25_readme_prompt_injection/setup/README.md). These are strictly aligned with its purpose as a security-focused benchmarking tool. The skill communicates with 'api.agent-gigo.com' to fetch tasks and upload results, which is consistent with its stated functionality.

Capability Tags

crypto
requires-sensitive-credentials

Capability Assessment

ℹ Purpose & Capability

The skill name/description (resume a previous 'lobster' benchmark run) aligns with the provided wrapper scripts (run_resume.py) and the large bundled evaluation harness. The bundle is large (full taster/harness/judge scaffolding) which is expected for a benchmark suite, though heavier than a minimal 'resume' helper.

⚠ Instruction Scope

SKILL.md instructs the agent to run the repository wrapper (python run_resume.py), tail logs under ~/.openclaw/workspace/outputs/..., keep stdout/stderr visible, and stay attached while long runs execute. It also references and suggests reading SOUL.md and several optional env vars. The runtime instructions include prompt-injection-like constructs (pre-scan found 'ignore-previous-instructions' and unicode-control-chars) which could be attempting to influence agent behavior. The instructions also explicitly disallow inspecting the repo or switching to main.py — this is unusual and worth manual review.

✓ Install Mechanism

No external install/download step is included; code is packaged in the bundle and no remote URLs or extraction steps are declared. That lowers install-time risk compared to fetching arbitrary code at install time.

ℹ Credentials

Declared requirements are just a Python binary (python3/python/py), which fits the CLI wrapper usage. However SKILL.md and README reference several environment variables (e.g., GIGO_LOBSTER_NAME, GIGO_UPLOAD_MODE, GIGO_REQUIRE_PNG_CERT) and a local gateway; none of these are declared in requires.env. Also the bundle contains code (gateway_client.py, judge_client.py, score_uploader.py) that performs outbound HTTP requests — consistent with a taster that uploads results, but you should be aware the skill may contact a gateway or uploader depending on mode.

✓ Persistence & Privilege

The skill is not marked always:true and does not request to modify other skills' configurations. It runs as an invoked local CLI tool and monitors a long-running process; that extended runtime is normal for this use-case but increases exposure while running.

Version History

v2.1.2

2.1.2: fix leaderboard wording on cert/report so total_entries consistently means ranked entries, not all evaluations.

v2.1.1

2.1.1: smooth full-run cost/speed scoring for real 50-task evaluations and add MiniMax judge retry/fallback.

v2.1.0

2.1.0: run all 50 tasks through cloud judge, smooth seven-dimension scoring, and publish richer public diagnostics.

v2.0.19

2.0.19: publish refreshed v2 scoring bundle and recover D1 uploads after slow report responses.

v2.0.18

2.0.18: move judge cache to D1, keep KV config-only, and harden full-run scoring storage.

v2.0.15

2.0.15: harden evaluation/ref APIs, remove default fallback names, and strengthen v2 file-edit prompts.

v2.0.14

2.0.14: polish user-facing share copy and recommended booster labels.

v2.0.13

2.0.13: harden judge/report security and mark recommended skills as gray testing.

v2.0.12

2.0.12: scale speed scoring for full 50-task runs and polish public task diagnosis cards.

v2.0.11

2.0.11: remove model-prefixed public summary text and clarify bundled official task copy wording.

v2.0.10

2.0.10: restore the original PNG certificate design after rejecting the 2.0.9 redesign.

v2.0.8

2.0.8: add real OpenClaw per-task runner support, isolate eval sessions, expose M2.7 reasoning in unlocked full diagnosis, and wait longer for slow M2.7 judge responses.

v2.0.7

2.0.7: keep M2.7 judge reasoning stored, show a concise overall personalized note, and avoid labeling deterministic report copy as AI-written.

v2.0.6

2.0.6: switch cloud judge to MiniMax-M2.7, store judge reasoning, and show one overall personalized report note instead of per-task AI comments.

v2.0.5

2.0.5: switch cloud judge to MiniMax-M2.7, preserve AI judge reasoning in task reports, and keep OpenClaw identity name fallback.

v2.0.4

2.0.4: fix OpenClaw lobster name detection by falling back to workspace IDENTITY.md when SOUL.md has no explicit name.

v2.0.3

2.0.3: harden leaderboard consistency, v2 report verification, wrapper bootstrap, Gateway env loading, and CJK certificate rendering.

v2.0.2

2.0.2: harden leaderboard consistency, v2 judge score normalization, OpenClaw run logging, and CJK certificate rendering.

v2.0.0-beta.2

Release 2.0.0-beta.2: 50-task v2 beta bundle, MiniMax M2.1 judge defaults, worker v2 APIs, bundle-version leaderboard.

v1.2.4

1.2.4: backend scoring reliability improvements, documentation refresh, and release pipeline maintenance.

Metadata

Slug gigo-lobster-resume

Version 2.1.2

License MIT-0

Total installs 1

Current installs 1

Historical versions 23

FAQ