Description

🦞 GIGO · gigo-lobster-register: Share-page mode: runs the full benchmark and generates a personal result page without entering the leaderboard. Triggers: 注册龙虾结果页 / 分享我的龙虾 / register lobster share page / share my lobster without leaderboard.

Usage Instructions (SKILL.md)

gigo-lobster-register

Name: Gigo Lobster Register
Author: gigolab

Mission

Share-page mode: runs the full benchmark and creates a personal result page without entering the leaderboard.

Trigger Phrases

Chinese: 注册龙虾结果页 / 分享我的龙虾 / 龙虾上分享页但不上榜 / 只注册龙虾分享页
English: register lobster share page / share my lobster without leaderboard / lobster share only / register lobster result page

Execution Rules

Run the wrapper file in this skill's directory with a single direct Python command. Never use cd ... && python ...; OpenClaw preflight may reject it.
Prefer python3, then python, then py.
If the user asked in Chinese, append --lang zh. If the user asked in English, append --lang en.
Stream short progress updates while the benchmark is running.
Keep stdout/stderr visible and remind the user that the full log is written to gigo-run.log.
Do not run --help, inspect the whole repo, or switch to main.py once the wrapper command is clear. Start the wrapper directly.
If the wrapper starts a long-running process, do not kill it just because stdout is quiet for a while. A full tasting run often takes 15-25 minutes.
While a long run is in progress, monitor the process and tail the log file under ~/.openclaw/workspace/outputs/gigo-lobster-register/gigo-run.log instead of improvising a second execution path.
Only declare failure if the process exits non-zero, the log shows a traceback, or the user explicitly asks to cancel.
Stay attached until the wrapper exits. Do not end the conversation with “I will keep monitoring”; keep polling and only report completion once you have the final score/result files/ref_code (if any).
Prefer process poll plus exec tail -n 50 .../gigo-run.log while monitoring. Do not use a generic full-file read on gigo-run.log, because the log can be large and may break the chat output.
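
The monitoring and failure rules above can be sketched in Python. This is an illustrative sketch, not part of the skill: the helper names tail_lines and run_failed are hypothetical, and the third failure condition (the user asking to cancel) is left to the caller.

```python
import collections
from pathlib import Path

# Log location as given in the rules above.
LOG_PATH = Path.home() / ".openclaw/workspace/outputs/gigo-lobster-register/gigo-run.log"


def tail_lines(path, n=50):
    """Return the last n lines of a text file.

    The bounded deque keeps at most n lines in memory, so a large log
    is never loaded (or echoed) in full.
    """
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        return [line.rstrip("\n") for line in collections.deque(fh, maxlen=n)]


def run_failed(returncode, recent_lines):
    """Apply two of the failure rules: non-zero exit or a traceback in the log.

    returncode is None while the process is still running.
    """
    if returncode is not None and returncode != 0:
        return True
    return any("Traceback (most recent call last)" in line for line in recent_lines)
```

A quiet stdout alone never makes run_failed return True, which matches the rule that a long-running tasting run should not be killed just because it is silent.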

Default Behavior

By default it creates a personal result page without entering the leaderboard.

Recommended Command Shape

python3 /absolute/path/to/run_register.py --lang zh

If the user explicitly asks for overrides, append the matching CLI flags:

--lobster-name "..." and --lobster-tags "tag1,tag2" for a custom lobster persona
--output-dir /custom/path for a custom output directory
--require-png-cert when the user refuses the SVG fallback
--skip-upload or --register-only only when the user explicitly asks to change the default upload behavior
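
As a rough sketch of how such a command could be assembled, the helper below builds the argument list from the interpreter preference (python3, then python, then py) and the documented flags. The function names pick_interpreter and build_command are hypothetical and do not come from the skill. The upload-related flags are deliberately left out, since they apply only on explicit user request.

```python
import shutil


def pick_interpreter(which=shutil.which):
    """Return the first interpreter found on PATH, in the order python3, python, py."""
    for name in ("python3", "python", "py"):
        if which(name):
            return name
    raise RuntimeError("no Python interpreter found on PATH")


def build_command(wrapper_path, lang, interpreter, lobster_name=None,
                  lobster_tags=None, output_dir=None, require_png_cert=False):
    """Assemble the wrapper invocation as an argument list (no shell, no cd)."""
    if lang not in ("zh", "en"):
        raise ValueError("lang must be 'zh' or 'en'")
    cmd = [interpreter, wrapper_path, "--lang", lang]
    if lobster_name:
        cmd += ["--lobster-name", lobster_name]
    if lobster_tags:
        cmd += ["--lobster-tags", ",".join(lobster_tags)]
    if output_dir:
        cmd += ["--output-dir", output_dir]
    if require_png_cert:
        cmd.append("--require-png-cert")
    return cmd
```

For example, build_command("/absolute/path/to/run_register.py", "zh", pick_interpreter()) yields the recommended command shape with whichever interpreter is available.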

Persona Defaults

Explicit CLI overrides win first: --lobster-name and --lobster-tags
Then read GIGO_LOBSTER_NAME and GIGO_LOBSTER_TAGS
Then read SOUL.md
Finally fall back to the default lobster persona
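
The precedence order above can be illustrated with a minimal sketch. Several details are assumptions rather than documented behavior: the function name resolve_persona, resolving name and tags independently, treating GIGO_LOBSTER_TAGS as comma-separated like the CLI flag, passing SOUL.md in as an already-parsed dict, and the placeholder default persona.

```python
import os

# Placeholder only; the skill's real default persona is not documented here.
DEFAULT_PERSONA = {"name": "default-lobster", "tags": []}


def resolve_persona(cli_name=None, cli_tags=None, env=None, soul=None):
    """Resolve name and tags: CLI flags, then GIGO_* env vars, then SOUL.md, then default.

    soul is a hypothetical pre-parsed view of SOUL.md, e.g. {"name": ..., "tags": ...}.
    """
    env = os.environ if env is None else env
    soul = soul or {}
    name = (cli_name or env.get("GIGO_LOBSTER_NAME")
            or soul.get("name") or DEFAULT_PERSONA["name"])
    raw_tags = cli_tags or env.get("GIGO_LOBSTER_TAGS") or soul.get("tags") or ""
    if isinstance(raw_tags, str):
        # Split a comma-separated string and drop empty entries.
        tags = [t.strip() for t in raw_tags.split(",") if t.strip()]
    else:
        tags = list(raw_tags)
    return {"name": name, "tags": tags or list(DEFAULT_PERSONA["tags"])}
```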

Do not stop for interactive questions unless the user explicitly asks for an interactive run.

Security Recommendations

What to consider before installing or running:
- Treat this as code-heavy: the skill bundle contains a full evaluation harness and networking clients (gateway/judge/score uploader). If you only expect a tiny "register-only" helper, this is wider in scope.
- SKILL.md contains strong "do not inspect" directives and prompt-injection signals. Before running anything, manually inspect the wrapper (run_register.py) and the network-related scripts (scripts/gateway_client.py, scripts/score_uploader.py, bundle/harness_reference/judge_client.py) to confirm which endpoints are contacted and what data is sent.
- Because the skill can upload results and call a cloud /judge endpoint, run it in an isolated environment (VM/container) or with network egress blocked until you confirm safe behavior. Prefer flags like --skip-upload, or run the 'doctor' mode first to see environment checks.
- Verify any environment variables it reads (GIGO_*, OPENCLAW_* or similar); the skill metadata declares none, but the code and docs reference them. Do not run with sensitive credentials in your environment until you know what will be sent.
- If you proceed, monitor network activity (e.g., with a firewall or tcpdump) and inspect ~/.openclaw/workspace/outputs/gigo-lobster-register/gigo-run.log and the wrapper's stdout. Consider running a dry-run or opening the repository files yourself rather than following the SKILL.md instruction to "not inspect the repo".
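
One way to act on the environment-variable advice is to list which matching variables are set before launching the wrapper. The sketch below is illustrative: the helper name find_matching_env is hypothetical, and the prefixes follow the patterns named in the advice (GIGO_*, OPENCLAW_*). It returns names only, so values never end up in chat output or logs.

```python
import os


def find_matching_env(prefixes=("GIGO_", "OPENCLAW_"), env=None):
    """Return the sorted names of environment variables that start with any prefix."""
    env = os.environ if env is None else env
    return sorted(name for name in env if name.startswith(tuple(prefixes)))


if __name__ == "__main__":
    for name in find_matching_env():
        print(name)
```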

Functional Analysis

Type: OpenClaw Skill
Name: gigo-lobster-register
Version: 2.1.2

The skill is a comprehensive benchmarking framework (GIGO Lobster Taster) designed to evaluate the performance and safety of AI agents. It functions by executing a suite of 50 tasks (found in the 'bundle/tasks' directory) and scoring the results across seven dimensions. While the bundle contains risky capabilities such as automated environment bootstrapping via venv/pip (scripts/runtime_bootstrap.py), network communication with a cloud API (api.agent-gigo.com), and the execution of shell commands, these actions are strictly aligned with its stated purpose. Notably, the framework includes security-enhancing features like a 'ShellShim' (scripts/v2_shell_shim.py) to monitor and block risky commands during testing. The prompt injection strings found in task setup files (e.g., bundle/tasks/a25_readme_prompt_injection/setup/README.md) are intentional test cases for the benchmarked agent and do not target the OpenClaw agent itself.

Capability Tags

crypto
requires-sensitive-credentials

Capability Assessment

⚠ Purpose & Capability

The SKILL.md and README present this as a "register-only" companion mode, but the bundle contains a full evaluation harness (50 tasks), gateway/judge clients, and upload/score_uploader code (e.g., scripts/gateway_client.py, bundle/harness_reference/judge_client.py, scripts/score_uploader.py). That larger capability (cloud /judge calls and uploading) is more than you'd expect from a simple "register share page" skill. Also SKILL.md references environment variables like GIGO_LOBSTER_NAME, GIGO_UPLOAD_MODE, GIGO_REQUIRE_PNG_CERT that are not declared in the skill metadata.

⚠ Instruction Scope

SKILL.md gives very prescriptive runtime rules that limit inspection ("Do not run --help, inspect the whole repo, or switch to main.py once the wrapper command is clear") and explicitly steers the agent to run a wrapper directly and tail a specific log path (~/.openclaw/workspace/outputs/gigo-lobster-register/gigo-run.log). Those directives look like prompt-injection style containment that prevents a user/agent from exploring or verifying repository behavior before execution. The instructions also reference reading environment variables and SOUL.md for persona defaults even though none are declared in metadata.

ℹ Install Mechanism

There is no external install spec (no remote downloads), which lowers install-time supply-chain risk. However, the skill ships a large code bundle (hundreds of files) that will be executed locally if you run the wrapper; that increases runtime surface compared to a small instruction-only skill.

⚠ Credentials

Declared requirements list no environment variables or primary credential, yet the runtime docs and code reference many configuration points (GIGO_* env vars, gateway/judge endpoints). The judge_client/gateway_client code performs network POSTs to a /judge endpoint and expects an encrypt/decrypt hook; these likely rely on runtime configuration not declared in the skill metadata. The lack of declared env vars vs. actual code behavior is a mismatch and could hide required secrets or unexpected network communication.

✓ Persistence & Privilege

The skill does not request always:true and does not modify other skills' configs according to metadata. It is user-invocable and allows autonomous model invocation by default (the platform default). No excessive persistence privileges are declared.

Version History

v2.1.2

2.1.2: fix leaderboard wording on cert/report so total_entries consistently means ranked entries, not all evaluations.

v2.1.1

2.1.1: smooth full-run cost/speed scoring for real 50-task evaluations and add MiniMax judge retry/fallback.

v2.1.0

2.1.0: run all 50 tasks through cloud judge, smooth seven-dimension scoring, and publish richer public diagnostics.

v2.0.19

2.0.19: publish refreshed v2 scoring bundle and recover D1 uploads after slow report responses.

v2.0.18

2.0.18: move judge cache to D1, keep KV config-only, and harden full-run scoring storage.

v2.0.15

2.0.15: harden evaluation/ref APIs, remove default fallback names, and strengthen v2 file-edit prompts.

v2.0.14

2.0.14: polish user-facing share copy and recommended booster labels.

v2.0.13

2.0.13: harden judge/report security and mark recommended skills as gray testing.

v2.0.12

2.0.12: scale speed scoring for full 50-task runs and polish public task diagnosis cards.

v2.0.11

2.0.11: remove model-prefixed public summary text and clarify bundled official task copy wording.

v2.0.10

2.0.10: restore the original PNG certificate design after rejecting the 2.0.9 redesign.

v2.0.8

2.0.8: add real OpenClaw per-task runner support, isolate eval sessions, expose M2.7 reasoning in unlocked full diagnosis, and wait longer for slow M2.7 judge responses.

v2.0.7

2.0.7: keep M2.7 judge reasoning stored, show a concise overall personalized note, and avoid labeling deterministic report copy as AI-written.

v2.0.6

2.0.6: switch cloud judge to MiniMax-M2.7, store judge reasoning, and show one overall personalized report note instead of per-task AI comments.

v2.0.5

2.0.5: switch cloud judge to MiniMax-M2.7, preserve AI judge reasoning in task reports, and keep OpenClaw identity name fallback.

v2.0.4

2.0.4: fix OpenClaw lobster name detection by falling back to workspace IDENTITY.md when SOUL.md has no explicit name.

v2.0.3

2.0.3: harden leaderboard consistency, v2 report verification, wrapper bootstrap, Gateway env loading, and CJK certificate rendering.

v2.0.2

2.0.2: harden leaderboard consistency, v2 judge score normalization, OpenClaw run logging, and CJK certificate rendering.

v2.0.0-beta.2

Release 2.0.0-beta.2: 50-task v2 beta bundle, MiniMax M2.1 judge defaults, worker v2 APIs, bundle-version leaderboard.

v1.2.4

1.2.4: backend scoring reliability improvements, documentation refresh, and release pipeline maintenance.

Metadata

Slug gigo-lobster-register

Version 2.1.2

License MIT-0

Total installs 1

Current installs 1

Number of versions 23

FAQ