功能描述

🦞 GIGO · gigo-lobster-local: 本地模式：跑完整评测，但不上云、不注册个人结果页，证书二维码回到官网首页。 Triggers: 本地试吃龙虾 / 离线试吃龙虾 / local lobster taste / offline lobster taste.

使用说明 (SKILL.md)

gigo-lobster-local

Name: Gigo Lobster Local
Author: gigolab

Mission

本地模式：跑完整评测，但不上云、不注册个人结果页，证书二维码回到官网首页。
Local-only mode: runs the benchmark without uploading, without creating a personal result page, and keeps the certificate QR code pointed at the site homepage.

Trigger Phrases

中文：本地试吃龙虾 / 离线试吃龙虾 / 只在本地评测龙虾 / 龙虾本地模式
English: local lobster taste / offline lobster taste / run lobster locally / local lobster eval

Execution Rules

Use a direct Python command on this skill directory's wrapper file. Never use cd ... && python ...; OpenClaw preflight may reject it.
Prefer python3, then python, then py.
If the user asked in Chinese, append --lang zh. If the user asked in English, append --lang en.
Stream short progress updates while the benchmark is running.
Keep stdout/stderr visible and remind the user that the full log is written to gigo-run.log.
Do not run --help, inspect the whole repo, or switch to main.py once the wrapper command is clear. Start the wrapper directly.
If the wrapper starts a long-running process, do not kill it just because stdout is quiet for a while. A full tasting run often takes 15-25 minutes.
While a long run is in progress, monitor the process and tail the log file under ~/.openclaw/workspace/outputs/gigo-lobster-local/gigo-run.log instead of improvising a second execution path.
Only declare failure if the process exits non-zero, the log shows a traceback, or the user explicitly asks to cancel.
Stay attached until the wrapper exits. Do not end the conversation with “I will keep monitoring”; keep polling and only report completion once you have the final score/result files/ref_code (if any).
Prefer process poll plus exec tail -n 50 .../gigo-run.log while monitoring. Do not use a generic full-file read on gigo-run.log, because the log can be large and may break the chat output.

Default Behavior

中文：默认只在本地生成报告与证书，不上传云端。
English: By default it keeps everything local and does not upload to the cloud.

Recommended Command Shape

python3 /absolute/path/to/run_local.py --lang zh

If the user explicitly asks for overrides, append the matching CLI flags:

--lobster-name "..." and --lobster-tags "tag1,tag2" for a custom lobster persona
--output-dir /custom/path for a custom output directory
--require-png-cert when the user refuses the SVG fallback
--skip-upload or --register-only only when the user explicitly asks to change the default upload behavior

Persona Defaults

Explicit CLI overrides win first: --lobster-name and --lobster-tags
Then read GIGO_LOBSTER_NAME and GIGO_LOBSTER_TAGS
Then read SOUL.md
Finally fall back to the default lobster persona

Do not stop for interactive questions unless the user explicitly asks for an interactive run.

安全使用建议

What to consider before installing or running this skill: - Treat the bundle as semi-trusted until you inspect the wrapper. Although the skill claims "local-only", the repository includes cloud/network code (gateway_client, judge_client, score_uploader) that could upload data if invoked. - The SKILL.md explicitly tells the agent not to inspect the repo and contains prompt-injection indicators. Do not follow that advice — manually inspect the code yourself. - Before running, open the wrapper file referenced in SKILL.md (run_local.py or the wrapper the guide expects) and verify it does NOT call functions that perform HTTP requests, import or call score_uploader, gateway_client.judge, or other network/upload helpers. Grep for 'requests.post', 'score_uploader', 'gateway', '/judge', 'upload', 'socket', or similar. - If you must run: do so in an isolated environment (VM, container, or machine with network disabled) and point output directories to a safe location. This prevents accidental outbound traffic and limits filesystem impact. - Check for any use of undeclared environment variables (GIGO_*, GATEWAY_BASE, etc.) and ensure none are set in your environment unless intentional. Prefer running with a clean environment. - Prefer invoking the wrapper with flags that explicitly disable upload (e.g., --skip-upload) and confirm via reading run_local.py that the flag is honored. Do a dry run or --help locally (despite SKILL.md advise) to inspect behavior — the SKILL.md instruction forbidding --help is itself suspicious. - If you are not comfortable auditing the wrapper, do not install/run the skill. If possible, ask the skill author for a minimal, auditable local-only wrapper that cannot import or call any uploader/judge code. Why suspicion: the combination of (a) instructions that forbid inspection, (b) prompt-injection signatures in SKILL.md, and (c) included cloud/upload code creates ambiguity about whether a run will truly stay local. Manual code review of the wrapper and running inside an isolated environment are the safest next steps.

功能分析

Type: OpenClaw Skill Name: gigo-lobster-local Version: 2.1.2 The bundle is a comprehensive benchmarking framework (GIGO Lobster Taster) designed to evaluate AI agents across various technical and conversational tasks. It includes sophisticated defensive mechanisms, such as a shell shim (v2_shell_shim.py) and a rule engine (rule_engine.py), which are used to monitor and restrict the agent under test from performing risky actions like accessing SSH keys or executing unauthorized network commands. While the bundle contains intentional prompt-injection traps (e.g., in a25_readme_prompt_injection) and 'dangerous' scripts (a27_refuse_eval_user_input), these are strictly used as test cases to verify the agent's security awareness, with evaluation logic (check.py) that penalizes the agent if it fails to ignore the malicious instructions. The network activity is limited to the official API (api.agent-gigo.com) for scoring and judging purposes, consistent with the tool's stated mission.

能力标签

cryptorequires-sensitive-credentials

能力评估

⚠ Purpose & Capability

The SKILL.md advertises a local-only mode that does not upload results and requests only Python binaries. However the bundle contains multiple network-capable modules (gateway_client.py, judge_client.py, score_uploader.py, cert_generator, etc.) and changelog notes that judging/upload moved to a cloud /judge endpoint. Including these modules can be legitimate for a family of companion skills, but the presence of cloud/upload code in a skill whose stated purpose is local-only raises a proportionality concern unless the wrapper (run_local.py) is demonstrably preventing all outbound calls.

⚠ Instruction Scope

The SKILL.md explicitly instructs the agent not to inspect the repository (‘Do not run --help, inspect the whole repo, or switch to main.py once the wrapper command is clear’) and to start a particular wrapper directly. That directive restricts normal verification and matches detected prompt-injection patterns. The runtime rules also direct live execution monitoring (tailing logs, polling process) and specific shell commands, which is expected for running a local job but problematic when the instructions attempt to forbid inspection of the code being run.

ℹ Install Mechanism

There is no install spec (instruction-only at the registry level), which is low-risk in itself. But the bundle includes many Python scripts that will be executed when you run the wrapper. Since there is no automatic package download, the risk is limited to what the included code does when executed locally. That behavior should be inspected before running.

ℹ Credentials

The skill declares no required environment variables, yet the SKILL.md documents reading GIGO_LOBSTER_NAME / GIGO_LOBSTER_TAGS and other GIGO_* variables as persona defaults. The bundle also contains modules that use network endpoints and (in production) would use gateway_base and possibly credentials. While the skill does not explicitly request secrets, it will read environment variables not declared in the metadata and could use network code if the wrapper calls those modules.

✓ Persistence & Privilege

The skill does not request 'always: true' and does not declare modifications to other skills or system-wide settings. It appears to run only when invoked and does not request elevated agent privileges in the manifest.

版本历史

v2.1.2

2.1.2: fix leaderboard wording on cert/report so total_entries consistently means ranked entries, not all evaluations.

v2.1.1

2.1.1: smooth full-run cost/speed scoring for real 50-task evaluations and add MiniMax judge retry/fallback.

v2.1.0

2.1.0: run all 50 tasks through cloud judge, tighten speed scoring, and publish richer public diagnostics.

v2.0.19

2.0.19: publish refreshed v2 scoring bundle and recover D1 uploads after slow report responses.

v2.0.18

2.0.18: move judge cache to D1, keep KV config-only, and harden full-run scoring storage.

v2.0.15

2.0.15: harden evaluation/ref APIs, remove default fallback names, and strengthen v2 file-edit prompts.

v2.0.14

2.0.14: polish user-facing share copy and recommended booster labels.

v2.0.13

2.0.13: harden judge/report security and mark recommended skills as gray testing.

v2.0.12

2.0.12: scale speed scoring for full 50-task runs and polish public task diagnosis cards.

v2.0.11

2.0.11: remove model-prefixed public summary text and clarify bundled official task copy wording.

v2.0.10

2.0.10: restore the original PNG certificate design after rejecting the 2.0.9 redesign.

v2.0.9

2.0.9: redesign PNG certificate toward the clean reference layout while preserving the existing QR/link flow.

v2.0.8

2.0.8: add real OpenClaw per-task runner support, isolate eval sessions, expose M2.7 reasoning in unlocked full diagnosis, and wait longer for slow M2.7 judge responses.

v2.0.7

2.0.7: keep M2.7 judge reasoning stored, show a concise overall personalized note, and avoid labeling deterministic report copy as AI-written.

v2.0.6

2.0.6: switch cloud judge to MiniMax-M2.7, store judge reasoning, and show one overall personalized report note instead of per-task AI comments.

v2.0.5

2.0.5: switch cloud judge to MiniMax-M2.7, preserve AI judge reasoning in task reports, and keep OpenClaw identity name fallback.

v2.0.4

2.0.4: fix OpenClaw lobster name detection by falling back to workspace IDENTITY.md when SOUL.md has no explicit name.

v2.0.3

2.0.3: harden leaderboard consistency, v2 report verification, wrapper bootstrap, Gateway env loading, and CJK certificate rendering.

v2.0.2

2.0.2: harden leaderboard consistency, v2 judge score normalization, OpenClaw run logging, and CJK certificate rendering.

v2.0.0-beta.2

Release 2.0.0-beta.2: 50-task v2 beta bundle, MiniMax M2.1 judge defaults, worker v2 APIs, bundle-version leaderboard.

元数据

Slug gigo-lobster-local

版本 2.1.2

许可证 MIT-0

累计安装 1

当前安装数 1

历史版本数 24

常见问题