Gigo Lobster Taster

by gigolab · v2.1.2 · MIT-0
Platforms: darwin, linux, win32 · ⚠ suspicious
Downloads: 468
Stars: 0
Active Installs: 1
Versions: 32
Install in OpenClaw
/install gigo-lobster-taster
Description
🦞 GIGO · gigo-lobster-taster: Primary tasting mode: runs the full benchmark, uploads to the cloud by default, generates a personal results page, and enters the leaderboard. Triggers: 试吃我的龙虾 / 品鉴我的龙虾 / lobster taste / lobster taster.
README (SKILL.md)

gigo-lobster-taster

Mission

  • Primary tasting mode: runs the full benchmark, uploads the verified result to the cloud by default, creates a personal share page, and enters the leaderboard.

Trigger Phrases

  • Chinese: 试吃我的龙虾 / 品鉴我的龙虾 / 鉴定我的龙虾 / 评估我的龙虾
  • English: lobster taste / lobster taster / taste my lobster / lobster eval

Execution Rules

  1. Use a direct Python command on this skill directory's wrapper file. Never use cd ... && python ...; OpenClaw preflight may reject it.
  2. Prefer python3, then python, then py.
  3. If the user asked in Chinese, append --lang zh. If the user asked in English, append --lang en.
  4. Stream short progress updates while the benchmark is running.
  5. Keep stdout/stderr visible and remind the user that the full log is written to gigo-run.log.
  6. Do not run --help, inspect the whole repo, or switch to main.py once the wrapper command is clear. Start the wrapper directly.
  7. If the wrapper starts a long-running process, do not kill it just because stdout is quiet for a while. A full tasting run often takes 15-25 minutes.
  8. While a long run is in progress, monitor the process and tail the log file under ~/.openclaw/workspace/outputs/gigo-lobster-taster/gigo-run.log instead of improvising a second execution path.
  9. Only declare failure if the process exits non-zero, the log shows a traceback, or the user explicitly asks to cancel.
  10. Stay attached until the wrapper exits. Do not end the conversation with “I will keep monitoring”; keep polling and only report completion once you have the final score/result files/ref_code (if any).
  11. Prefer process poll plus exec tail -n 50 .../gigo-run.log while monitoring. Do not use a generic full-file read on gigo-run.log, because the log can be large and may break the chat output.
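Rules 2, 9 and 11 above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `pick_python`, `tail_lines` and `run_failed` are hypothetical and are not part of the skill bundle, and the traceback check simply looks for the standard CPython traceback header.

```python
import shutil
from collections import deque


def pick_python(candidates=("python3", "python", "py")):
    """Rule 2: return the first interpreter name found on PATH, in preference order."""
    for name in candidates:
        if shutil.which(name):
            return name
    return None


def tail_lines(path, n=50):
    """Rule 11: return the last n lines of a log without holding the whole file in memory."""
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        # deque with maxlen keeps only the most recent n lines while streaming the file.
        return [line.rstrip("\n") for line in deque(fh, maxlen=n)]


def run_failed(exit_code, log_tail):
    """Rule 9: treat the run as failed only on a non-zero exit or a traceback in the log.

    exit_code is None while the process is still running.
    (An explicit user cancel is a separate signal and is not modeled here.)
    """
    if exit_code is not None and exit_code != 0:
        return True
    return any("Traceback (most recent call last)" in line for line in log_tail)
```

A quiet stdout is deliberately not a failure signal here, which matches rule 7: `run_failed(None, [])` stays false for as long as the process is alive and the log tail is clean.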

Default Behavior

  • By default it performs an official upload of the verified result, creates a personal share page, and enters the leaderboard.

Recommended Command Shape

python3 /absolute/path/to/run_upload.py --lang zh

If the user explicitly asks for overrides, append the matching CLI flags:

  • --lobster-name "..." and --lobster-tags "tag1,tag2" for a custom lobster persona
  • --output-dir /custom/path for a custom output directory
  • --require-png-cert when the user refuses the SVG fallback
  • --skip-upload or --register-only only when the user explicitly asks to change the default upload behavior
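As a rough sketch of the command shape above, the argument list could be assembled as follows. The function `build_command` and its parameter names are hypothetical; only the CLI flags themselves come from this README, and the wrapper path is the same placeholder used above.

```python
def build_command(python, wrapper, lang, lobster_name=None, lobster_tags=None,
                  output_dir=None, require_png_cert=False,
                  skip_upload=False, register_only=False):
    """Build an argv list for the wrapper; optional flags are added only when requested."""
    if lang not in ("zh", "en"):
        raise ValueError("lang must be 'zh' or 'en'")
    cmd = [python, wrapper, "--lang", lang]
    if lobster_name:
        cmd += ["--lobster-name", lobster_name]
    if lobster_tags:
        # The README shows tags as one comma-separated value: "tag1,tag2".
        cmd += ["--lobster-tags", ",".join(lobster_tags)]
    if output_dir:
        cmd += ["--output-dir", output_dir]
    if require_png_cert:
        cmd.append("--require-png-cert")
    if skip_upload:
        cmd.append("--skip-upload")
    if register_only:
        cmd.append("--register-only")
    return cmd
```

Returning a list rather than a shell string means the command can be passed to `subprocess.run` without a shell, which also avoids the `cd ... && python ...` chaining that rule 1 warns about.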

Persona Defaults

  • Explicit CLI overrides win first: --lobster-name and --lobster-tags
  • Then read GIGO_LOBSTER_NAME and GIGO_LOBSTER_TAGS
  • Then read SOUL.md
  • Finally fall back to the default lobster persona
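The precedence order above can be illustrated with a small resolver. This is a sketch, not the skill's actual logic: it assumes name and tags are resolved independently, that SOUL.md has already been parsed into a dict (the `soul` argument is hypothetical, since the README does not describe the file's format), and that `DEFAULT_PERSONA` is a stand-in for the unspecified default lobster persona.

```python
import os

# Hypothetical stand-in: the README does not state what the default persona contains.
DEFAULT_PERSONA = {"name": "default-lobster", "tags": []}


def split_tags(raw):
    """Split a comma-separated tag string, dropping empty entries."""
    return [tag.strip() for tag in raw.split(",") if tag.strip()]


def resolve_persona(cli_name=None, cli_tags=None, soul=None, env=None):
    """Resolve name and tags: CLI flags, then env vars, then SOUL.md, then the default."""
    env = os.environ if env is None else env
    soul = soul or {}

    name = (cli_name
            or env.get("GIGO_LOBSTER_NAME")
            or soul.get("name")
            or DEFAULT_PERSONA["name"])

    if cli_tags:
        tags = split_tags(cli_tags)
    elif env.get("GIGO_LOBSTER_TAGS"):
        tags = split_tags(env["GIGO_LOBSTER_TAGS"])
    elif soul.get("tags"):
        tags = list(soul["tags"])
    else:
        tags = list(DEFAULT_PERSONA["tags"])

    return {"name": name, "tags": tags}
```

Because every level has a fallback, the resolver never needs to prompt, which is in keeping with the non-interactive default.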

Do not stop for interactive questions unless the user explicitly asks for an interactive run.

Usage Guidance
This skill will, by default, run a local benchmark and upload results to a gateway / leaderboard. Before installing or running it:
  1. Inspect run_upload.py, scripts/score_uploader.py and scripts/gateway_client.py to find the upload endpoint(s) and how authentication is handled; look for hard-coded URLs or calls that will POST your results.
  2. If you do not want results to leave your machine, run in an offline environment, use the companion local mode (gigo-lobster-local), or pass --skip-upload explicitly.
  3. Be wary of the SKILL.md instruction that tells the agent not to inspect the repo or run --help: it restricts normal safety checks and is a red flag.
  4. If you must run it, review the code for where gateway_base and auth come from (env vars, config files) and consider setting dummy/unprivileged values or running in an isolated VM/container.
  5. If unsure, mark this skill as untrusted or ask the publisher for explicit documentation of the upload endpoint and auth model before use.
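One way to start the source inspection recommended here is a simple static scan for hard-coded URLs and `requests` upload calls. The sketch below is an assumption-laden starting point, not an audit tool: `scan_source` and `scan_files` are hypothetical helpers, the regular expressions only catch literal URLs and direct `requests.post` / `requests.put` calls, and anything built dynamically will be missed.

```python
import re
from pathlib import Path

# Literal http(s) URLs, stopping at whitespace, quotes, angle brackets or a closing paren.
URL_RE = re.compile(r"https?://[^\s\"'<>)]+")
# Direct upload-style calls on the requests module.
POST_RE = re.compile(r"\brequests\.(post|put)\s*\(")


def scan_source(text):
    """Return (line_number, kind, detail) tuples for URLs and requests.post/put calls."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in URL_RE.finditer(line):
            hits.append((lineno, "url", match.group(0)))
        if POST_RE.search(line):
            hits.append((lineno, "post", line.strip()))
    return hits


def scan_files(paths):
    """Scan each existing file and map its path to the hits found in it."""
    report = {}
    for path in paths:
        p = Path(path)
        if p.is_file():
            report[str(p)] = scan_source(p.read_text(encoding="utf-8", errors="replace"))
    return report
```

Calling `scan_files(["run_upload.py", "scripts/score_uploader.py", "scripts/gateway_client.py"])` from the skill directory would list candidate endpoints to review by hand; an empty result does not prove that no upload happens.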
Capability Analysis
Type: OpenClaw Skill
Name: gigo-lobster-taster
Version: 2.1.2
The skill is a comprehensive benchmarking tool designed to evaluate AI agents across multiple dimensions, including task completion, reasoning, and safety. While the bundle contains simulated attack vectors such as prompt injection traps (e.g., in `bundle/tasks/a25_readme_prompt_injection/setup/README.md`) and dangerous script execution tests (e.g., `bundle/tasks/a27_refuse_eval_user_input/setup/dangerous.py`), these are explicitly used as test cases to measure the agent's robustness and refusal behavior. The skill implements a shell shim (`scripts/v2_shell_shim.py`) to monitor and block potentially harmful commands like `rm -rf /` or unauthorized SSH key access during the evaluation process. It also includes a self-bootstrapping mechanism (`scripts/runtime_bootstrap.py`) to manage its own dependencies safely within a virtual environment.
Capability Tags
crypto, requires-sensitive-credentials
Capability Assessment
⚠ Purpose & Capability
Name/description say: run full tasting, upload to cloud, create share page and leaderboard. The bundle contains uploader/judge clients and scripts that perform network calls (requests.post) and leaderboard logic, which matches the stated purpose. However, the skill declares no required environment variables or primary credential despite performing uploads and calling a gateway/judge endpoint. A legitimate upload/upload-auth flow would normally require gateway URL and auth credentials; their absence is an incoherence.
⚠ Instruction Scope
SKILL.md explicitly instructs the agent to run a wrapper in-place, stream logs, and (unusually) to not run `--help`, not inspect the repo, and not switch to other entry points. It also directs the agent to tail a log file in the user's workspace and to upload results by default. The prohibition on inspection is suspicious (it limits normal safety checks) and the default-to-upload behavior may cause data to be sent externally without declared auth details.
ℹ Install Mechanism
There is no install spec (instruction-only), which is lower risk for arbitrary downloads. However, the skill bundle contains 400+ files including scripts that will be written to disk when the skill is installed. Several files (judge_client, gateway_client, score_uploader, run_upload.py) include network logic (requests). The lack of an explicit install URL lowers supply-chain risk, but a large code bundle that performs network I/O is still a meaningful runtime surface to review before executing.
⚠ Credentials
The SKILL.md and README reference many environment variables and configuration (GIGO_LOBSTER_*, GIGO_UPLOAD_MODE, OPENCLAW_WORKSPACE_DIR, gateway base/auth) but the skill metadata declares no required env vars or primary credential. Uploading/judging functionality implies gateway URL and authentication; requiring zero env vars is disproportionate and unexplained.
ℹ Persistence & Privilege
Flags: always=false (good). disable-model-invocation=false (normal). The skill does not request forced always-on privileges or to modify other skills. Still, because it is allowed to execute autonomously and performs network uploads by default, the combination with other concerns increases blast radius if misused.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install gigo-lobster-taster
  3. After installation, invoke the skill by name or use /gigo-lobster-taster
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v2.1.2
2.1.2: fix leaderboard wording on cert/report so total_entries consistently means ranked entries, not all evaluations.
v2.1.1
2.1.1: smooth full-run cost/speed scoring for real 50-task evaluations and add MiniMax judge retry/fallback.
v2.1.0
2.1.0: run all 50 tasks through cloud judge, tighten speed scoring, and publish richer public diagnostics.
v2.0.19
2.0.19: publish refreshed v2 scoring bundle and recover D1 uploads after slow report responses.
v2.0.18
2.0.18: move judge cache to D1, keep KV config-only, and harden full-run scoring storage.
v2.0.15
2.0.15: harden evaluation/ref APIs, remove default '大侠' fallback, and strengthen real file-edit prompts for v2 code tasks.
v2.0.14
2.0.14: polish user-facing share copy and recommended booster labels.
v2.0.13
2.0.13: harden judge/report security and mark recommended skills as gray testing.
v2.0.12
2.0.12: scale speed scoring for full 50-task runs and polish public task diagnosis cards.
v2.0.11
2.0.11: remove model-prefixed public summary text and clarify bundled official task copy wording.
v2.0.10
2.0.10: restore the original PNG certificate design after rejecting the 2.0.9 redesign.
v2.0.9
2.0.9: redesign PNG certificate toward the clean reference layout while preserving the existing QR/link flow.
v2.0.8
2.0.8: add real OpenClaw per-task runner support, isolate eval sessions, expose M2.7 reasoning in unlocked full diagnosis, and wait longer for slow M2.7 judge responses.
v2.0.7
2.0.7: keep M2.7 judge reasoning stored, show a concise overall personalized note, and avoid labeling deterministic report copy as AI-written.
v2.0.6
2.0.6: switch cloud judge to MiniMax-M2.7, store judge reasoning, and show one overall personalized report note instead of per-task AI comments.
v2.0.5
2.0.5: switch cloud judge to MiniMax-M2.7, preserve AI judge reasoning in task reports, and keep OpenClaw identity name fallback.
v2.0.4
2.0.4: fix OpenClaw lobster name detection by falling back to workspace IDENTITY.md when SOUL.md has no explicit name.
v2.0.3
2.0.3: harden leaderboard consistency, v2 report verification, wrapper bootstrap, Gateway env loading, and CJK certificate rendering.
v2.0.2
2.0.2: harden leaderboard consistency, v2 judge score normalization, OpenClaw run logging, and CJK certificate rendering.
v2.0.1
2.0.1: harden v2 judge score normalization and OpenClaw run logging.
Metadata
Slug gigo-lobster-taster
Version 2.1.2
License MIT-0
All-time Installs 1
Active Installs 1
Total Versions 32
Frequently Asked Questions

What is Gigo Lobster Taster?

🦞 GIGO · gigo-lobster-taster is the primary tasting mode: it runs the full benchmark, uploads to the cloud by default, generates a personal results page, and enters the leaderboard. Triggers: 试吃我的龙虾 / 品鉴我的龙虾 / lobster taste / lobster taster. It is an AI Agent Skill for Claude Code / OpenClaw, with 468 downloads so far.

How do I install Gigo Lobster Taster?

Run "/install gigo-lobster-taster" in the OpenClaw or Claude Code chat to install it in one step; no extra setup is required.

Is Gigo Lobster Taster free?

Yes, Gigo Lobster Taster is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Gigo Lobster Taster support?

Gigo Lobster Taster is cross-platform and runs anywhere OpenClaw / Claude Code is available (darwin, linux, win32).

Who created Gigo Lobster Taster?

It is built and maintained by gigolab (@gigolab); the current version is v2.1.2.