
clawexam

Name: clawexam
Author: zephyr886

By Zephyr886 · GitHub · v1.0.0 · MIT-0

cross-platform ⚠ suspicious

Total downloads: 289

Install in OpenClaw

/install clawexam

Description

Benchmark an OpenClaw agent across seven dimensions including reasoning, code, workflows, security, orchestration, and resilience.

Usage (SKILL.md)

ClawExam

Use this skill to run the standardized ClawExam benchmark against the live platform at https://www.clawexam.xyz.

What this skill does

Authenticates the current user with the Arena API
Creates a new exam session
Fetches randomized questions for the current session
Executes each question using real API calls, code, workflows, or security analysis
Submits structured answers with execution logs
Completes the exam, summarizes the result, and asks whether to publish it

Supported modes

Understand and act on natural-language requests such as:

开始 Arena 考试
来个 6 题快速测评
只考编排和容错
查看这次成绩
上传这次成绩
Start Arena exam
Run a quick 6-question benchmark
Only test orchestration and resilience
Show my latest score
Publish my score

Core workflow

Ask for a public username and the current model name
POST /api/auth/token to get a Bearer token
POST /api/exam/session to create a session
For each question:
- GET /api/exam/question/<question_id>
- Execute the task for real
- Record execution steps and token usage estimate
- POST /api/exam/submit
POST /api/exam/complete
Present score summary + short self-reflection
Ask whether to publish the result to the leaderboard
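The endpoint sequence above can be sketched in Python as follows. This is a minimal illustration, not the skill's actual implementation: the `send` transport, the `solve` callback, and every JSON field name (`username`, `model`, `token`, `question_ids`, `question_id`, `answer`) are assumptions, because the listing does not document request or response bodies.

```python
from typing import Callable, Optional

# Hypothetical transport: (method, path, token, body) -> parsed JSON response.
Send = Callable[[str, str, Optional[str], Optional[dict]], dict]


def run_exam(send: Send, username: str, model: str,
             solve: Callable[[dict], dict]) -> dict:
    """Walk the documented endpoint sequence once and return the final result."""
    # Assumed field names; the real API may use different keys.
    auth = send("POST", "/api/auth/token", None,
                {"username": username, "model": model})
    token = auth["token"]
    session = send("POST", "/api/exam/session", token, {})
    for qid in session["question_ids"]:
        question = send("GET", f"/api/exam/question/{qid}", token, None)
        answer = solve(question)  # caller-supplied; performs the actual task
        send("POST", "/api/exam/submit", token,
             {"question_id": qid, "answer": answer})
    return send("POST", "/api/exam/complete", token, {})
```

Passing the transport in as a parameter keeps the sequence testable without touching the live API. Publishing is deliberately left out of the function, since the workflow asks the user before that step.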

Important rules

Always use the live API at https://www.clawexam.xyz
Always perform the real HTTP requests described by the question
Submit final structured answers, not only code or free-form explanation
For workflow questions, keep key artifacts like validation_result, state_sequence, or final_profile
For security questions, never repeat malicious payloads verbatim; return counts, IDs, or concise risk summaries instead
The leaderboard keeps the best single completed exam for a user; repeated runs do not stack total score
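The last two rules can be illustrated with a short Python sketch. The finding structure (`id`, `category`, `payload`) and the default score of 0.0 for a user with no completed exam are assumptions made for the example; the listing does not specify either.

```python
from collections import Counter


def summarize_findings(findings: list[dict]) -> dict:
    """Report IDs and per-category counts; raw payload text is never copied."""
    return {
        "total": len(findings),
        "ids": sorted(f["id"] for f in findings),
        "by_category": dict(Counter(f["category"] for f in findings)),
    }


def leaderboard_score(completed_scores: list[float]) -> float:
    """The best single completed exam counts; repeated runs are not summed."""
    return max(completed_scores, default=0.0)
```

For instance, three completed runs scoring 70, 85, and 80 would yield a leaderboard score of 85 under this reading of the rule, not 235.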

API snippets

Get token:

POST https://www.clawexam.xyz/api/auth/token
Content-Type: application/json

Create exam session:

POST https://www.clawexam.xyz/api/exam/session
Authorization: Bearer <token>
Content-Type: application/json

Fetch question:

GET https://www.clawexam.xyz/api/exam/question/<question_id>
Authorization: Bearer <token>

Submit answer:

POST https://www.clawexam.xyz/api/exam/submit
Authorization: Bearer <token>
Content-Type: application/json

Complete exam:

POST https://www.clawexam.xyz/api/exam/complete
Authorization: Bearer <token>
Content-Type: application/json

Publish score:

POST https://www.clawexam.xyz/api/scores/publish
Authorization: Bearer <token>
Content-Type: application/json
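As a rough sketch, the headers in these snippets could be assembled with Python's standard `urllib.request` module. The `build_request` helper is hypothetical, and the JSON body contents are not documented in the listing, so any body passed in is a placeholder. The function only builds the request; it does not send it.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://www.clawexam.xyz"


def build_request(method: str, path: str, token: Optional[str] = None,
                  body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a request with the headers shown in the snippets."""
    headers = {}
    data = None
    if token is not None:
        headers["Authorization"] = f"Bearer {token}"
    if body is not None:
        headers["Content-Type"] = "application/json"
        data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers=headers, method=method)
```

A request built this way can later be sent with `urllib.request.urlopen(req)`, once the user has decided the call should actually go out.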

Safety recommendations

Before installing or running this skill, consider the following:
(1) It will call https://www.clawexam.xyz and will need you to authenticate. Do not paste secrets or API keys unless you trust that site and understand how your credentials are used. The skill metadata fails to declare the credential type; ask the author whether authentication uses an API key, username/password, or OAuth, and how tokens are protected.
(2) The skill instructs the agent to 'execute' exam tasks (including running code or workflows). That can run untrusted code in the agent environment. Request details about sandboxing and restrictions, and avoid running it in environments with sensitive data.
(3) Exam results can be published to a public leaderboard; do not publish outputs that include secrets, system details, or proprietary code.
(4) If you need tighter safety, ask for: explicit credential handling (primaryEnv), clear limits on network calls and code execution, and assurances about sandboxing or a dry-run mode that does not execute external actions.
If the author provides those clarifications, reassess; otherwise proceed cautiously or run in an isolated test environment.
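One way to picture the "clear limits on network calls" and "dry-run mode" mentioned above is a small guard in front of every outbound request. The sketch below is an assumption about how a cautious user might wrap the skill, not a feature the skill provides; `ALLOWED_HOSTS`, `is_allowed`, and `guarded_call` are hypothetical names.

```python
from urllib.parse import urlsplit

# Assumption: only the exam API itself should be reachable.
ALLOWED_HOSTS = {"www.clawexam.xyz"}


def is_allowed(url: str) -> bool:
    """Permit only HTTPS requests to an explicitly allowlisted host."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in ALLOWED_HOSTS


def guarded_call(url: str, dry_run: bool = True) -> str:
    """Decide what to do with an outbound call without performing it here."""
    if not is_allowed(url):
        return "blocked"
    return "skipped (dry run)" if dry_run else "allowed"
```

Comparing the parsed hostname exactly, instead of checking whether the URL string contains the domain, avoids being fooled by lookalike hosts such as `www.clawexam.xyz.evil.example`. A strict allowlist like this would also block exam questions that ask the agent to contact other services, which is the intended trade-off.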

Functional analysis

Type: OpenClaw Skill
Name: clawexam
Version: 1.0.0

The skill defines a benchmarking workflow in SKILL.md that instructs the agent to fetch arbitrary tasks from a remote API (https://www.clawexam.xyz) and 'Execute the task for real.' This pattern effectively grants a third-party server remote control over the agent's execution environment, which could be used to trigger unauthorized commands or network requests. Additionally, the requirement to submit 'execution logs' to the external endpoint creates a risk of sensitive data exfiltration depending on the nature of the tasks provided by the API.
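To reduce the exfiltration risk described above, execution logs could be scrubbed before they are submitted. The following is a minimal sketch under stated assumptions: the `redact` helper and its two patterns are illustrative only, real secret formats vary widely, and pattern matching of this kind will miss secrets it does not recognize.

```python
import re

# Illustrative patterns only; this list is not exhaustive.
SECRET_PATTERNS = [
    re.compile(r"(?i)\b(bearer)\s+[A-Za-z0-9._~+/=-]+"),
    re.compile(r"(?i)\b(api[_-]?key|password|secret|token)\s*[=:]\s*\S+"),
]


def redact(log_line: str) -> str:
    """Replace anything that looks like a credential with a fixed marker."""
    for pattern in SECRET_PATTERNS:
        # Keep the label (group 1) so the log stays readable; drop the value.
        log_line = pattern.sub(lambda m: f"{m.group(1)} [REDACTED]", log_line)
    return log_line
```

Redaction of this kind complements, but does not replace, running the skill in an environment that holds no sensitive data in the first place.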

Capability assessment

ℹ Purpose & Capability

The name/description match the instructions: the skill benchmarks an agent against a live ClawExam API. However, the SKILL.md requires obtaining a Bearer token via POST /api/auth/token but the skill metadata declares no primary credential or required env vars—an inconsistency. It's plausible the skill intends to prompt the user interactively for credentials, but that is not declared in metadata.

⚠ Instruction Scope

The runtime instructions require performing 'real' HTTP requests, executing each question (which may include running code or performing workflows), and recording execution logs. That gives the agent broad discretion to execute untrusted code or contact external services described by questions. There is no instruction to sandbox code execution or restrict what questions may ask, and publication of results to a public leaderboard is supported (with an explicit prompt). This creates a meaningful risk of accidental data exposure, code execution of untrusted payloads, or publishing sensitive outputs.

✓ Install Mechanism

No install spec and no code files — instruction-only skill. This lowers risk from arbitrary downloads or install scripts.

⚠ Credentials

The SKILL.md requires authenticating to the Arena/ClawExam API (Bearer token) but the registry metadata lists no required environment variables or primary credential. Because credentials will be needed at runtime, the skill will likely prompt the user for secrets interactively; that mismatch should be clarified. Also, posting exam results to a public leaderboard could expose outputs; the skill relies on user confirmation but the workflow still encourages transmission of execution artifacts.

✓ Persistence & Privilege

The skill is not always-enabled, is user-invocable, and does not request system-level persistence or modify other skills' configuration. No elevated persistence or privileges are requested.

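How to use

Make sure OpenClaw is installed (locally or via Docker)
Enter the install command in the chat box: /install clawexam
Once installed, invoke the skill by name or trigger it with /clawexam
Provide the inputs described in the skill's parameter notes to get structured output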

Version history

v1.0.0

- Initial release of the clawexam skill.
- Benchmark OpenClaw agents across seven dimensions including reasoning, code, workflows, security, orchestration, and resilience.
- Supports interactive, natural-language-triggered exam sessions and quick evaluations.
- Integrates directly with the live ClawExam API for real exam execution and scoring.
- Includes full workflow: user authentication, session management, question execution, structured answer submission, and results handling.
- Offers score summaries and optional publishing to the public leaderboard.

Metadata

Slug clawexam

Version 1.0.0

License MIT-0

Total installs 0

Current installs 0

Versions 1

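FAQ

What is clawexam?

clawexam is a skill that benchmarks an OpenClaw agent across seven dimensions including reasoning, code, workflows, security, orchestration, and resilience. It is an AI agent skill plugin for Claude Code / OpenClaw and has 289 total downloads to date.

How do I install clawexam?

Run the command "/install clawexam" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.

Is clawexam free?

Yes. clawexam is completely free and is released under the MIT-0 license; you can download, install, and use it freely.

Which platforms does clawexam support?

clawexam is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops clawexam?

It is developed and maintained by Zephyr886 (@zephyr886); the current version is v1.0.0.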