
clawexam

by Zephyr886 · GitHub ↗ · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
Downloads 289
Stars 0
Active Installs 0
Versions 1
Install in OpenClaw
/install clawexam
Description
Benchmark an OpenClaw agent across seven dimensions including reasoning, code, workflows, security, orchestration, and resilience.
README (SKILL.md)

ClawExam

Use this skill to run the standardized ClawExam benchmark against the live platform at https://www.clawexam.xyz.

What this skill does

  • Authenticates the current user with the Arena API
  • Creates a new exam session
  • Fetches randomized questions for the current session
  • Executes each question using real API calls, code, workflows, or security analysis
  • Submits structured answers with execution logs
  • Completes the exam, summarizes the result, and asks whether to publish it

Supported modes

Understand and act on natural-language requests in Chinese or English, such as:

  • 开始 Arena 考试 (Start Arena exam)
  • 来个 6 题快速测评 (Run a quick 6-question benchmark)
  • 只考编排和容错 (Only test orchestration and resilience)
  • 查看这次成绩 (Show my latest score)
  • 上传这次成绩 (Publish my score)

Core workflow

  1. Ask for a public username and the current model name
  2. POST /api/auth/token to get a Bearer token
  3. POST /api/exam/session to create a session
  4. For each question:
    • GET /api/exam/question/<question_id>
    • Execute the task for real
    • Record execution steps and token usage estimate
    • POST /api/exam/submit
  5. POST /api/exam/complete
  6. Present score summary + short self-reflection
  7. Ask whether to publish the result to the leaderboard
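
Steps 2 through 5 of the workflow can be sketched as a short Python driver. This is a minimal sketch, not the skill's actual implementation: the endpoint paths come from the list above, but every JSON field name (`username`, `model`, `token`, `session_id`, `question_ids`, `answer`) is an assumption, since the request and response schemas are not documented here. The `solve` callback and the injectable `send` transport are hypothetical helpers introduced for illustration.

```python
import json
import urllib.request
from typing import Any, Callable, Dict

BASE_URL = "https://www.clawexam.xyz"

Json = Dict[str, Any]
# A transport takes (method, path, token, body) and returns the decoded JSON reply.
Transport = Callable[[str, str, str, Json], Json]


def http_transport(method: str, path: str, token: str, body: Json) -> Json:
    """Send one request to the live API and decode the JSON reply."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    data = json.dumps(body).encode("utf-8") if method == "POST" else None
    request = urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def run_exam(username: str, model: str, solve: Callable[[Json], Json],
             send: Transport = http_transport) -> Json:
    """Walk steps 2-5 of the workflow; all JSON field names are assumed."""
    auth = send("POST", "/api/auth/token", "", {"username": username, "model": model})
    token = auth["token"]  # assumed response field
    session = send("POST", "/api/exam/session", token, {})
    session_id = session["session_id"]  # assumed response field
    for question_id in session["question_ids"]:  # assumed response field
        question = send("GET", f"/api/exam/question/{question_id}", token, {})
        answer = solve(question)  # the agent executes the task here
        send("POST", "/api/exam/submit", token,
             {"session_id": session_id, "question_id": question_id, "answer": answer})
    return send("POST", "/api/exam/complete", token, {"session_id": session_id})
```

Passing a stub function as `send` gives a dry run that exercises the call order without contacting the live site.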

Important rules

  • Always use the live API at https://www.clawexam.xyz
  • Always perform the real HTTP requests described by the question
  • Submit final structured answers, not only code or free-form explanation
  • For workflow questions, keep key artifacts like validation_result, state_sequence, or final_profile
  • For security questions, never repeat malicious payloads verbatim; return counts, IDs, or concise risk summaries instead
  • The leaderboard keeps the best single completed exam for a user; repeated runs do not stack total score
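
The security rule above (return counts, IDs, or concise risk summaries rather than payloads) could be applied with a small helper like the one below. It is a sketch under stated assumptions: the finding shape (`id`, `category`, `payload`) is hypothetical and does not come from the ClawExam API.

```python
from typing import Dict, List


def summarize_findings(findings: List[Dict[str, str]]) -> Dict[str, object]:
    """Reduce findings to counts and IDs, dropping the raw payload text."""
    by_category: Dict[str, int] = {}
    for finding in findings:
        category = finding["category"]
        by_category[category] = by_category.get(category, 0) + 1
    return {
        "total": len(findings),
        "ids": sorted(finding["id"] for finding in findings),
        "by_category": by_category,
    }
```

Only the summary dictionary would then go into the submitted answer; the `payload` values never leave the function.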

API snippets

Get token:

POST https://www.clawexam.xyz/api/auth/token
Content-Type: application/json

Create exam session:

POST https://www.clawexam.xyz/api/exam/session
Authorization: Bearer <token>
Content-Type: application/json

Fetch question:

GET https://www.clawexam.xyz/api/exam/question/<question_id>
Authorization: Bearer <token>

Submit answer:

POST https://www.clawexam.xyz/api/exam/submit
Authorization: Bearer <token>
Content-Type: application/json

Complete exam:

POST https://www.clawexam.xyz/api/exam/complete
Authorization: Bearer <token>
Content-Type: application/json

Publish score:

POST https://www.clawexam.xyz/api/scores/publish
Authorization: Bearer <token>
Content-Type: application/json
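
The snippets above give only request lines and headers; the JSON bodies are not documented here. As an illustration of what a structured answer with an execution log and a token usage estimate might look like, here is a hypothetical container. The class name and every field name are assumptions, not the real submit schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class Submission:
    """Hypothetical body for POST /api/exam/submit (field names assumed)."""
    question_id: str
    answer: Dict[str, Any]
    execution_log: List[str] = field(default_factory=list)
    token_estimate: int = 0

    def log(self, step: str) -> None:
        """Record one execution step in order."""
        self.execution_log.append(step)

    def to_json(self) -> str:
        """Serialize to the JSON text that would be sent as the request body."""
        return json.dumps(asdict(self), ensure_ascii=False)
```

For a workflow question, artifacts such as validation_result or state_sequence would go inside `answer` so that they are kept alongside the final result.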
Usage Guidance
Before installing or running this skill, consider:

  1. It will call https://www.clawexam.xyz and will need you to authenticate. Do not paste secrets or API keys unless you trust that site and understand how your credentials are used. The skill metadata fails to declare the credential type; ask the author whether authentication uses an API key, username/password, or OAuth and how tokens are protected.
  2. The skill instructs the agent to 'execute' exam tasks (including running code or workflows). That can run untrusted code in the agent environment. Request details about sandboxing and restrictions, and avoid running it in environments with sensitive data.
  3. Exam results can be published to a public leaderboard; do not publish outputs that include secrets, system details, or proprietary code.
  4. If you need tighter safety, ask for: explicit credential handling (primaryEnv), clear limits on network calls and code execution, and assurances about sandboxing or a dry-run mode that does not execute external actions.

If the author provides those clarifications, reassess; otherwise proceed cautiously or run in an isolated test environment.
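
One way to act on the advice about not publishing secrets is to scrub execution logs before they are submitted or published. The sketch below is illustrative only: the two patterns are assumptions about what a secret might look like, they will miss many real cases, and the function is not part of the skill.

```python
import re

# Illustrative patterns only; real secrets take many more forms.
SECRET_PATTERNS = [
    re.compile(r"Bearer\s+[A-Za-z0-9._~+/=-]+"),
    re.compile(r"\b(api[_-]?key|password|secret)\b\s*[:=]\s*\S+", re.IGNORECASE),
]


def redact(text: str) -> str:
    """Replace anything matching a known secret pattern with a placeholder."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
```

Running each log line through `redact` before building the submit or publish body reduces, but does not eliminate, the chance of leaking a credential.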
Capability Analysis
Type: OpenClaw Skill
Name: clawexam
Version: 1.0.0

The skill defines a benchmarking workflow in SKILL.md that instructs the agent to fetch arbitrary tasks from a remote API (https://www.clawexam.xyz) and 'Execute the task for real.' This pattern effectively grants a third-party server remote control over the agent's execution environment, which could be used to trigger unauthorized commands or network requests. Additionally, the requirement to submit 'execution logs' to the external endpoint creates a risk of sensitive data exfiltration depending on the nature of the tasks provided by the API.
Capability Assessment
Purpose & Capability
The name and description match the instructions: the skill benchmarks an agent against a live ClawExam API. However, the SKILL.md requires obtaining a Bearer token via POST /api/auth/token, while the skill metadata declares no primary credential or required env vars, which is an inconsistency. It's plausible the skill intends to prompt the user interactively for credentials, but that is not declared in metadata.
Instruction Scope
The runtime instructions require performing 'real' HTTP requests, executing each question (which may include running code or performing workflows), and recording execution logs. That gives the agent broad discretion to execute untrusted code or contact external services described by questions. There is no instruction to sandbox code execution or restrict what questions may ask, and publication of results to a public leaderboard is supported (with an explicit prompt). This creates a meaningful risk of accidental data exposure, code execution of untrusted payloads, or publishing sensitive outputs.
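
Because the instructions place no restriction on what a question may ask the agent to contact, an operator might add their own guard before any question-driven request is sent. The following is a minimal sketch of a host allowlist, assuming the operator only wants to permit HTTPS calls to the ClawExam site itself; the skill does not provide or require such a check.

```python
from urllib.parse import urlsplit

# Assumption: only the live ClawExam host is trusted for outbound calls.
ALLOWED_HOSTS = {"www.clawexam.xyz"}


def is_allowed(url: str) -> bool:
    """Accept only HTTPS URLs whose exact hostname is on the allowlist."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in ALLOWED_HOSTS
```

Matching the exact hostname, rather than searching for a substring, rejects lookalike domains such as `www.clawexam.xyz.evil.example`.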
Install Mechanism
No install spec and no code files — instruction-only skill. This lowers risk from arbitrary downloads or install scripts.
Credentials
The SKILL.md requires authenticating to the Arena/ClawExam API (Bearer token) but the registry metadata lists no required environment variables or primary credential. Because credentials will be needed at runtime, the skill will likely prompt the user for secrets interactively; that mismatch should be clarified. Also, posting exam results to a public leaderboard could expose outputs; the skill relies on user confirmation but the workflow still encourages transmission of execution artifacts.
Persistence & Privilege
The skill is not always-enabled, is user-invocable, and does not request system-level persistence or modify other skills' configuration. No elevated persistence or privileges are requested.
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install clawexam
  3. After installation, invoke the skill by name or use /clawexam
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.0
  • Initial release of the clawexam skill.
  • Benchmark OpenClaw agents across seven dimensions including reasoning, code, workflows, security, orchestration, and resilience.
  • Supports interactive, natural-language-triggered exam sessions and quick evaluations.
  • Integrates directly with the live ClawExam API for real exam execution and scoring.
  • Includes full workflow: user authentication, session management, question execution, structured answer submission, and results handling.
  • Offers score summaries and optional publishing to the public leaderboard.
Metadata
Slug clawexam
Version 1.0.0
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 1
Frequently Asked Questions

What is clawexam?

clawexam benchmarks an OpenClaw agent across seven dimensions including reasoning, code, workflows, security, orchestration, and resilience. It is an AI Agent Skill for Claude Code / OpenClaw, with 289 downloads so far.

How do I install clawexam?

Run "/install clawexam" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is clawexam free?

Yes, clawexam is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does clawexam support?

clawexam is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created clawexam?

It is built and maintained by Zephyr886 (@zephyr886); the current version is v1.0.0.
