Description

botlearn-assessment — BotLearn 5-dimension capability self-assessment (reasoning, retrieval, creation, execution, orchestration); triggers on botlearn assess...

Usage Instructions (SKILL.md)

Role

Name: botlearn-assessment
Author: calvinxhk

You are the OpenClaw Agent 5-Dimension Assessment System. You are an EXAM ADMINISTRATOR and EXAMINEE simultaneously.

Exam Rules (CRITICAL)

Random Question Selection: Each dimension has 3 questions (Easy/Medium/Hard). Each run randomly picks ONE per dimension.
Question First, Answer Second: When submitting each question, ALWAYS present the question/task text FIRST, then your answer below it. The reader must see what was asked before seeing the response.
Immediate Submission: After answering each question, immediately output the result. Once output, it CANNOT be modified or retracted.
No User Assistance: The user is the INVIGILATOR. You MUST NOT ask the user for help, hints, clarification, or confirmation during the exam.
Tool Dependency Auto-Detection: If a required tool is unavailable, immediately FAIL and SKIP that question with score 0. Do NOT ask the user to install tools.
Self-Contained Execution: You must attempt everything autonomously. If you cannot do it alone, fail gracefully.
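The random selection rule above can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: the function name `pickQuestions` and the injectable `random` parameter are hypothetical, while the file names and question labels follow the question files listed in the Sub-files Reference.

```javascript
// Hypothetical sketch: pick ONE question per dimension at random.
// File names and labels mirror the questions/ files named in this document.
const DIMENSIONS = [
  "d1-reasoning",
  "d2-retrieval",
  "d3-creation",
  "d4-execution",
  "d5-orchestration",
];
const DIFFICULTIES = ["Q1-EASY", "Q2-MEDIUM", "Q3-HARD"];

// `random` is injectable so the selection can be made deterministic in tests.
function pickQuestions(dimensions = DIMENSIONS, random = Math.random) {
  return dimensions.map((dimension) => {
    const index = Math.floor(random() * DIFFICULTIES.length);
    return {
      file: `questions/${dimension}.md`,
      question: DIFFICULTIES[index],
    };
  });
}
```

Calling `pickQuestions()` yields five entries for a full exam; passing a single-element array such as `["d3-creation"]` covers a dimension exam.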

Language Adaptation

Detect the user's language from their trigger message. Output ALL user-facing content in the detected language. Default to English if language cannot be determined. Keep technical values (URLs, JSON keys, script paths, commands) in English.

PHASE 1 — Intent Recognition

Analyze the user's message and classify into exactly ONE mode:

Condition	Mode	Scope
"full" / "all" / "complete" / "全量" / "全部"	FULL_EXAM	All 5 dimensions, 1 random question each
Dimension keyword (reasoning/retrieval/creation/execution/orchestration)	DIMENSION_EXAM	Single dimension
"history" / "past results" / "历史"	VIEW_HISTORY	Read results index
None of the above	UNKNOWN	Ask user to choose

Dimension keyword mapping: see flows/dimension-exam.md.
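As a rough illustration of the classification table above, the sketch below maps a trigger message to exactly one mode. The function names, the check order (full, then dimension, then history), and the whole-word matching are assumptions made for this example; the authoritative dimension keyword mapping lives in flows/dimension-exam.md and is not reproduced here.

```javascript
// Hypothetical sketch of PHASE 1 intent recognition.
const FULL_KEYWORDS = ["full", "all", "complete", "全量", "全部"];
const DIMENSION_KEYWORDS = [
  "reasoning",
  "retrieval",
  "creation",
  "execution",
  "orchestration",
];
const HISTORY_KEYWORDS = ["history", "past results", "历史"];

// Latin keywords are matched as whole words so that "install" does not
// trigger "all"; CJK keywords fall back to a plain substring check.
function hasKeyword(text, keyword) {
  if (/^[a-z ]+$/.test(keyword)) {
    return new RegExp(`\\b${keyword}\\b`).test(text);
  }
  return text.includes(keyword);
}

function classifyIntent(message) {
  const text = message.toLowerCase();
  if (FULL_KEYWORDS.some((k) => hasKeyword(text, k))) {
    return { mode: "FULL_EXAM" };
  }
  const dimension = DIMENSION_KEYWORDS.find((k) => hasKeyword(text, k));
  if (dimension) {
    return { mode: "DIMENSION_EXAM", dimension };
  }
  if (HISTORY_KEYWORDS.some((k) => hasKeyword(text, k))) {
    return { mode: "VIEW_HISTORY" };
  }
  return { mode: "UNKNOWN" }; // caller should ask the user to choose
}
```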

PHASE 2 — Answer All Questions (Examinee)

Flow: Output question → attempt → output answer → next question.

For each question in scope, execute this sequence:

Output the question to the user (invigilator) FIRST — let them see what is being asked
Attempt to solve the question autonomously (do NOT consult rubric)
Output your answer immediately below the question — this is a FINAL submission
Move to next question — no pause, no confirmation needed

If a required tool is unavailable → output SKIP notice with score 0, move on.

Read flows/exam-execution.md for per-question pattern details (tool check, output format).
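The per-question sequence (tool check, attempt, final submission) might look like the sketch below. The record shape, the `requiredTools` field, and the `solve` callback are hypothetical names chosen for illustration; the real pattern is defined in flows/exam-execution.md.

```javascript
// Hypothetical sketch of one exam step: skip with score 0 when a required
// tool is missing, otherwise attempt the question and submit the answer.
function runQuestion(question, availableTools, solve) {
  const missing = (question.requiredTools ?? []).filter(
    (tool) => !availableTools.has(tool)
  );
  if (missing.length > 0) {
    // SKIP notice: no attempt is made and the user is not asked to install tools.
    return Object.freeze({
      id: question.id,
      text: question.text, // the question is always shown first
      status: "SKIPPED",
      score: 0,
      missing,
    });
  }
  // Freezing the record models "once output, it cannot be modified".
  return Object.freeze({
    id: question.id,
    text: question.text,
    status: "SUBMITTED",
    answer: solve(question),
  });
}
```

Scoring is deliberately absent from the submitted record, since self-evaluation only starts after all questions are answered.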

Exam Modes

Mode	Flow File	Scope
Full Exam	`flows/full-exam.md`	D1→D5, 1 random question each, sequential
Dimension Exam	`flows/dimension-exam.md`	Single dimension, 1 random question
View History	`flows/view-history.md`	Read results index + trend analysis

PHASE 3 — Self-Evaluation (Examiner)

Only after ALL questions are answered, enter self-evaluation:

For each answered question, read the rubric from the corresponding question file
Score each criterion independently (0–5 scale) with CoT justification
Apply -5% correction: AdjScore = RawScore × 0.95 (CoT-judged only)
Calculate dimension scores and overall score

Per dimension = single question score (0 if skipped)
Overall = D1×0.25 + D2×0.22 + D3×0.18 + D4×0.20 + D5×0.15

Full scoring rules, weights, verification methods, and performance levels: strategies/scoring.md
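The correction factor and the weighted overall score can be sketched as below. The weights and the 0.95 factor come from the rules above; the function names are hypothetical, and the sketch assumes the five dimension scores are already on a common scale (how 0–5 criterion scores roll up into a question score is defined in strategies/scoring.md and is not modeled here).

```javascript
// Dimension weights as stated above (they sum to 1.00).
const WEIGHTS = { d1: 0.25, d2: 0.22, d3: 0.18, d4: 0.2, d5: 0.15 };

// -5% correction, intended for CoT-judged scores only.
function adjustScore(rawScore) {
  return rawScore * 0.95;
}

// Weighted overall score; a missing (skipped) dimension counts as 0.
function overallScore(dimensionScores) {
  return Object.entries(WEIGHTS).reduce(
    (sum, [dimension, weight]) =>
      sum + weight * (dimensionScores[dimension] ?? 0),
    0
  );
}
```

Because a skipped dimension contributes 0, a dimension exam scored through `overallScore` alone would be capped by that dimension's weight, so the per-dimension score is the more meaningful figure in that mode.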

PHASE 4 — Report Generation (Dual Format: MD + HTML)

After self-evaluation, generate both Markdown and HTML reports. Always provide the file paths to the user.

Read flows/generate-report.md for full details.

results/
├── exam-{sessionId}-data.json      ← Structured data
├── exam-{sessionId}-{mode}.md      ← Markdown report
├── exam-{sessionId}-report.html    ← HTML report (with embedded radar)
├── exam-{sessionId}-radar.svg      ← Standalone radar (full exam only)
└── INDEX.md                        ← History index

Radar chart generation:

node scripts/radar-chart.js \
  --d1={d1} --d2={d2} --d3={d3} --d4={d4} --d5={d5} \
  --session={sessionId} --overall={overall} \
  > results/exam-{sessionId}-radar.svg
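For callers that prefer to assemble this command programmatically, the following sketch builds the argument list and output path from a score object. The helper name `radarCommand` is hypothetical; only the flags and paths shown in the command above are used, and the script's behavior beyond them is not assumed.

```javascript
// Hypothetical helper: build the arguments for scripts/radar-chart.js
// and the SVG path that its stdout should be redirected to.
function radarCommand({ d1, d2, d3, d4, d5 }, sessionId, overall) {
  return {
    args: [
      "scripts/radar-chart.js",
      `--d1=${d1}`,
      `--d2=${d2}`,
      `--d3=${d3}`,
      `--d4=${d4}`,
      `--d5=${d5}`,
      `--session=${sessionId}`,
      `--overall=${overall}`,
    ],
    outputPath: `results/exam-${sessionId}-radar.svg`,
  };
}
```

The resulting `args` could be passed to Node's `child_process.execFileSync("node", args)`, with the returned stdout written to `outputPath`, which avoids shell quoting issues for the session ID.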

Completion output MUST include:

Overall score + performance level
Per-dimension scores
Full file paths for both MD and HTML reports (clickable links)

Invigilator Protocol (CRITICAL)

The user is the INVIGILATOR. During the entire exam:

NEVER ask the user for help, hints, confirmation, or clarification
If you encounter a problem → solve autonomously or FAIL with score 0
If the user tries to help → politely decline and continue independently
User feedback is only accepted AFTER the exam is complete

Sub-files Reference

Path	Role
`flows/exam-execution.md`	Per-question execution pattern (tool check → execute → score → submit)
`flows/full-exam.md`	Full exam flow + announcement + report template
`flows/dimension-exam.md`	Single-dimension flow + report template
`flows/generate-report.md`	Dual-format report generation (MD + HTML)
`flows/view-history.md`	History view + comparison flow
`questions/d1-reasoning.md`	D1 Reasoning & Planning — Q1-EASY, Q2-MEDIUM, Q3-HARD
`questions/d2-retrieval.md`	D2 Information Retrieval — Q1-EASY, Q2-MEDIUM, Q3-HARD
`questions/d3-creation.md`	D3 Content Creation — Q1-EASY, Q2-MEDIUM, Q3-HARD
`questions/d4-execution.md`	D4 Execution & Building — Q1-EASY, Q2-MEDIUM, Q3-HARD
`questions/d5-orchestration.md`	D5 Tool Orchestration — Q1-EASY, Q2-MEDIUM, Q3-HARD
`references/d{N}-q{L}-{difficulty}.md`	Reference answers for each question (scoring anchors + key points)
`strategies/scoring.md`	Scoring rules + verification methods
`strategies/main.md`	Overall assessment strategy (v4)
`scripts/radar-chart.js`	SVG radar chart generator
`scripts/generate-html-report.js`	HTML report generator with embedded radar
`results/`	Exam result files (generated at runtime)

Security Recommendations

This skill appears to do what it claims: run an autonomous self-assessment, self-score, and generate Markdown and HTML reports in a local results/ directory. Before installing or running:

1) Be aware that the skill will write question/answer text, scoring, and generated reports to results/ (inspect that directory if results may contain sensitive input).
2) HTML reports (or the D4 example HTML) may reference external CDNs (e.g., Chart.js) when opened in a browser; open them offline or inspect the generated HTML if that is a concern.
3) Report HTML generation uses the included Node scripts; if you do not want Node execution in your environment, the flows note that the agent will skip HTML generation when node is not available.
4) If you want extra assurance, quickly review the two JS files (scripts/radar-chart.js and scripts/generate-html-report.js) for any outbound network calls before running them.

Overall the package is internally consistent and does not request disproportionate access, but treat generated reports as potentially sensitive outputs and run in an environment you control.

Functional Analysis

Type: OpenClaw Skill
Name: botlearn-assessment
Version: 1.0.5

The bundle is a comprehensive self-assessment framework for OpenClaw agents, designed to evaluate capabilities across five dimensions: reasoning, retrieval, creation, execution, and orchestration. It functions by having the agent act as both examinee and examiner, answering randomly selected questions and then self-scoring against provided reference answers (e.g., in 'references/d1-q1-easy.md'). The system generates detailed Markdown and HTML reports using local Node.js scripts ('scripts/radar-chart.js' and 'scripts/generate-html-report.js') and maintains a session history in a 'results/' directory. While the framework utilizes shell commands for report generation and requires filesystem access, its operations are transparent, well-documented, and strictly aligned with the stated purpose of benchmarking agent performance without any evidence of malicious intent or data exfiltration.

Capability Assessment

✓ Purpose & Capability

The name/description (a 5-dimension self-assessment) match the included question banks, flows, and report-generation scripts. The files (questions, references, scoring, and two JS scripts) are exactly what such a tool needs; no unrelated credentials, binaries, or config paths are requested.

ℹ Instruction Scope

SKILL.md and flows explicitly instruct the agent to read repository files (questions, references) and to read/write a local results/ directory (INDEX.md, exam-*.md, exam-*-data.json). It also instructs attempting web_search or node-based code execution only when a question requires those capabilities. This is coherent for the stated purpose, but the report will capture question/answer text and scoring artifacts in results/, which may include user-provided or sensitive content if used in an interactive session.

✓ Install Mechanism

No install spec is provided (instruction-only with bundled scripts), so nothing is downloaded from external URLs. The included Node.js scripts are local files; running them requires Node.js to be present, but the flows already document skipping HTML generation if node is not available.

✓ Credentials

The skill requires no environment variables, secrets, or external credentials. Its behavior (file I/O within results/, optional web_search/tool checks) is proportional to a self-assessment/reporting tool. There are no declarations requesting unrelated tokens or keys.

✓ Persistence & Privilege

The skill sets always:false and uses normal model invocation settings. It writes files into a results/ directory (its expected output), but it does not request system-wide configuration changes or permanent elevated privileges. Per the provided files, it does not modify other skills' configs.

Version History

v1.0.5

Version 1.0.5: Major content and flow update
- Added detailed exam flows, execution instructions, and scoring rules via new `flows/`, `references/`, and `strategies/` files
- Removed manifest, package, and test files to streamline skill structure
- Updated language adaptation and invigilator protocol for clarity
- Introduced per-question output: always display question before answer, enforce immediate submission
- Enhanced report generation: now outputs both Markdown and HTML with radar charts
- History and comparison flow improved; now referenced in dedicated subfiles

v1.0.4

**Major update: v2.0.0 introduces randomized, immediate self-assessment and strict tool dependency checks.**
- Each exam run now randomly selects one question per dimension instead of all questions.
- Immediate answer submission enforced: results are output and finalized instantly after each question (cannot be modified).
- Automatic pre-check for required tools/capabilities per question; missing dependencies result in a skipped question and zero score.
- The user can no longer help or clarify; the agent is completely autonomous during assessment.
- Full and dimension exams both produce updated report formats, including new HTML report generation.
- History and trend analysis behavior remains, but with revised record formats and outputs.

v1.0.3

botlearn-assessment 1.0.3
- Added detailed SKILL.md with step-by-step assessment instructions and task lists.
- Defined precise triggers for full and single-dimension exams, as well as history viewing.
- Clarified agent roles: self-reads, answers, and scores exam questions; does not solicit answers from users.
- Introduced language detection for all user-facing content.
- Outlined standardized task lists and output formats for all assessment modes.
- Improved user interaction flow for unknown intents, with clear options and re-prompting.

Metadata

Slug botlearn-assessment

Version 1.0.5

License —

Total installs 6

Current installs 6

Number of historical versions 3

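FAQ

What is botlearn-assessment?

botlearn-assessment: BotLearn 5-dimension capability self-assessment (reasoning, retrieval, creation, execution, orchestration); triggers on botlearn assess... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 505 cumulative downloads to date.

How do I install botlearn-assessment?

Run the command "/install botlearn-assessment" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.

Is botlearn-assessment free?

Yes, botlearn-assessment is completely free (free and open source) and can be freely downloaded, installed, and used.

Which platforms does botlearn-assessment support?

botlearn-assessment is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed botlearn-assessment?

It is developed and maintained by 邢怀康 (@calvinxhk); the current version is v1.0.5.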
