Description

Automatically improve OpenClaw skills, prompts, or articles through iterative mutation-testing loops. Inspired by Karpathy's autoresearch. Use when user says...

Usage Instructions (SKILL.md)

autoresearch-pro

Name: Autoresearch
Author: 0xcjl

Overview

Automatically improve any OpenClaw skill, prompt, or article through iterative mutation-testing: small edits → run test cases → score with checklist → keep improvements, discard regressions.

Inspired by Karpathy/autoresearch.

Supports three optimization modes:

Mode	Input	Output
Skill	Path to a skill directory	Improved SKILL.md
Prompt	A prompt text string	Improved prompt
Article	An article/document text	Improved article

Workflow

Step 1 — Identify Mode and Input

Ask the user to confirm:

Mode 1 — Skill: User says "optimize [skill-name]" or provides a skill path
Mode 2 — Prompt: User says "optimize this prompt" or pastes a prompt
Mode 3 — Article: User says "improve this article" or pastes article text

For Skill mode, resolve the skill path to ~/.openclaw/skills/<skill-name>/SKILL.md. For Prompt/Article mode, keep the text in context (do not write to disk unless needed).
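A minimal Python sketch of the Skill-mode path resolution described above. The helper name `resolve_skill_path`, the optional `root` parameter, and the name validation are assumptions made for illustration; they are not taken from the skill or from scripts/run_eval.py.

```python
from pathlib import Path
from typing import Optional


def resolve_skill_path(skill_name: str, root: Optional[Path] = None) -> Path:
    """Return the SKILL.md path for a skill name (hypothetical helper)."""
    # Assumption: reject names that could point outside the skills directory.
    if (
        not skill_name
        or skill_name in {".", ".."}
        or "/" in skill_name
        or "\\" in skill_name
    ):
        raise ValueError(f"invalid skill name: {skill_name!r}")
    # Default root follows the path given in the workflow text.
    base = root if root is not None else Path.home() / ".openclaw" / "skills"
    return base / skill_name / "SKILL.md"
```

Passing `root` makes it possible to point the helper at a copy of the skills directory instead of the live one.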

Step 2 — Generate Checklist (10 Questions)

Read the target content first. Then generate 10 diverse, specific yes/no checklist questions relevant to the content type:

For Skill mode:

#	Dimension	What to Check
1	Description clarity	Is the frontmatter description precise and actionable?
2	Trigger coverage	Does it cover the main real-world use cases?
3	Workflow structure	Are steps clearly sequenced and unambiguous?
4	Error guidance	Does it handle error states and edge cases?
5	Tool usage accuracy	Are tool names and parameters correct for OpenClaw?
6	Example quality	Do examples reflect real usage patterns?
7	Conciseness	Is content free of redundant repetition?
8	Freedom calibration	Is instruction specificity appropriate?
9	Reference quality	Are references and links accurate?
10	Completeness	Are all sections filled with real content?

For Prompt mode (10 tailored questions):

#	Dimension	What to Check
1	Goal clarity	Does the prompt state a clear, specific goal?
2	Role/tone	Is the desired role or tone specified?
3	Input format	Is the input format clearly described?
4	Output format	Is the expected output format specified?
5	Constraints	Are key constraints and boundaries stated?
6	Context sufficiency	Is enough context provided to avoid hallucination?
7	Edge cases	Does it handle ambiguous or edge case inputs?
8	Conciseness	Is it free of redundant or contradictory instructions?
9	Actionability	Are instructions concrete and actionable vs. vague?
10	Completeness	Are all necessary elements for the task present?

For Article mode (10 tailored questions):

#	Dimension	What to Check
1	Title quality	Does the title clearly convey the main value?
2	Opening hook	Does the opening grab attention and set expectations?
3	Logical structure	Are ideas logically organized (not random)?
4	Argument clarity	Are claims supported with evidence or reasoning?
5	Conciseness	Is unnecessary padding or repetition removed?
6	Transition flow	Do paragraphs/sections flow smoothly?
7	Closing strength	Does the conclusion summarize and inspire action?
8	Tone consistency	Is the tone consistent throughout?
9	Readability	Is sentence/paragraph length varied appropriately?
10	Audience match	Does language match the target audience level?

Present the 10 questions, numbered 1-10. Ask the user to select which ones to activate (e.g., "use questions 1, 3, 5, 7"). Default: use all 10 if user doesn't specify.
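One way the selection reply could be turned into a list of active questions is sketched below. The function name `parse_selection` is hypothetical, and the sketch only reads individual numbers; a reply written as a range such as "1-4" would need extra handling.

```python
import re


def parse_selection(reply: str, total: int = 10) -> list[int]:
    """Extract question numbers from a reply such as 'use questions 1, 3, 5, 7'."""
    numbers = {int(n) for n in re.findall(r"\d+", reply)}
    # Ignore numbers outside the 1..total range of the presented checklist.
    picked = sorted(n for n in numbers if 1 <= n <= total)
    # Default from the workflow: all questions are active if none are named.
    return picked or list(range(1, total + 1))
```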

Step 3 — Prepare Test Cases

Skill mode: Generate 3-5 realistic prompts a user would send when using the skill
Prompt mode: Generate 3-5 test inputs that the prompt would process
Article mode: Generate 3-5 ways the article might be read or consumed

Store test cases in context — do not write to disk.

Step 4 — Run Autoresearch Loop

Loop configuration:

Rounds per batch: 30
Max total rounds: 100
Pause: After every 30 rounds, show summary and ask user to continue or stop
Stop conditions: User says stop, OR 100 rounds completed
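With 30 rounds per batch and a cap of 100 rounds, the final batch is shorter than the others. The sketch below computes the batch boundaries under that reading of the configuration; `batch_ranges` is an illustrative name, not part of the skill.

```python
def batch_ranges(max_rounds: int = 100, batch_size: int = 30) -> list[tuple[int, int]]:
    """Return inclusive (first_round, last_round) pairs for each batch."""
    return [
        (start, min(start + batch_size - 1, max_rounds))
        for start in range(1, max_rounds + 1, batch_size)
    ]


# With the defaults this yields four batches; the last covers rounds 91-100.
print(batch_ranges())
```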

Per-round procedure:

Mutate: Make ONE small edit to the target content:
- Skill mode: edit SKILL.md
- Prompt mode: edit the prompt string
- Article mode: edit the article text
Test: For each test case, simulate what output the content would produce.
Score: Apply each active checklist question (0 or 1 per question). Score = (passed / total) × 100.
Decide: If new score ≥ best score → keep the mutation. If lower → revert.
Log: Round number, mutation type, score, keep/revert decision.
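The Score and Decide steps can be sketched as follows. In the skill itself the checklist is judged by the agent against simulated outputs; here each checklist question is replaced by a plain Python predicate as a stand-in, and the names `score` and `run_round` are hypothetical.

```python
from typing import Callable, Sequence

Check = Callable[[str], bool]  # stand-in for one yes/no checklist question


def score(content: str, checks: Sequence[Check]) -> float:
    """Score = (passed / total) x 100, one point per passing question."""
    if not checks:
        raise ValueError("at least one checklist question must be active")
    passed = sum(1 for check in checks if check(content))
    return passed / len(checks) * 100


def run_round(
    best: str,
    best_score: float,
    mutate: Callable[[str], str],
    checks: Sequence[Check],
) -> tuple[str, float, bool]:
    """Apply one mutation; keep it if the score does not drop, else revert."""
    candidate = mutate(best)
    new_score = score(candidate, checks)
    if new_score >= best_score:
        return candidate, new_score, True
    return best, best_score, False
```

Because the comparison is "greater than or equal", a mutation that leaves the score unchanged is kept, which matches the Decide rule above.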

Mutation types (pick one per round):

Type	Description
A	Add a constraint rule
B	Strengthen trigger/coverage
C	Add a concrete example
D	Tighten vague language
E	Improve error/edge case handling
F	Remove redundant content
G	Improve transitions
H	Expand a thin section
I	Add cross-reference
J	Adjust degree-of-freedom

Step 5 — Report Results

After each batch (30 rounds):

Batch N (rounds X-Y):
  Best score: XX%
  Mutations kept: N  |  Reverted: N
  Most effective types: [list top 2-3]
Accumulated improvements: [summary]
Continue? (yes/stop)
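A batch summary in this layout could be assembled from the per-round log roughly as shown below. The `RoundLog` record and `batch_summary` function are illustrative, and ranking "most effective" types by how often they were kept is an assumption; the skill does not define that ranking.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class RoundLog:
    round_no: int
    mutation_type: str  # one of "A".."J"
    score: float
    kept: bool


def batch_summary(batch_no: int, logs: list[RoundLog]) -> str:
    """Format the per-batch report from the logged rounds of one batch."""
    kept = [entry for entry in logs if entry.kept]
    # Assumption: a type is "effective" if its mutations were kept most often.
    top = [t for t, _ in Counter(e.mutation_type for e in kept).most_common(3)]
    best = max(entry.score for entry in logs)
    return "\n".join([
        f"Batch {batch_no} (rounds {logs[0].round_no}-{logs[-1].round_no}):",
        f"  Best score: {best:.0f}%",
        f"  Mutations kept: {len(kept)}  |  Reverted: {len(logs) - len(kept)}",
        f"  Most effective types: {', '.join(top)}",
    ])
```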

After full completion:

Original score vs. final score
Top 3 most impactful mutations
Final improved content (inline or diff)
File path (skill mode only)

Mutation Strategy Reference

High-impact, low-risk changes:

Adding explicit constraints where the content is vague
Expanding coverage to cover edge cases
Adding concrete examples to abstract instructions
Tightening soft language ("try to" → "must")

Avoid in one round:

Large rewrites of entire sections
Multiple unrelated changes at once
Changing fundamental scope or purpose

See references/mutation_strategies.md for the full strategy guide.

Mode Selection Quick Reference

User says	Mode
"optimize [skill]" / "autoresearch [skill]"	Skill
"optimize this prompt" / "improve my prompt"	Prompt
"polish this article" / "improve this article"	Article
"optimize this document"	Article

Default to Prompt mode if the input is a text string without a skill path.
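The quick reference and its default can be approximated with simple keyword matching, as in this sketch. The function `select_mode` and its keyword rules are assumptions; an agent would normally infer the mode from the conversation and confirm it with the user.

```python
from typing import Optional


def select_mode(message: str, skill_path: Optional[str] = None) -> str:
    """Map a user request to "Skill", "Prompt", or "Article"."""
    text = message.lower()
    if skill_path:
        # A resolved skill path means the target is a SKILL.md file.
        return "Skill"
    if "article" in text or "document" in text:
        return "Article"
    # Default from the quick reference: plain text without a skill path.
    return "Prompt"
```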

Security Recommendations

This skill appears internally consistent with its purpose, but it will read and (for Skill mode) modify SKILL.md files under ~/.openclaw/skills and create a .snapshots directory there. Before installing or running it: (1) inspect the included scripts/run_eval.py yourself (it is the only code file) to confirm you are comfortable with its behavior; (2) run the tool on a copy of a skill (not production) to observe changes and scoring behavior; (3) ensure the agent asks for and obtains your explicit confirmation before it writes to any skill path or proceeds beyond the first batch; (4) if you do not want any disk changes, avoid giving it Skill mode access or restrict the agent's filesystem permissions. No network endpoints or credentials are requested by the skill, which reduces exfiltration risk, but filesystem modification is intrinsic to its function — treat snapshots as your first line of rollback and verify them before accepting changes.
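To make the rollback advice concrete, here is a small sketch of a snapshot-then-restore routine. It is not the behavior of scripts/run_eval.py, whose contents are not shown here; the function names and the snapshot file naming scheme are hypothetical.

```python
import shutil
from pathlib import Path


def snapshot(skill_md: Path, round_no: int) -> Path:
    """Copy SKILL.md into a sibling .snapshots directory before editing it."""
    snap_dir = skill_md.parent / ".snapshots"
    snap_dir.mkdir(exist_ok=True)
    # Hypothetical naming scheme: one file per round, zero-padded.
    target = snap_dir / f"SKILL.round{round_no:03d}.md"
    shutil.copy2(skill_md, target)
    return target


def restore(snapshot_path: Path, skill_md: Path) -> None:
    """Overwrite SKILL.md with a previously saved snapshot."""
    shutil.copy2(snapshot_path, skill_md)
```

Running this against a copy of a skill directory first is a low-risk way to check that a snapshot really restores the original file.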

Functional Analysis

Type: OpenClaw Skill
Name: autoresearch-pro
Version: 1.0.0

The skill is designed to programmatically modify other OpenClaw skills by editing their 'SKILL.md' files in the '~/.openclaw/skills/' directory. It uses a Python helper script ('scripts/run_eval.py') to read, write, and manage snapshots of these files during an iterative 'mutation-testing' loop. While the stated intent is optimization (inspired by Karpathy's autoresearch), the capability to overwrite the instructions of other skills constitutes a high-risk behavior that could be used for persistent prompt injection or lateral modification of agent behavior across the system.

Capability Assessment

✓ Purpose & Capability

The skill claims to iteratively improve SKILL.md, prompts, or articles. The provided SKILL.md and the helper script (scripts/run_eval.py) consistently implement that: they read/write SKILL.md, create .snapshots, generate checklists, and score mutations. There are no unrelated environment variables, binaries, or network endpoints required, so requested capabilities are proportional to the described purpose.

ℹ Instruction Scope

The runtime instructions explicitly require reading and (for Skill mode) writing the target SKILL.md at ~/.openclaw/skills/<skill-name>/SKILL.md and creating snapshots in a .snapshots directory. That is coherent for a tool that edits skills, but it does mean the skill will modify user files. Prompt/Article modes state they should avoid writing to disk unless needed; the included script supports file I/O only for Skill mode. User confirmation is described in the workflow (pause after batches), but actual enforcement depends on the agent implementation — the user should expect and approve filesystem changes before running.

✓ Install Mechanism

There is no install spec and no external downloads. This is an instruction-only skill with an optional helper Python script included. No archives or network fetches are performed by the provided files.

✓ Credentials

The skill declares no required environment variables, no credentials, and no config paths other than the target skill directory under the user's home (~/.openclaw/skills). There are no indications of requests for unrelated secrets or cloud credentials.

ℹ Persistence & Privilege

always:false (normal). The helper will create snapshots in ~/.openclaw/skills/<skill>/.snapshots and will write changes to SKILL.md when operating in Skill mode. This is expected behavior for a mutation-based editor, but it does constitute persistent modification of user skill files, so the user should ensure they want this and review snapshots. The skill does not modify other skills' config beyond writing to the target skill directory.

Version History

v1.0.0

Initial release: skill/prompt/article optimization, inspired by Karpathy autoresearch

Metadata

Slug autoresearch-pro

Version 1.0.0

License MIT-0

Total installs 1

Current installs 1

Released versions 1

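FAQ

What is Autoresearch?

Automatically improve OpenClaw skills, prompts, or articles through iterative mutation-testing loops. Inspired by Karpathy's autoresearch. Use when user says... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 133 cumulative downloads to date.

How do I install Autoresearch?

Run the command "/install autoresearch-pro" in the OpenClaw or Claude Code chat box to install it in one step; no additional configuration is required.

Is Autoresearch free?

Yes. Autoresearch is completely free and released under the MIT-0 license, so it can be downloaded, installed, and used freely.

Which platforms does Autoresearch support?

Autoresearch is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who developed Autoresearch?

It is developed and maintained by Jialin (@0xcjl); the current version is v1.0.0.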
