Description

Train autonomous OpenClaw AI agents through LLM-guided curriculum design and multi-turn dialogue evaluation. Use this skill whenever the user wants to train,...

README (SKILL.md)

ClawSergeant: Boosting OpenClaw Agents from AI Feedback

Name: ClawSergeant
Author: myismyname

ClawSergeant trains OpenClaw agents through a structured, LLM-driven pipeline. A Trainer LLM designs curriculum, generates training tasks, and adapts its teaching dynamically based on the agent's responses. A separate Evaluator LLM objectively scores each response, creating a feedback loop that drives iterative improvement.

Architecture Overview

User Intent ──────────────────────→ LLM (Curriculum Designer)
                                          ↓
                                   Curriculum JSON (stages, tasks, criteria)
                                          ↓
Training Session Loop:
    Trainer LLM → crafts message → openclaw CLI → Claw Agent → reply
                                                      ↓
                                          Evaluator LLM → score + feedback
                                                      ↓
                              record to .claw_sergeant_accumulated_lessons/ ←──┘
                                          ↓
                                  (if failed) → Trainer LLM retries with feedback
                                          ↓
                                  (if stage passed) → stage summary for memory consolidation
                                          ↓
                    [Curriculum Pattern] → record to .claw_sergeant_accumulated_lessons/

Training Pipeline

Phase 1: Curriculum Design

The user's training intent is passed directly as input. The LLM generates a multi-stage curriculum as structured JSON based on this intent. The user reviews and approves the curriculum before training begins.

Each curriculum contains:

Title and overview of the training program
Target persona describing the ideal agent after training
3–5 stages, each with:
- Name, description, and learning objectives
- 2–4 training tasks with scenario descriptions and expected behaviors
- Evaluation criteria with passing standards

Phase 2: Training Execution

For each stage and task, the system runs a dialogue loop:

Trainer LLM generates a task message tailored to the agent (it never sees hardcoded prompts — everything is dynamically composed)
Message is sent to the Claw Agent via openclaw agent CLI
Agent's reply is captured and fed back to the Trainer's conversation context
Evaluator LLM scores the reply (1–10) and reports strengths, weaknesses, and improvement suggestions
If the task is not passed and retries remain, the Trainer generates a follow-up message incorporating the evaluation feedback
After a stage passes, the agent receives a summary prompt to internalize lessons learned

Environment Setup

Create a .env file in the project root with:

LLM_API_KEY=\x3Cyour-api-key>          # Required: API key for the LLM
LLM_BASE_URL=https://api.openai.com/v1  # Optional: OpenAI-compatible endpoint
LLM_MODEL=gpt-4o                    # Optional: model identifier
CLAW_RECIPIENT=+15555550123         # Required: target agent's address

Running the Training

Full Training Session

python main.py "An efficient, rigorous programming assistant"

The training intent is passed as a command-line argument. ClawSergeant designs a curriculum, presents it for approval, and runs the training session automatically. Results are saved to training_results.json.

Phase-by-Phase Testing

Use test_phases.py to verify each component independently before running a full session:

python test_phases.py 1    # Verify LLM API connectivity
python test_phases.py 2    # Test curriculum generation
python test_phases.py 3    # Test Claw agent communication
python test_phases.py 4    # Run a single-task training round
python test_phases.py all  # Run all phases sequentially

Always start with phase 1 to confirm the LLM connection works, then progress through subsequent phases.

Configuration

All training parameters are centralized in config.py:

Parameter	Default	Purpose
`STAGE_COUNT_MIN` / `MAX`	3 / 5	Number of training stages
`TASKS_PER_STAGE_MIN` / `MAX`	2 / 4	Tasks per stage
`CURRICULUM_TEMPERATURE`	0.4	LLM temperature for curriculum design
`TRAINER_TEMPERATURE`	0.7	LLM temperature for training messages
`EVALUATOR_TEMPERATURE`	0.2	LLM temperature for evaluation (low = strict)
`MAX_ATTEMPTS_PER_TASK`	2	Retries per task before moving on
`STAGE_PASS_THRESHOLD`	0.6	Fraction of tasks needed to pass a stage

Adjust STAGE_PASS_THRESHOLD higher (e.g., 0.8) for stricter training, or lower temperatures for more deterministic evaluations.

Key Components

File	Role
`main.py`	Entry point — orchestrates curriculum design → approval → training execution
`trainer.py`	Training session controller — manages dialogue loop and captures per-task/stage learnings
`curriculum.py`	Curriculum data model and LLM-based generation
`claw_agent.py`	Wraps `openclaw agent` CLI for agent communication
`llm_handler.py`	Async LLM client with conversation history management
`learning_logger.py`	Structured experience logger — records training insights and writes to OpenClaw MEMORY.md
`config.py`	Centralized training parameters
`test_phases.py`	Step-by-step pipeline verification

Training Results

After a session completes, training_results.json contains:

{
  "curriculum": {
    "title": "...",
    "overview": "...",
    "target_persona": "...",
    "stages_total": 4,
    "stages_passed": 3
  },
  "stage_reports": [
    {
      "stage_id": 1,
      "stage_name": "...",
      "passed": true,
      "overall_feedback": "...",
      "tasks": [
        {
          "task_id": "1.1",
          "passed": true,
          "score": 8,
          "strengths": ["..."],
          "weaknesses": ["..."],
          "feedback": "..."
        }
      ]
    }
  ]
}

Experience Recording

Training experiences are automatically recorded throughout the session. Every task evaluation, stage result, and infrastructure error is logged to .claw_sergeant_accumulated_lessons/ as structured markdown entries for future reference.

After the session completes, a summary is written to ~/.openclaw/workspace/MEMORY.md containing the training timestamp, curriculum details, stage pass/fail results, and a pointer to the full logs. This allows the Claw agent to reference its training history in future sessions. If the OpenClaw workspace is not found, this step is silently skipped.

Troubleshooting

LLM connection fails: Run python test_phases.py 1 to verify API key and endpoint. Check LLM_BASE_URL points to a valid OpenAI-compatible API.
Claw agent timeout: The default timeout is 120 seconds. If the agent is slow to respond, check network connectivity and the openclaw CLI installation.
Curriculum has no stages: The LLM may have returned malformed JSON. Try lowering CURRICULUM_TEMPERATURE or switching to a more capable model.
All tasks fail: Review evaluation criteria — they may be too strict. Lower STAGE_PASS_THRESHOLD or increase MAX_ATTEMPTS_PER_TASK in config.py.

Dependencies

Python 3.11+
httpx — async HTTP client for LLM API calls
loguru — structured logging
python-dotenv — environment variable management
openclaw CLI — must be installed and accessible in PATH

Usage Guidance

Key points to check before installing or running: - Do not trust the registry metadata alone: this package actually requires an LLM API key (LLM_API_KEY) and a target address (CLAW_RECIPIENT) and expects the openclaw CLI to be installed and usable. The registry incorrectly lists no env vars/binaries. - A referenced module (learning_logger.py) is imported and used but is not present in the provided file list. Running the skill as-is will likely fail; request the missing file or a corrected package from the author. - The skill will persist conversation history and training outputs locally (training_results.json and .claw_sergeant_accumulated_lessons/) and attempts to write to the OpenClaw workspace MEMORY.md. Review those outputs for sensitive data and consider running initial tests in an isolated environment. - Use a least-privilege LLM API key (scoped, rate-limited) and a non-production agent recipient when testing. Inspect trainer/evaluator prompts (they instruct the trainee to 'internalize' lessons) to ensure they won't cause undesired persistent changes to the target agent. - If you cannot verify the missing module or correct the metadata, treat this skill as untrusted and avoid running it with production credentials or against critical agents/workspaces.

Capability Analysis

Type: OpenClaw Skill Name: clawsergeant Version: 1.0.0 ClawSergeant is a legitimate training orchestration framework designed to improve OpenClaw agents through LLM-guided curricula and feedback loops. It uses a 'Trainer' LLM to dynamically generate training tasks and an 'Evaluator' LLM to score responses, communicating with the target agent via the `openclaw agent` CLI (found in `claw_agent.py`). The tool records training progress and 'lessons learned' to the agent's workspace memory (`~/.openclaw/workspace/MEMORY.md`), which is consistent with its stated purpose of autonomous agent improvement. No evidence of data exfiltration, malicious subprocess execution, or harmful prompt injection was found; the code is well-structured and follows standard patterns for the OpenClaw ecosystem.

Capability Assessment

⚠ Purpose & Capability

The skill's stated purpose is training OpenClaw agents, which legitimately requires an LLM API key and a target agent address and uses the openclaw CLI. However, the registry metadata claims no required environment variables or binaries, which is inconsistent with the SKILL.md and code (main.py and test_phases.py require LLM_API_KEY and CLAW_RECIPIENT and the code invokes the 'openclaw' CLI). The mismatch between declared requirements and actual code is a red flag.

ℹ Instruction Scope

SKILL.md and the code instruct the agent to: generate curricula, call an external LLM endpoint, send messages to an OpenClaw agent via the openclaw CLI, evaluate replies, and persist lessons to a local lessons directory and to the OpenClaw workspace MEMORY.md. These behaviors align with the training purpose, but they involve persistent storage of potentially sensitive conversation content and instruct the trainee agent to 'internalize' lessons (i.e., change future behavior). Also, the code imports LearningLogger (learning_logger.py) and references writing to OpenClaw MEMORY.md but that file is not present in the provided manifest — the runtime will fail or behave unexpectedly unless that module exists.

✓ Install Mechanism

No install spec is provided and dependencies are standard Python libraries (httpx, loguru, python-dotenv) listed in requirements.txt. There are no downloads from arbitrary URLs or archive extraction. Installation risk is low provided dependencies are installed from PyPI.

⚠ Credentials

The skill actually requires LLM_API_KEY (and optional LLM_BASE_URL/LLM_MODEL) and CLAW_RECIPIENT, which are proportionate to its LLM calls and agent-targeting. However, the registry metadata lists no required env vars, so what's declared does not match what the code and SKILL.md require. The skill asks users to put secrets into a .env file; ensure you use a scoped API key and understand that training data and agent replies will be stored locally and possibly written into the OpenClaw workspace.

ℹ Persistence & Privilege

The skill writes training results (training_results.json) and accumulates lessons under .claw_sergeant_accumulated_lessons/ and attempts to write a summary to the OpenClaw workspace MEMORY.md. always:false (no forced global presence). These file writes are expected for a trainer, but they create persistent artifacts containing conversations and evaluations — consider privacy of those artifacts. The skill also autonomously calls an external LLM and can send messages to another agent (the trainee), which increases blast radius if misused; this is not by itself disqualifying but worth noting.

Version History

v1.0.0

ClawSergeant skill v1.0.0 – Initial release for LLM-guided OpenClaw agent training and evaluation. - Enables structured OpenClaw agent training via curriculum designed by an LLM. - Implements a multi-stage training pipeline with automated feedback and iterative improvement loops. - Includes tools for curriculum approval, training results logging, and experience recording. - Provides detailed configuration options and per-phase testing to verify setup. - Logs training insights and outcomes for agent memory and future development.

Metadata

Slug clawsergeant

Version 1.0.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is ClawSergeant?

Train autonomous OpenClaw AI agents through LLM-guided curriculum design and multi-turn dialogue evaluation. Use this skill whenever the user wants to train,... It is an AI Agent Skill for Claude Code / OpenClaw, with 204 downloads so far.

How do I install ClawSergeant?

Run "/install clawsergeant" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is ClawSergeant free?

Yes, ClawSergeant is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does ClawSergeant support?

ClawSergeant is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).

Who created ClawSergeant?

It is built and maintained by M. Y. (@myismyname); the current version is v1.0.0.

More Skills

ClawSergeant