is-xins-xiaobai

long-run-harness

Author: 小白 · GitHub · v1.0.0 · MIT-0
cross-platform ⚠ suspicious
57 total downloads · 0 favorites · 0 current installs · 1 version
Install in OpenClaw
/install long-run-harness
Description
Use when building a Planner→Generator→Evaluator multi-agent harness with the Claude SDK. Triggers: "build a harness", "multi-agent pipeline", "agent loop", "...
Usage (SKILL.md)

Long-Running App Harness — SDK Implementation

Produces a runnable harness that orchestrates Claude agents via claude_agent_sdk. You are writing the harness, not running inside it.

Use query() + ClaudeAgentOptions for agentic loops; use tool() + create_sdk_mcp_server() for structured output. Never call anthropic.Anthropic() directly.

pip install claude-agent-sdk

Output structure:

harness/
  harness.py; config.yaml; config.py; log.py
  agents/ planner.py; generator.py; evaluator.py
  models/ state.py
  prompts/ planner.md; generator.md; evaluator.md

Routing

User signal → Route
"build a harness / pipeline" → Start at Phase 1
"add an evaluator" → Jump to Phase 4
"add state / handoff" → Jump to Phase 5
"looping forever / broken" → Check feedback-loop termination in Phase 5
"just explain what a harness does" → Explain the concept; don't write code

Phase 1: Design the Harness

Load: $SKILL_DIR/instructions/planner-questions.md

⚠️ HARD GATE: Ask the design questions. Get answers to 1–3 before writing any code:

  1. What does the harness build? (sets Generator tools + Evaluator rubric)
  2. Python or TypeScript? (default: Python)
  3. Models per agent? (default: all claude-opus-4-7; non-defaults → config.yaml)

Create skeleton:

mkdir -p harness/agents harness/models harness/prompts harness/harness-logs
touch harness/harness.py harness/log.py harness/agents/__init__.py harness/models/__init__.py

config.yaml + config.py: all tunable parameters live here; never hardcode them in agent files. Load: $SKILL_DIR/instructions/config.md for the full HarnessConfig dataclass.

from pathlib import Path
from config import HarnessConfig

cfg = HarnessConfig.load(Path(__file__).parent / "config.yaml")
# Always reference cfg.agents.generator_model, never the literal "claude-opus-4-7"
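The full HarnessConfig lives in config.md, which is not reproduced here. As a rough illustration only, a config.py along these lines keeps every tunable in one typed place. The section names agents, loop and verdict follow the cfg.agents.*, cfg.loop.* and cfg.verdict.* references in this document; every individual field except generator_model, max_iterations and min_iterations, and all default values other than the claude-opus-4-7 model default, are hypothetical. Loading YAML assumes PyYAML is installed.

```python
from __future__ import annotations

from dataclasses import dataclass, field, fields
from pathlib import Path
from typing import Any


@dataclass
class AgentsConfig:
    # Default model for every agent, per Phase 1; override in config.yaml.
    planner_model: str = "claude-opus-4-7"     # hypothetical field name
    generator_model: str = "claude-opus-4-7"
    evaluator_model: str = "claude-opus-4-7"   # hypothetical field name


@dataclass
class LoopConfig:
    max_iterations: int = 5   # hypothetical default
    min_iterations: int = 1   # hypothetical default

    def __post_init__(self) -> None:
        # Reject configs where the loop could never reach min_iterations.
        if not 1 <= self.min_iterations <= self.max_iterations:
            raise ValueError("need 1 <= min_iterations <= max_iterations")


@dataclass
class VerdictConfig:
    # Hypothetical thresholds used to recompute the verdict deterministically.
    min_rubric_score: int = 3
    min_rubric_mean: float = 3.5


@dataclass
class HarnessConfig:
    agents: AgentsConfig = field(default_factory=AgentsConfig)
    loop: LoopConfig = field(default_factory=LoopConfig)
    verdict: VerdictConfig = field(default_factory=VerdictConfig)

    @classmethod
    def from_dict(cls, raw: dict[str, Any]) -> HarnessConfig:
        section_types = {"agents": AgentsConfig, "loop": LoopConfig, "verdict": VerdictConfig}
        unknown_sections = set(raw) - set(section_types)
        if unknown_sections:
            raise ValueError(f"unknown config sections: {sorted(unknown_sections)}")
        kwargs = {}
        for name, section_cls in section_types.items():
            data = raw.get(name) or {}
            allowed = {f.name for f in fields(section_cls)}
            unknown = set(data) - allowed
            if unknown:  # typos in config.yaml fail loudly instead of being ignored
                raise ValueError(f"unknown keys in '{name}': {sorted(unknown)}")
            kwargs[name] = section_cls(**data)
        return cls(**kwargs)

    @classmethod
    def load(cls, path: Path) -> HarnessConfig:
        import yaml  # PyYAML, assumed to be installed alongside the harness

        return cls.from_dict(yaml.safe_load(Path(path).read_text(encoding="utf-8")) or {})
```

Rejecting unknown keys is a deliberate choice in this sketch: a misspelled max_iterations in config.yaml would otherwise silently fall back to the default.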

models/state.py — write first; all other files import from it. Load: $SKILL_DIR/instructions/context-handoff.md (HandoffState, EvalResult, format_handoff_for_prompt). Load: $SKILL_DIR/instructions/sprint-contracts.md (SprintContract + negotiation protocol).
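The real HandoffState, EvalResult and format_handoff_for_prompt are defined in context-handoff.md. The sketch below is only an assumption about their shape: EvalResult mirrors the submit_grade fields named in Phase 4 (contract_results, rubric_scores, feedback) plus the recomputed verdict, while the HandoffState fields sprint, iteration and notes are hypothetical. The JSON round trip is what a handoff.save() before each Evaluator call would persist.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class EvalResult:
    verdict: str  # "PASS" / "FAIL", recomputed by the harness, never taken from the LLM
    contract_results: list[dict] = field(default_factory=list)
    rubric_scores: dict[str, int] = field(default_factory=dict)
    feedback: str = ""


@dataclass
class HandoffState:
    sprint: int
    iteration: int = 0
    completed_features: list[str] = field(default_factory=list)  # promoted only after an Evaluator PASS
    last_eval: EvalResult | None = None
    notes: str = ""

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)  # asdict() also converts the nested EvalResult

    @classmethod
    def from_json(cls, text: str) -> HandoffState:
        raw = json.loads(text)
        last = raw.pop("last_eval", None)
        return cls(**raw, last_eval=EvalResult(**last) if last else None)

    def save(self, path: Path) -> None:
        Path(path).write_text(self.to_json(), encoding="utf-8")

    @classmethod
    def load(cls, path: Path) -> HandoffState:
        return cls.from_json(Path(path).read_text(encoding="utf-8"))


def format_handoff_for_prompt(state: HandoffState) -> str:
    """Render the handoff as compact text for the next agent's prompt."""
    lines = [f"## Handoff: sprint {state.sprint}, iteration {state.iteration}"]
    if state.completed_features:
        lines.append("Completed features: " + ", ".join(state.completed_features))
    if state.last_eval is not None:
        lines.append(f"Last verdict: {state.last_eval.verdict}")
        if state.last_eval.feedback:
            lines.append("Evaluator feedback:\n" + state.last_eval.feedback)
    if state.notes:
        lines.append("Notes: " + state.notes)
    return "\n".join(lines)
```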

log.py — dual stdout + timestamped file under harness-logs/. Load: $SKILL_DIR/instructions/logging.md for full implementation.

log.setup(PROJECT_DIR, label="run")  # once in main()
logger = log.get()                   # in every agent

Phase 2: Planner Agent

Load: $SKILL_DIR/instructions/planner-questions.md for system prompt template. Load: $SKILL_DIR/instructions/agent-patterns.md for full run_planner implementation.

run_planner(brief, session_id, cfg) → (reply, new_session_id). ClaudeAgentOptions(resume=session_id) continues the session without resending the history.

spec, session_id = "", None
while "SPEC_COMPLETE" not in spec:
    if session_id:
        print(f"[Planner asks]:\n{spec}")  # show the Planner's latest question
        user_input = input("[You]: ").strip()
    else:
        user_input = initial_brief  # first turn: the user's brief
    spec, session_id = run_planner(user_input, session_id, cfg)
SPEC_PATH.write_text(spec.replace("SPEC_COMPLETE", "").strip())
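agent-patterns.md contains the real run_planner. The sketch below assumes only the claude_agent_sdk pieces this document names (query(), ClaudeAgentOptions with resume) plus the SDK's AssistantMessage, TextBlock and ResultMessage types, and reads the system prompt from prompts/planner.md. cfg.agents.planner_model is a hypothetical field. The interview() helper is the Planner loop with the planner and the question prompt injected, plus a max_rounds guard that is an addition of this sketch, so it cannot spin forever if SPEC_COMPLETE never appears.

```python
from __future__ import annotations

import asyncio
from pathlib import Path
from typing import Callable, Optional


def run_planner(brief: str, session_id: Optional[str], cfg) -> tuple[str, Optional[str]]:
    """One Planner turn; resumes the SDK session when session_id is set."""
    # Imported lazily so interview() below can be exercised without the SDK installed.
    from claude_agent_sdk import AssistantMessage, ClaudeAgentOptions, ResultMessage, TextBlock, query

    prompt_path = Path(__file__).resolve().parent.parent / "prompts" / "planner.md"
    options = ClaudeAgentOptions(
        model=cfg.agents.planner_model,  # hypothetical config field
        system_prompt=prompt_path.read_text(encoding="utf-8"),
        resume=session_id,  # None starts a fresh session
    )

    async def _turn() -> tuple[str, Optional[str]]:
        parts: list[str] = []
        new_session_id = session_id
        async for message in query(prompt=brief, options=options):
            if isinstance(message, AssistantMessage):
                parts.extend(b.text for b in message.content if isinstance(b, TextBlock))
            elif isinstance(message, ResultMessage):
                new_session_id = message.session_id
        return "\n".join(parts).strip(), new_session_id

    return asyncio.run(_turn())


def interview(
    initial_brief: str,
    planner: Callable[[str, Optional[str]], tuple[str, Optional[str]]],
    ask: Callable[[str], str],
    max_rounds: int = 20,
) -> str:
    """Drive the Planner until it emits SPEC_COMPLETE; return the cleaned spec."""
    spec, session_id = "", None
    for _ in range(max_rounds):
        user_input = ask(spec) if session_id else initial_brief
        spec, session_id = planner(user_input, session_id)
        if "SPEC_COMPLETE" in spec:
            return spec.replace("SPEC_COMPLETE", "").strip()
    raise RuntimeError("Planner did not emit SPEC_COMPLETE within max_rounds")
```

In the harness this would be wired as interview(brief, lambda text, sid: run_planner(text, sid, cfg), lambda q: input(f"{q}\n> ").strip()).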

Phase 3: Generator Agent

Load: $SKILL_DIR/instructions/agent-patterns.md for run_generator + self_assess implementations.

def run_generator(
    spec, contract, project_dir,
    handoff=None, strategic_framing=None, cfg=None,
) -> str: ...

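from claude_agent_sdk import ClaudeAgentOptions

options = ClaudeAgentOptions(
    model=cfg.agents.generator_model,
    allowed_tools=["Write", "Read", "Edit", "Bash", "Glob"],
    cwd=str(project_dir),
    permission_mode="bypassPermissions",  # see the safety notes below; prefer approval-based modes where possible
)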

After generation, call self_assess(), which uses the submit_assessment MCP tool to catch gaps before the Evaluator runs. If the Generator is not confident, run an extra pass with its concerns passed in as strategic_framing.
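The control flow around self_assess() can be sketched independently of the SDK. Here generate stands in for run_generator (taking the optional strategic_framing) and assess for self_assess(); the Assessment type, its confident/concerns fields and the one-extra-pass cap are assumptions of this sketch, not the skill's actual definitions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Assessment:
    confident: bool
    concerns: list[str] = field(default_factory=list)


def generate_with_self_check(
    generate: Callable[[Optional[str]], str],
    assess: Callable[[], Assessment],
    max_extra_passes: int = 1,  # hypothetical cap on extra Generator passes
) -> tuple[str, Assessment]:
    """Run the Generator, self-assess, and retry with concerns as framing if not confident."""
    output = generate(None)
    assessment = assess()
    passes = 0
    while not assessment.confident and passes < max_extra_passes:
        framing = "Address these self-assessment concerns:\n" + "\n".join(
            f"- {c}" for c in assessment.concerns
        )
        output = generate(framing)
        assessment = assess()
        passes += 1
    return output, assessment  # the Evaluator still runs afterwards, whatever the outcome
```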


Phase 4: Evaluator Agent

Load: $SKILL_DIR/instructions/agent-patterns.md for full implementation. Load: $SKILL_DIR/instructions/evaluation-rubrics.md for system prompt + rubric criteria.

Two roles: run_evaluator() (post-generation gate) + review_contract() (pre-sprint criteria review).

# submit_grade schema: contract_results[{id, status, evidence}], rubric_scores{id: 1–5}, feedback
def run_evaluator(spec, contract, app_url, rubric_track="A", cfg=None) -> EvalResult: ...

⚠️ Deterministic verdict: never trust the verdict field returned by the LLM. Recompute it in _build_eval_result() from contract_results + rubric_scores using the cfg.verdict.* thresholds.


Phase 5: Harness Loop

Load: $SKILL_DIR/instructions/iteration-loop.md for run_sprint, strategic_decision, git_commit.

def main():
    cfg = HarnessConfig.load(Path(__file__).parent / "config.yaml")
    log.setup(PROJECT_DIR, label="run")
    # then: Planner (Phase 2) → contract negotiation → run_sprint() for each sprint

def run_sprint(spec, contract, project_dir, handoff=None, cfg=None):
    iteration = 0
    while iteration < cfg.loop.max_iterations:
        iteration += 1
        # 1. Generate: wrap in try/except; a crash is a valid (poor) outcome
        # 2. Self-assess: extra pass if not confident
        # 3. git_commit("wip: sprint N iteration I")
        # 4. Evaluate → EvalResult
        # 5a. Pass + iteration < min_iterations → continue for quality improvement
        #     Pass + min_iterations met → git_commit("feat: sprint N complete") + return
        # 5b. Fail → strategic_decision() → REFINE or PIVOT → set strategic_framing
    # Exhausted: ask via input() if sys.stdin.isatty(), else return the last result
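The numbered steps above can be sketched as a plain function with the agents injected as callables, which makes the termination logic testable without any model calls. The names generate, evaluate, decide and commit, the (verdict, feedback) return shape, and the LoopLimits defaults are all assumptions of this sketch; in the real harness they would map to run_generator plus self_assess, run_evaluator, strategic_decision() and git_commit().

```python
from __future__ import annotations

import logging
from dataclasses import dataclass
from typing import Callable, Optional

logger = logging.getLogger("harness")


@dataclass
class LoopLimits:  # hypothetical stand-in for cfg.loop
    max_iterations: int = 5
    min_iterations: int = 1


def run_sprint_loop(
    sprint: int,
    generate: Callable[[Optional[str]], None],  # Generator + self-assessment, given optional framing
    evaluate: Callable[[], tuple[str, str]],    # returns (recomputed verdict, feedback)
    decide: Callable[[str], str],               # strategic_decision(): "REFINE" or "PIVOT"
    commit: Callable[[str], None],              # git_commit(message)
    limits: LoopLimits,
) -> tuple[str, int]:
    framing: Optional[str] = None
    verdict = "FAIL"
    for iteration in range(1, limits.max_iterations + 1):
        try:
            generate(framing)
        except Exception:
            # A crash is a valid (poor) outcome: log it and let the Evaluator judge what exists.
            logger.exception("generator crashed in sprint %d iteration %d", sprint, iteration)
        commit(f"wip: sprint {sprint} iteration {iteration}")
        verdict, feedback = evaluate()
        if verdict == "PASS":
            if iteration >= limits.min_iterations:
                commit(f"feat: sprint {sprint} complete")
                return verdict, iteration
            framing = f"QUALITY: the sprint passed; keep improving.\n{feedback}"
            continue
        framing = f"{decide(feedback)}: {feedback}"
    # Exhausted: the harness above asks a human when stdin is a TTY; this sketch just returns.
    return verdict, limits.max_iterations
```

Because the loop is a bounded for over max_iterations, it always terminates, which is the first thing to check when a harness appears to loop forever.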

Git checkpoints (see iteration-loop.md for git_commit() helper):

Event → Commit message
SPEC written → feat: generate SPEC.md
Contract negotiated → chore: sprint N contract
Each iteration → wip: sprint N iteration I
Sprint passes → feat: sprint N complete
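iteration-loop.md has the actual git_commit() helper. As an illustration, the table above can be encoded once so the messages stay consistent, with a thin subprocess wrapper around git add and git commit. The event keys and the boolean return value are hypothetical; git commit exits non-zero when there is nothing to commit, which this sketch reports as False rather than raising.

```python
from __future__ import annotations

import subprocess
from pathlib import Path
from typing import Optional

# Hypothetical event keys mapped to the commit messages in the table above.
_TEMPLATES = {
    "spec": "feat: generate SPEC.md",
    "contract": "chore: sprint {sprint} contract",
    "iteration": "wip: sprint {sprint} iteration {iteration}",
    "sprint_pass": "feat: sprint {sprint} complete",
}


def checkpoint_message(event: str, sprint: Optional[int] = None, iteration: Optional[int] = None) -> str:
    return _TEMPLATES[event].format(sprint=sprint, iteration=iteration)


def git_commit(project_dir: Path, message: str) -> bool:
    """Stage everything and commit; returns False if git refused (e.g. nothing to commit)."""
    subprocess.run(["git", "add", "-A"], cwd=project_dir, check=True)
    result = subprocess.run(
        ["git", "commit", "-m", message], cwd=project_dir, capture_output=True, text=True
    )
    return result.returncode == 0
```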

Setup:
pip install claude-agent-sdk && export ANTHROPIC_API_KEY=sk-...
Verify (from inside harness/):
python -c "from agents.planner import run_planner; print('OK')"


Common Mistakes

Mistake → Fix
Trusting the LLM's verdict field → Recompute in _build_eval_result() from contract_results + rubric_scores
Hardcoding model names → Use cfg.agents.generator_model, never a string literal
Not calling handoff.save() before the Evaluator → Save first; otherwise a crash loses the Evaluator result
Using input() in CI → Guard with sys.stdin.isatty() first
Accumulating messages across sprints → Make each sprint a fresh query() call with no cross-sprint history
Marking completed_features from the Generator's claim → Only promote after an Evaluator PASS verdict
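For the input()-in-CI row, a small guard like the following falls back to a default whenever stdin is not an interactive terminal. The function name and the injectable stream parameter are hypothetical; the stream argument only exists so the guard can be exercised without a real TTY.

```python
import sys
from typing import Optional, TextIO


def ask_or_default(prompt: str, default: str, stream: Optional[TextIO] = None) -> str:
    """Prompt a human only when stdin is a TTY; in CI or pipes, return the default."""
    stream = stream if stream is not None else sys.stdin
    if stream is None or not stream.isatty():  # sys.stdin can be None in some embedded runs
        return default
    print(prompt, end="", flush=True)
    answer = stream.readline().strip()
    return answer or default  # empty answer or EOF keeps the default
```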

When to Simplify

Component: remove or simplify when
Planner agent: the user provides the SPEC directly
Contract negotiation: the human has strong opinions; use config-file mode
Generator self-assessment: the Evaluator consistently passes the first attempt
max_iterations → 3: correctness-only task with no quality/aesthetic goal
min_iterations → 1: early passes are always good enough
Refine/pivot strategic_decision: single sprint or correctness task
HandoffState: the sprint fits in one context window
Evaluator: the task is within the Generator's reliable baseline
Safety recommendations
Use this skill only if you intend to create an autonomous code-writing harness. Before running the generated harness on real work, pin the SDK dependency, run it in a sandbox or disposable repo, change bypassed permissions to approval-based permissions where possible, restrict Bash, and review handoff files and logs for secrets or misleading instructions.
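One way to "restrict Bash" is to screen each command against an allowlist before it is permitted to run. The sketch below is a hypothetical, deliberately conservative filter: the allowed commands are examples only, it rejects any shell chaining, piping or redirection characters (so it will also block some legitimate commands), and wiring it into the SDK's permission handling is not shown; check the SDK documentation for the approval hook you use.

```python
import shlex

# Hypothetical allowlist: adjust to what the generated app genuinely needs.
ALLOWED_COMMANDS = frozenset({"python", "pytest", "npm", "node", "ls", "cat", "git"})
# Characters that enable chaining, piping, substitution or redirection.
FORBIDDEN_FRAGMENTS = (";", "&", "|", "`", "$(", ">", "<")


def bash_command_allowed(command: str, allowed: frozenset = ALLOWED_COMMANDS) -> bool:
    """Return True only for a single, non-chained command whose program is allowlisted."""
    if any(fragment in command for fragment in FORBIDDEN_FRAGMENTS):
        return False
    try:
        tokens = shlex.split(command)
    except ValueError:  # unbalanced quotes and similar parse errors
        return False
    return bool(tokens) and tokens[0] in allowed
```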
Analysis
Type: OpenClaw Skill · Name: long-run-harness · Version: 1.0.0
The skill bundle implements a multi-agent orchestration harness that utilizes high-risk capabilities, specifically arbitrary shell execution via the 'Bash' tool and explicit permission bypassing ('bypassPermissions') in SKILL.md and agent-patterns.md. While these features are plausibly required for the stated purpose of autonomous application development, they create a significant attack surface for remote code execution. No evidence of intentional malice, data exfiltration, or unauthorized persistence was found.
Capability tags
crypto · requires-sensitive-credentials
Capability assessment
Purpose & Capability
The stated purpose—generating a Planner→Generator→Evaluator app-building harness—matches the multi-agent SDK code, loops, evaluator, handoff state, and logging instructions. No hidden exfiltration endpoint or destructive intent is evident in the provided artifacts.
Instruction Scope
The generated Generator agent is instructed to use Write, Read, Edit, Bash, and Glob with permission_mode="bypassPermissions". That is purpose-aligned for an app-building harness, but it removes normal per-action review for high-impact local actions.
Install Mechanism
The skill is instruction-only and tells the user to install claude-agent-sdk with an unpinned pip command. This is central to the purpose, but users should pin and verify the dependency.
Credentials
The generated harness can repeatedly modify project files, run shell commands, and make git commits inside a user-supplied project directory. This is powerful enough that it should be run only in a sandbox or disposable repository unless carefully reviewed.
Persistence & Privilege
The skill intentionally creates logs, handoff JSON files, session resumes, and bounded long-running loops. These are disclosed and purpose-aligned, but they persist model-generated context and tool inputs that users should inspect and sanitize.
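How to use
  1. Make sure OpenClaw is installed (local or Docker deployment).
  2. In the chat box, enter the install command: /install long-run-harness
  3. Once installed, invoke the skill by name or trigger it with /long-run-harness.
  4. Provide the inputs described in the skill's parameter notes to get structured output.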
Version history
v1.0.0
Initial release of the skill for building multi-agent Claude SDK harnesses.
- Guides users through designing and implementing a Planner→Generator→Evaluator orchestrator using `claude-agent-sdk`.
- Enforces best practices: config-driven parameters, explicit agent looping, and robust evaluation logic.
- Provides detailed, phase-based instructions for harness structure, agent implementation, and iteration management.
- Highlights common mistakes and when to simplify the harness.
Metadata
Slug long-run-harness
Version 1.0.0
License MIT-0
Total installs 0
Current installs 0
Versions 1
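FAQ

What is long-run-harness?

Use when building a Planner→Generator→Evaluator multi-agent harness with the Claude SDK. Triggers: "build a harness", "multi-agent pipeline", "agent loop", "... It is an AI Agent Skill plugin for Claude Code / OpenClaw and has been downloaded 57 times so far.

How do I install long-run-harness?

Run the command "/install long-run-harness" in the OpenClaw or Claude Code chat box to install it in one step; no extra configuration is needed.

Is long-run-harness free?

Yes. long-run-harness is completely free and released under the MIT-0 license, so you can download, install, and use it freely.

Which platforms does long-run-harness support?

long-run-harness is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed.

Who develops long-run-harness?

It is developed and maintained by 小白 (@is-xins-xiaobai); the current version is v1.0.0.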
