Description

Parallel autonomous ML research agents with a Director, git worktrees for per-agent experiment branches, a Skills library for validated technique reuse, a Sy...

Usage Instructions (SKILL.md)

Litmus — Parallel Autonomous ML Research Agents

Name: Litmus
Author: kuberwastaken

Litmus spawns multiple OpenClaw subagents that experiment on your GPU overnight. Each runs on its own git branch in a shared lab repository — every experiment is a commit, agents can read each other's code, cherry-pick breakthroughs, and build on the global best at any time.

Validated techniques accumulate in a Skills library (~/.litmus/shared/skills/). A Synthesizer runs at 04:00 to distill collective knowledge into skills and write a research agenda for the next day. A Director runs every 2 hours to steer workers, trigger Compass Resets on stagnation, and orchestrate cross-agent knowledge transfer.

What makes it more than autoresearch:

Git worktrees: agents share one repo, each on their own branch — full experiment history, cherry-pick, and cross-agent code inspection via git -C ~/.litmus/repo log --all
Skills library: validated techniques persist and compound — agents don't re-discover wins
Synthesizer: distills all overnight notes into reusable skills and a research agenda
Compass Reset: Director detects stagnation and forces structured pivots using the skills gap
Two-phase experiment budget: quick 90-second check before committing to a full run
Structured attempt records: JSON per experiment in shared/attempts/ for rich analytics
Leisure mode (03:00–06:00): workers read papers, write moonshot hypotheses, identify gaps
Morning digest: research narrative delivered to your chat at 08:00
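
The two-phase budget in the list above can be pictured as a cheap gate in front of an expensive run. What follows is a minimal Python sketch, not Litmus's actual worker loop (that is defined in program.md). The `run_two_phase` helper, its return labels, and the pass criterion (the quick command exits with code 0 inside its time limit) are assumptions made for illustration; only the 90-second figure comes from the description above.

```python
import subprocess
import sys


def run_two_phase(quick_cmd, full_cmd, quick_budget_s=90.0):
    """Run a cheap check first; start the full run only if the check passes.

    Hypothetical helper for illustration; commands are argv lists.
    """
    try:
        quick = subprocess.run(quick_cmd, timeout=quick_budget_s)
    except subprocess.TimeoutExpired:
        return "quick_timeout"  # check overran its budget, so skip the full run
    if quick.returncode != 0:
        return "quick_failed"  # check crashed or rejected the idea
    full = subprocess.run(full_cmd)  # no time limit is imposed in this sketch
    return "full_ok" if full.returncode == 0 else "full_failed"


# Stand-in commands; a real worker would launch its training script here.
ok = [sys.executable, "-c", "pass"]
print(run_two_phase(ok, ok))
```

The point of the gate is that an idea which crashes or stalls costs at most the quick budget rather than a full training run.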

Everything is a native OpenClaw subagent. No external processes, no PID files.

First-Time Setup

Recommended — ask your OpenClaw agent (runs a guided onboarding conversation):

"Install https://clawhub.ai/kuberwastaken/litmus and set it up for my machine"

Full onboarding instructions: {baseDir}/references/onboarding.md — read that file first.

Or manually:

git clone https://github.com/kuberwastaken/litmus ~/.litmus
bash ~/.litmus/scripts/setup.sh

The setup script clones Karpathy's training harness, builds the shared lab git repo at ~/.litmus/repo/, installs Python deps via uv, and downloads ~1 GB of training data. Wait for it to finish before starting research.

Starting Research

1 — Prepare workspaces (creates git worktrees)

bash {baseDir}/scripts/prepare-agents.sh --agents 4 --templates architecture,optimizer,general,general

Creates git worktrees under ~/.litmus/agents/, each on its own branch in ~/.litmus/repo/. The shared lab git repo means every agent's experiments are immediately visible to all others:

git -C ~/.litmus/repo log --all --oneline --graph
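
For readers unfamiliar with worktrees, the layout described above corresponds to one `git worktree add` call per agent. The sketch below only builds the commands and does not run them. It is not the content of prepare-agents.sh: the `agent/<name>` branch naming and the `worktree_commands` helper are assumptions, while the paths and the arch-1 style agent names follow this document.

```python
from pathlib import PurePosixPath


def worktree_commands(agent_names, repo="~/.litmus/repo", agents_dir="~/.litmus/agents"):
    """Return one `git worktree add` argv list per agent (nothing is executed).

    Note: "~" is left unexpanded; expand it (e.g. os.path.expanduser)
    before passing these lists to subprocess without a shell.
    """
    commands = []
    for name in agent_names:
        path = str(PurePosixPath(agents_dir) / name)
        branch = f"agent/{name}"  # assumed naming scheme, not confirmed by the source
        commands.append(["git", "-C", repo, "worktree", "add", "-b", branch, path])
    return commands


for argv in worktree_commands(["arch-1", "opt-1"]):
    print(" ".join(argv))
```

Because every worktree is attached to the same repository, a commit made in one agent's directory is visible to the `git log --all` command shown above without any push or fetch.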

2 — Spawn research subagents

sessions_spawn
  task: "Read program.md in your current directory and run the research loop forever."
  runtime: "subagent"
  mode: "session"
  agentId: "litmus-worker-arch-1"
  cwd: "~/.litmus/agents/arch-1"

Repeat for each agent, then:

sessions_yield message: "Research agents running. I'll notify you on new discoveries."

Templates: architecture · optimizer · regularization · general

Full template details: {baseDir}/references/templates/

3 — Start the Director Layer

bash {baseDir}/scripts/setup-cron.sh --timezone "Your/Timezone"

Registers 6 cron jobs:

Cron	Default schedule	Role
`litmus-director`	Every 2h during research hours	Reviews results, steers workers, Compass Reset on stagnation
`litmus-leisure`	03:00 daily	Switches workers to paper-reading / creative thinking mode
`litmus-synthesizer`	04:00 daily	Distills notes into skills library, writes research agenda
`litmus-dawn`	06:00 daily	Wakes workers, queues synthesizer's priority experiments
`litmus-watchdog`	Every 30 min	Liveness check, escape mode on zero improvements
`litmus-digest`	08:00 daily	Morning research narrative delivered to your chat

All times are configurable during onboarding — the setup agent pitches defaults and asks what you'd like to change. Common presets: night owl (01:00/02:00/04:00/07:00), early bird (23:00/00:30/02:00/05:30), intensive (1h director). Pass custom times to scripts/setup-cron.sh with --leisure-start, --synthesizer-time, --dawn-time, --digest-time, --director-hours, --watchdog-minutes.
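
The table says the Director triggers a Compass Reset on stagnation but does not define the test. One plausible reading, shown purely as an assumption, is "no new best val_bpb in the last N experiments". The `is_stagnating` name, the window of 5, and the `min_delta` threshold are all hypothetical; director.md holds the actual rule.

```python
def is_stagnating(val_bpb_history, window=5, min_delta=0.0):
    """True if none of the last `window` runs beat the earlier best by more than min_delta.

    val_bpb_history is chronological; lower val_bpb is better.
    """
    if len(val_bpb_history) <= window:
        return False  # not enough history to judge
    earlier_best = min(val_bpb_history[:-window])
    recent_best = min(val_bpb_history[-window:])
    return recent_best >= earlier_best - min_delta


# Illustrative numbers only, not real results.
print(is_stagnating([1.10, 1.05, 1.06, 1.07, 1.05, 1.08, 1.06]))
```

A larger `min_delta` would make the check stricter, treating tiny gains as noise rather than progress.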

Managing Agents

Status (experiment counts, best val_bpb, git tree):

bash {baseDir}/scripts/status.sh

Leaderboard (cross-agent, from shared/attempts/ JSON):

bash {baseDir}/scripts/results.sh --top 10
bash {baseDir}/scripts/results.sh --agent arch-1  # single agent
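
As a rough picture of what a cross-agent leaderboard over shared/attempts/ involves, the sketch below loads every JSON record and sorts by val_bpb, lowest first. It is not the code of results.sh. The field names agent, val_bpb and title come from the attempt-record description later in this document; the `load_leaderboard` helper and the choice to skip records without a numeric val_bpb are assumptions.

```python
import json
from pathlib import Path


def load_leaderboard(attempts_dir, top=10, agent=None):
    """Return up to `top` attempt records, lowest val_bpb first."""
    records = []
    for path in sorted(Path(attempts_dir).expanduser().glob("*.json")):
        try:
            record = json.loads(path.read_text())
        except (OSError, ValueError):
            continue  # skip unreadable or half-written files
        if not isinstance(record, dict):
            continue
        if not isinstance(record.get("val_bpb"), (int, float)):
            continue  # e.g. a run that produced no metric
        if agent is not None and record.get("agent") != agent:
            continue
        records.append(record)
    return sorted(records, key=lambda r: r["val_bpb"])[:top]


if __name__ == "__main__":
    board = load_leaderboard("~/.litmus/shared/attempts")
    for rank, rec in enumerate(board, start=1):
        print(rank, rec.get("agent"), rec["val_bpb"], rec.get("title"))
```

Passing `agent="arch-1"` mirrors the single-agent view of the second command above.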

Full lab git history (all agents' experiments as a tree):

git -C ~/.litmus/repo log --all --oneline --graph

Inspect any experiment:

git -C ~/.litmus/repo show <commit-hash>  # see what changed
cat ~/.litmus/shared/attempts/<hash>.json  # see the metrics

Steer (redirect mid-run, no restart):

subagents action: "steer"  target: "litmus-worker-arch-1"
  message: "Stop refining depth. Checkout the best commit from opt-2 and combine their LR with DEPTH=10."

Stop:

subagents action: "kill"  target: "all"

What Agents Write Overnight

Path	Contents
`~/.litmus/shared/attempts/<hash>.json`	Structured record for every experiment (agent, val_bpb, status, title)
`~/.litmus/shared/skills/<name>.md`	Validated reusable techniques with YAML frontmatter
`~/.litmus/shared/notes/discoveries/`	Per-improvement discovery notes
`~/.litmus/shared/notes/anomalies/`	Unexpected result notes
`~/.litmus/shared/notes/moonshots/`	Speculative hypotheses from leisure
`~/.litmus/shared/notes/synthesis/`	Synthesizer's research agenda and combination matrix
`~/.litmus/shared/discoveries.md`	Cross-agent knowledge base (flat, for quick reading)
`~/.litmus/shared/midnight-reflections.md`	Leisure agent's nightly narrative
`~/.litmus/repo/` (git)	All experiment commits across all agents on their branches
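
Skill files are described above as Markdown with YAML frontmatter, but the fields are not listed here. The sketch below shows one way to split such a file into metadata and body. The field names and technique in the sample are invented for illustration, and the parser handles only flat `key: value` lines; a real YAML library would be needed for anything nested.

```python
def split_frontmatter(text):
    """Split a Markdown document into (metadata dict, body string).

    Handles only flat `key: value` frontmatter between two `---` lines.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    try:
        end = lines.index("---", 1)
    except ValueError:
        return {}, text  # opening marker without a closing one
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta, "\n".join(lines[end + 1:]).lstrip("\n")


# Hypothetical skill file; these field names are not taken from Litmus.
sample = """---
name: warmup-then-cosine
source_agent: opt-2
---
Use a short linear warmup before cosine decay.
"""
meta, body = split_frontmatter(sample)
print(meta["name"], "|", body)
```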

Reference Files

{baseDir}/references/onboarding.md — first-time setup conversation
{baseDir}/references/program.md — worker agent loop (git-aware, skills-reading, two-phase budget)
{baseDir}/references/director.md — Director cron (Compass Reset, cross-pollination)
{baseDir}/references/leisure.md — Leisure mode (paper reading, structured notes, skill extraction)
{baseDir}/references/synthesizer.md — Synthesizer cron (knowledge distillation, skills library)
{baseDir}/references/dawn.md — Dawn cron (wake workers, queue experiments)
{baseDir}/references/watchdog.md — Watchdog cron (liveness, escape mode)
{baseDir}/references/digest.md — Morning digest (research narrative)
{baseDir}/references/templates/ — Research focus templates
{baseDir}/references/clawrxiv.md — ClawRxiv integration (optional auto-publishing)

Security Recommendations

This skill is internally consistent with its purpose, but it performs powerful local and autonomous operations. Before installing or running it:

1) Review the scripts (setup.sh, setup-cron.sh, prepare-agents.sh, and the referenced synthesizer/leisure scripts) yourself to ensure you understand what will run.
2) Run setup and experiments in an isolated environment (VM, container, or non-production machine): the agents will clone code, install Python packages, create git branches, and run arbitrary training code on your machine.
3) Pay attention to the 'curl | sh' uv installation referenced in INSTALL.md and consider installing uv manually from a trusted source instead.
4) If you do not want remote publishing, disable ClawRxiv publishing (do not populate ~/.litmus/config.json with a clawrxiv api key and leave publishing flags off).
5) If you prefer to avoid autonomous scheduled runs, skip the cron registration step and run agent commands manually.
6) Verify you trust the upstream autoresearch code (karpathy/autoresearch) that will be cloned and executed.

If any of these points worry you, do not run setup.sh on machines with sensitive data or shared user access.
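
Point 4 can be checked before starting any agents. The sketch below reports whether a ClawRxiv key appears to be present, either in the CLAWRXIV_API_KEY environment variable (listed on this page as the skill's optional credential) or anywhere in ~/.litmus/config.json. The layout of config.json is not documented here, so the code assumes nothing about exact key names and simply looks for a non-empty string stored under any key containing "clawrxiv". Treat it as a heuristic that can over-report, not as an official check.

```python
import json
import os
from pathlib import Path


def _has_clawrxiv_value(node, under_clawrxiv=False):
    """Recursively look for a non-empty string under a key mentioning 'clawrxiv'."""
    if isinstance(node, dict):
        return any(
            _has_clawrxiv_value(v, under_clawrxiv or "clawrxiv" in str(k).lower())
            for k, v in node.items()
        )
    if isinstance(node, list):
        return any(_has_clawrxiv_value(v, under_clawrxiv) for v in node)
    return under_clawrxiv and isinstance(node, str) and node.strip() != ""


def clawrxiv_key_present(config_path="~/.litmus/config.json", env=None):
    """Heuristic: True if a ClawRxiv credential seems to be configured."""
    env = os.environ if env is None else env
    if env.get("CLAWRXIV_API_KEY"):
        return True
    try:
        config = json.loads(Path(config_path).expanduser().read_text())
    except (OSError, ValueError):
        return False  # no readable config, so no key stored there
    return _has_clawrxiv_value(config)


if __name__ == "__main__":
    print("ClawRxiv key present:", clawrxiv_key_present())
```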

Functional Analysis

Type: OpenClaw Skill
Name: litmus
Version: 1.1.1

Litmus is a sophisticated framework for autonomous ML research that orchestrates multiple subagents to perform experiments, track results via git, and synthesize knowledge. It utilizes shell scripts for setup (setup.sh), agent preparation (prepare-agents.sh), and cron management (setup-cron.sh) to implement a 'circadian rhythm' for research tasks. While it possesses broad capabilities including network access (for Arxiv and the optional ClawRxiv publishing platform) and subagent steering, these actions are transparently documented and strictly aligned with the goal of optimizing ML models. No indicators of malicious behavior, such as secret theft or unauthorized persistence, were identified.

Capability Assessment

✓ Purpose & Capability

Name/description (parallel autonomous ML agents, git worktrees, skills library, synthesizer, Director) match the declared requirements and included scripts: uv, git, python3, cron setup, repo cloning, and per-agent worktrees under ~/.litmus. The optional CLAWRXIV_API_KEY relates to the ClawRxiv publishing feature and is consistent with the docs.

ℹ Instruction Scope

SKILL.md and scripts instruct the agent to clone the autoresearch harness, create a shared lab git repo and worktrees under ~/.litmus, install Python deps via uv, download ~1GB data, register cron jobs (via OpenClaw cron tool), spawn native subagents via sessions_spawn, and read/write structured state in ~/.litmus/shared/. All file and runtime actions are confined to the declared configPath (~/.litmus/) but the instructions do grant autonomous agents the ability to modify code in the shared repo and run experiments (i.e., execute arbitrary training code changes).

ℹ Install Mechanism

There is no packaged install spec; setup.sh clones GitHub (karpathy/autoresearch) and runs 'uv sync' to install Python dependencies. INSTALL.md references installing uv via a curl | sh from astral.sh (remote install script). Cloning from GitHub is expected for this purpose, but the remote 'curl | sh' pattern and uv installing packages are higher-risk operations — expected for a research harness but worth reviewing before running.

✓ Credentials

The skill declares no required environment variables and only an optional CLAWRXIV_API_KEY for the ClawRxiv publishing integration. No unrelated credentials or excessive env requirements are requested. Runtime behavior reads/writes only to ~/.litmus/ paths declared in metadata.

ℹ Persistence & Privilege

always:false (normal). The skill instructs registering multiple cron jobs that schedule autonomous OpenClaw agent turns (Director, Synthesizer, Watchdog, etc.). Autonomous invocation and scheduled jobs are expected for this functionality but increase the blast radius — these scheduled jobs will run without further user interaction unless you choose not to register them.

Version History

v1.1.1

Add stagnation, confidence tracking, Compass Reset, ClawRxiv, and updated docs

v1.1.0

- Major update: litmus 1.1.0 introduces parallel autonomous ML research agents with advanced orchestration and collective knowledge distillation.
- New Skills library for validated technique reuse and cross-agent knowledge transfer.
- Adds a Synthesizer to distill collective findings overnight and generate daily research agendas.
- Implements a Director layer to steer agents, detect stagnation, and trigger Compass Resets.
- Incorporates circadian rhythm scheduling, including leisure hours for paper reading and creative thinking.
- Provides structured experiment tracking with per-attempt JSON records and a morning digest summary.

Metadata

Slug litmus

Version 1.1.1

License MIT-0

Total installs 0

Current installs 0

Number of versions 2

Frequently Asked Questions

What is Litmus?

Parallel autonomous ML research agents with a Director, git worktrees for per-agent experiment branches, a Skills library for validated technique reuse, a Sy... It is an AI Agent Skill plugin for Claude Code / OpenClaw, with 150 total downloads to date.

How do I install Litmus?

Run the command "/install litmus" in an OpenClaw or Claude Code chat to install it in one step; no additional configuration is required.

Is Litmus free?

Yes. Litmus is completely free and released under the MIT-0 license, so you can download, install, and use it freely.

Which platforms does Litmus support?

Litmus is cross-platform and can be used in any environment where OpenClaw / Claude Code is deployed (linux, darwin).

Who developed Litmus?

Litmus is developed and maintained by Kuber Mehta (@kuberwastaken). The current version is v1.1.1.
