Description

Coordinate multiple sub-agents to collaboratively complete long-horizon software engineering tasks using the CAID (Centralized Asynchronous Isolated Delegati...

README (SKILL.md)

CAID Multi-Agent Coordination

Name: Caid Multi Agent
Author: simoncatbot

This skill implements the Centralized Asynchronous Isolated Delegation (CAID) paradigm for coordinating multiple agents working on shared artifacts.

⚠️ CRITICAL WARNINGS FROM PAPER:

Use CAID from the outset — Don't run single-agent first as fallback. Sequential strategy costs nearly 2x with minimal gain.

Physical worktree isolation is mandatory — Soft isolation (instruction-only) degrades performance on complex tasks.

Engineer limits are strict — 2 for PaperBench-style, 4 for Commit0-style, never exceed 8.

Higher cost/runtime trade-off — CAID improves accuracy, not speed. Integration is sequential/test-gated.

Core Principles

Centralized Task Delegation — A manager agent decomposes tasks into dependency-aware units
Asynchronous Execution — Multiple engineer agents work concurrently
Isolated Workspaces — Each agent works in its own isolated branch/worktree
Structured Integration — Progress is merged via git commit/merge with test verification

When to Use This Skill

Use CAID from the outset for:

Long-horizon tasks with multiple interdependent files
Clear dependency structure (imports, test mappings)
Parallelizable work exists
Integration can be verified by executable tests

Don't use as fallback: Running single-agent first then CAID is inefficient (cost/runtime nearly additive, minimal performance gain).

Use single-agent for:

Isolated, single-file changes
No clear parallelization opportunities
Exploratory/research-oriented tasks

Coordination Workflow

0. Manager Pre-Setup (CRITICAL)

Before ANY delegation, the manager must:

Prepare runtime environment
- Ensure dependencies installed
- Set up virtual environment
Organize entry points
- Create main entry files
- Ensure import paths work
Add minimal function stubs
- Empty function definitions so imports don't fail
- Type signatures if available
Commit to main branch
- All engineer branches created from consistent base
- Without this, engineers start from divergent states

# Pre-setup commit
git add .
git commit -m "setup: initial stubs and entry points"
git push origin main

1. Task Analysis & Dependency Graph Creation

Manager's role: Before delegating, analyze the task structure:

Identify atomic units of work (files, functions, modules)
Build a dependency graph: G=(V,E) where edges indicate dependencies
Define Ready(v) ⇔ all dependencies of v are completed
Only delegate tasks that are Ready (all dependencies satisfied)

Commit0-style tasks (clear file structure):

Check import statements to identify file-level dependencies
Collect executable test cases from repository
Examine which files tests exercise
Identify components to implement earlier (upstream dependencies)
Delegate at file level first — only split to function level if file has many unimplemented functions

PaperBench-style tasks (inferred structure):

Read paper to identify main contribution
Infer implementation order from contribution
Use max 2 engineers — manager task is harder, more agents destabilize

Dependency graph construction:

Ready_t(v_j) ⇔ ∀(v_i, v_j) ∈ E, v_i ∈ Completed_t

Only delegate tasks from {v ∈ V | Ready_t(v)}

2. Workspace Isolation Setup

Create PHYSICALLY isolated worktrees (not soft isolation):

# Main branch is the single source of truth
git worktree add ../workspace-engineer-1 \x3Cbranch-name-1>
git worktree add ../workspace-engineer-2 \x3Cbranch-name-2>
# etc.

⚠️ WARNING: Soft isolation (same workspace, instruction-level constraints) degrades performance to below single-agent on PaperBench. Physical git worktree isolation is mandatory.

Key isolation principles:

Each engineer operates in its own git worktree (physical filesystem isolation)
All worktrees are derived from the main branch
Engineers modify files only within their assigned workspace
Restricted files (shared across engineers): __init__.py, config files, global constants — engineers must NOT commit changes to these

3. Dependency-Aware Task Delegation

STRICT Engineer Limits:

Task Type	Max Engineers	Why
PaperBench-style	2	Inferred dependencies; more destabilizes
Commit0-style	4	Clear file structure; test-guided
General SWE	2-4	Balance parallelism vs integration overhead
Absolute max	8	Beyond this, coordination tax exceeds gains

⚠️ Critical: Increasing engineers beyond optimal degrades performance due to integration overhead and conflict resolution costs.

Task prioritization heuristics: Manager should prioritize tasks that:

Enable earlier test execution (expose evaluation signals sooner)
Lie closer to upstream of dependency chain
Are simpler functions before complex ones

Round definition:

One round = complete cycle of delegation → implementation → dependency update

Recommended iteration limits (from paper experiments):

Role	Max Iterations
Manager	50
Each Engineer	80
Total Rounds	~22 (varies by task)

Delegation algorithm:

At round t:
1. Ready_Set = {v ∈ V | Ready_t(v)}  // all dependencies satisfied
2. Select up to N tasks from Ready_Set (N = max parallel engineers above)
3. Apply prioritization heuristics
4. Delegate to available engineers
5. Wait for completion signals
6. Update dependency state after each successful integration

Task assignment JSON format (structured communication — NO free-form dialog):

{
  "task_id": "string",
  "task_description": "string",
  "target_files": ["path/to/file.py"],
  "target_functions": ["function_name"],
  "dependencies": ["task_id_1", "task_id_2"],
  "expected_outcome": "description of success criteria",
  "verification_command": "pytest tests/test_file.py -v",
  "restricted_files": ["src/__init__.py", "src/config.py"],
  "priority": "high|medium|low"
}

Key: All communication uses structured JSON, not free-form dialog. This prevents inter-agent misalignment (primary failure mode in multi-agent systems).

4. Asynchronous Execution Loop

Event loop pattern:

Delegate → Manager assigns tasks to available engineers
Execute → Engineers work concurrently in isolated worktrees
Self-Verify → Engineer runs tests, fixes failures
Complete → Engineer submits commit when ALL tests pass
Integrate → Manager attempts merge to main
Conflict Resolution (if needed) → Responsible engineer resolves
Update → Manager updates dependency graph
Repeat → Continue until all tasks complete or limits reached

Engineer self-verification (MANDATORY before submission):

Run relevant tests that import/reference modified files
If no explicit mapping, run repository's default test command
Any failed test or runtime exception MUST be resolved
Use concrete error logs and tracebacks for iterative refinement
Only submit commit after ALL tests pass

5. Integration via Merge

Merge workflow:

# Manager attempts merge
git checkout main
git merge \x3Cengineer-branch>

# If conflict:
# 1. Engineer who produced conflicting commit is RESPONSIBLE for resolution
# 2. Engineer pulls latest main: git pull origin main
# 3. Resolves conflicts locally
# 4. Re-runs tests to ensure resolution didn't break anything
# 5. Resubmits commit
# 6. Manager retries merge

Main branch is single source of truth throughout execution.

6. Context Management for Manager

To prevent context explosion, manager uses LLMSummarizingCondenser pattern:

Periodically:
1. Summarize prior interaction rounds
2. Preserve structured artifacts:
   - Dependency graph (current state)
   - Completed tasks (with commit hashes)
   - Unresolved errors (with traceback summaries)
3. Discard detailed conversation history
4. Maintain execution traceability without bloat

Compressed execution history format:

{
  "round": 5,
  "completed": ["task-1", "task-2", "task-3"],
  "ready": ["task-4", "task-5"],
  "blocked": ["task-6: waiting for task-5"],
  "active_engineers": 2,
  "main_branch_commits": ["abc123", "def456"],
  "unresolved_errors": []
}

7. Worktree Synchronization & Cleanup

State synchronization when main advances:

# Engineer syncs to latest integrated state
cd ../workspace-engineer-1
git fetch origin
git reset --hard origin/main  # Sync worktree to latest main

Worktree cleanup (after completion or limit reached):

# Remove worktree when engineer finishes or hits iteration limit
git worktree remove ../workspace-engineer-1
rm -rf ../workspace-engineer-1  # Clean up directory

Worktrees are deleted after all assigned tasks are completed or when the engineer reaches the predefined iteration limit.

8. Termination Conditions

Success: All units completed and integrated into main
Failure: Maximum rounds/iterations reached with unresolved tasks
Incomplete: Task considered incomplete if any units remain unresolved

Manager iteration limits (from paper):

Manager: max_iterations=50
Each engineer: max_iterations=80
Total rounds: ~22 (varies by task)

9. Manager Final Review

After the asynchronous loop completes, the manager does a final review before submitting the final product.

Final review checklist:

Verify all tasks from dependency graph are completed
Run full test suite: pytest tests/ -v
Check integration completeness (all commits merged)
Review any unresolved errors or warnings
Validate final state matches expected outcome
Submit final product only after verification

# Manager final verification
git checkout main
pytest tests/ -v                    # Full test suite
python -m mypackage --version       # Smoke test
# Review any integration gaps

Implementation Guidelines

Using OpenClaw Sub-agents

For OpenClaw, the sessions_spawn tool enables parallel agent execution:

Spawn engineer agents:

// For each task in Ready_Set, spawn an engineer
{
  "runtime": "subagent",
  "task": "\x3Ctask specification with context>",
  "agentId": "\x3Cengineer-agent-id>",
  "mode": "run",
  "runTimeoutSeconds": 300
}

Check progress:

// Poll for completion
{
  "action": "list"
}

Worktree Synchronization

When main advances, update worktrees:

# Engineer pulls latest main before continuing
cd ../workspace-engineer-1
git fetch origin
git reset --hard origin/main  # Or rebase

This ensures engineers work from latest integrated state.

Verification Intensity vs Efficiency Trade-off

From paper analysis (Section 4.4):

Strategy	Pass Rate	Runtime	When to Use
Round-Manager Review	60.2%	3689s	Maximum correctness required
Engineer Self-Verification	55.1%	2244s	Default - balanced
Efficiency-Prioritized	54.0%	1909s	Time-critical, acceptable risk

Default: Engineer self-verification without repeated manager review.

Common Pitfalls & Solutions

Pitfall	Solution
Using CAID as fallback after single-agent fails	Use from outset; sequential costs ~2x with minimal gain
Soft isolation (instruction-only)	Mandatory `git worktree` physical isolation
Too many engineers (>4-8)	Strict limits: 2 PaperBench, 4 Commit0, 8 absolute max
Skipping manager pre-setup	Always prepare runtime/stubs/entry points first
Skipping manager final review	Always do final verification before submission
Merge conflicts from concurrent edits	Group dependent files; engineer resolves own conflicts
Not cleaning up worktrees	Delete worktrees after completion/limit reached
Agents develop inconsistent views	Structured JSON only; no free-form dialog
Silent interference between agents	Explicit merge with test verification
Tasks not clearly defined	Build dependency graph before ANY delegation
Integration failures discovered late	Self-verification mandatory before commit
Context explosion	Use LLMSummarizingCondenser pattern
Missing restricted files	Mark `__init__.py`, configs as restricted

Cost/Runtime Expectations

CAID trade-offs (vs single-agent):

Higher API cost — Multiple agents = more LLM calls
Similar or longer wall-clock time — Integration is sequential/test-gated
Substantially higher accuracy — +26.7% PaperBench, +14.3% Commit0

When worth it: Long-horizon shared-artifact tasks where correctness matters more than speed.

Example Workflows

See references/examples.md for concrete implementation examples including:

Commit0-style library implementation
PaperBench-style paper reproduction
Bug fixing (single-file vs multi-file)
Feature addition with API and frontend

References

Paper: "Effective Strategies for Asynchronous Software Engineering Agents" (arXiv:2603.21489v1)
GitHub: https://github.com/JiayiGeng/async-swe-agents
Built on OpenHands agent SDK principles

Usage Guidance

This skill is internally consistent with its goal of coordinating multiple engineer agents. Before installing or using it, ensure: (1) the agent runtime has access to the target repository (local workspace and any git remote credentials needed for git push/pull), (2) the execution environment can create worktrees and virtual environments and run tests (python, pytest, required libs), and (3) the platform account used by the agent has permission to spawn subagents via the sessions_spawn-like API. If you don't want automatic remote pushes, run merges locally or mirror a safe remote. Limit the number of concurrent engineers according to the guidance (2–4 typical; 8 absolute max). If you need clarification about what platform tokens or git credentials will be used, ask the skill author or your platform admin before enabling it.

Capability Analysis

Type: OpenClaw Skill Name: caid-multi-agent Version: 1.2.0 The caid-multi-agent skill bundle implements the Centralized Asynchronous Isolated Delegation (CAID) paradigm for coordinating multiple sub-agents on complex software engineering tasks. The instructions in SKILL.md and examples in references/examples.md provide a robust framework for task decomposition, physical workspace isolation using 'git worktree', and structured integration via git merges. While the skill utilizes high-privilege operations such as filesystem manipulation and sub-agent spawning (sessions_spawn), these actions are strictly aligned with the stated purpose of parallelized development. No evidence of malicious intent, data exfiltration, or prompt injection was found; the bundle emphasizes rigorous self-verification and dependency management to ensure system stability.

Capability Assessment

✓ Purpose & Capability

Name/description (CAID multi-agent coordination) match the instructions: dependency analysis, physical git worktrees, delegation, test-gated integration, and spawning engineer subagents. All requested operations (git worktree, venv, pytest, spawn subagents) are expected for this capability.

ℹ Instruction Scope

SKILL.md gives detailed runtime steps that stay within the domain (repo prep, create worktrees, delegate tasks, run tests, merge). It also includes sample code to call a sessions_spawn API (platform subagent spawn). The instructions assume the agent can run shell commands, access the repository, and call the platform's session API — these are expected but the document does not explicitly list the external privileges (git remote access, platform session API access) that will be used.

✓ Install Mechanism

Instruction-only skill with no install spec or external downloads. It runs common tooling (python venv, pip, pytest, git) but does not install code from arbitrary URLs — low install risk.

ℹ Credentials

The skill declares no required environment variables or credentials, which is reasonable for an instruction-only coordination skill. However, the instructions call git push/pull (remote repository operations) and call sessions_spawn (platform API) — both implicitly require repository credentials and platform runtime permissions. The skill doesn't request or document these, so operators should ensure the environment already has appropriate git auth and platform session permissions before use.

✓ Persistence & Privilege

always:false and no special install behavior. The skill instructs spawning subagents (normal for multi-agent coordination) but does not request permanent system presence or attempt to modify other skills' configs.

Version History

v1.2.0

Add manager final review phase, worktree cleanup instructions, state synchronization commands, specific iteration limits (manager:50, engineer:80, rounds:~22), round definition, and cleanup examples

v1.1.0

Add critical warnings: pre-setup phase, strict engineer limits (2-4), physical worktree isolation requirement, context compression, PaperBench vs Commit0 specific guidance, and 'not fallback' warning

v1.0.0

Initial release of CAID Multi-Agent skill for coordinated multi-agent software engineering. - Enables multiple sub-agents to collaboratively complete complex, long-horizon coding tasks using the Centralized Asynchronous Isolated Delegation (CAID) paradigm. - Implements branch-and-merge workflow with strict git worktree isolation for each agent. - Features dependency-aware task decomposition and parallelizable, structured task delegation. - Integrates automated testing and verification at every merge to main branch. - Provides guidelines on optimal parallelism, agent communication, and conflict resolution strategies.

Metadata

Slug caid-multi-agent

Version 1.2.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 3

Frequently Asked Questions

What is Caid Multi Agent?

Coordinate multiple sub-agents to collaboratively complete long-horizon software engineering tasks using the CAID (Centralized Asynchronous Isolated Delegati... It is an AI Agent Skill for Claude Code / OpenClaw, with 105 downloads so far.

How do I install Caid Multi Agent?

Run "/install caid-multi-agent" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Caid Multi Agent free?

Yes, Caid Multi Agent is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Caid Multi Agent support?

Caid Multi Agent is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).

Who created Caid Multi Agent?

It is built and maintained by Simon (@simoncatbot); the current version is v1.2.0.

More Skills

Caid Multi Agent