
Agent Failure Loop

by reikys · GitHub ↗ · v1.0.1 · MIT-0
cross-platform ✓ Security Clean
Install in OpenClaw
/install agent-failure-loop
Description
An end-to-end self-improvement loop that automatically detects agent failures, classifies them, tracks recurrence, auto-generates rules, and promotes them to...
README (SKILL.md)

agent-failure-loop

If the same mistake repeats three times, a rule is automatically created.

Agents lose their memory when a session ends. They make the same mistake yesterday, today, and tomorrow. This skill builds an end-to-end self-improvement loop that automatically detects → classifies → tracks → promotes failures into rules.


Table of Contents

  1. Why Do Agents Repeat the Same Mistakes?
  2. Architecture
  3. 5-Layer Pipeline
  4. Failure Type Classification
  5. Recording Format
  6. Promotion Conditions and Logic
  7. Installation
  8. Quick Start (5 Minutes)
  9. Cron Integration
  10. Before/After Demo
  11. Comparison with Competing Skills
  12. Cross-Platform Configuration
  13. Script Reference
  14. FAQ

Why Do Agents Repeat the Same Mistakes?

Structural limitations of AI agents:

| Problem | Cause | Result |
|---------|-------|--------|
| No memory between sessions | Context is limited to the current session | Yesterday's failure is repeated today |
| No failure records | Logs accumulate without distinguishing success from failure | Pattern detection is impossible |
| No learning feedback | Humans repeat the same corrections | Human fatigue increases |
| No guardrails | Past lessons are not referenced on retry | Agent falls into the same trap again |

Problems with existing approaches:

  • Manual rule addition: Humans manually write rules in AGENTS.md → tedious and frequently missed
  • Conversation learning (self-improving-agent): Analyzes only conversation patterns → cannot detect execution failures → no enforcement
  • Python self-improvement (actual-self-improvement): Implementation exists but → no cron integration → no auto-promotion → ultimately manual

agent-failure-loop bridges all these gaps:

Failure occurs → Immediate recording → Batch analysis → 3x repeat detection → Auto rule promotion → Pre-task lookup
     ↑                                                                              |
     └──────────────────── Guardrail prevents recurrence ───────────────────────────┘

Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                    agent-failure-loop Architecture                   │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  Layer 4: GUARDRAIL ─────────────────────────────────────────────   │
│  ┌─────────────────────────────────────────────────────────────┐   │
│  │  Before starting a task → failure-matcher.py → query        │   │
│  │  similar failures                                           │   │
│  │  "Previously failed 3 times on this task. Cause: X.         │   │
│  │   Lesson: Y"                                                │   │
│  └────────────────────────────┬────────────────────────────────┘   │
│                               │ query                               │
│  Layer 3: PATTERN + PROMOTE ──┼──────────────────────────────────   │
│  ┌────────────────────────────┴────────────────────────────────┐   │
│  │  auto-promote.py                                            │   │
│  │  .learnings/promotable.json → 3+ occurrences → AGENTS.md   │   │
│  │  ┌──────────┐   ┌───────────┐   ┌────────────────────────┐ │   │
│  │  │ Pattern  │──▶│ ≥3 check  │──▶│ Insert rule to target  │ │   │
│  │  │ Detection│   │           │   │ (AGENTS/CLAUDE/cursor)  │ │   │
│  │  └──────────┘   └───────────┘   └────────────────────────┘ │   │
│  └────────────────────────────┬────────────────────────────────┘   │
│                               │ input                               │
│  Layer 2: STRUCTURED ANALYSIS ┼──────────────────────────────────   │
│  ┌────────────────────────────┴────────────────────────────────┐   │
│  │  sync-learnings.py                                          │   │
│  │  failures/*.md → parse → group → .learnings/               │   │
│  │  ┌──────────┐   ┌──────────┐   ┌─────────────────────────┐ │   │
│  │  │  Parse   │──▶│  Group   │──▶│ summary.json            │ │   │
│  │  │  entries │   │ patterns │   │ repeated-patterns.md    │ │   │
│  │  └──────────┘   └──────────┘   │ by-type/*.md            │ │   │
│  │                                 │ promotable.json         │ │   │
│  │                                 └─────────────────────────┘ │   │
│  └────────────────────────────┬────────────────────────────────┘   │
│                               │ input                               │
│  Layer 1: RAW RECORDING ──────┼──────────────────────────────────   │
│  ┌────────────────────────────┴────────────────────────────────┐   │
│  │  Agent records immediately upon failure detection           │   │
│  │  (real-time)                                                │   │
│  │                                                             │   │
│  │  memory/failures/                                           │   │
│  │  ├── 2026-03-24.md  ← Raw records by date                  │   │
│  │  ├── 2026-03-25.md                                          │   │
│  │  └── 2026-03-26.md                                          │   │
│  └────────────────────────────┬────────────────────────────────┘   │
│                               │ trigger                             │
│  Layer 0: EVENT ──────────────┴──────────────────────────────────   │
│  ┌─────────────────────────────────────────────────────────────┐   │
│  │  Failure event occurs                                       │   │
│  │                                                             │   │
│  │  ┌─────────┐  ┌────────────┐  ┌───────────────┐  ┌──────┐ │   │
│  │  │  ERROR   │  │ CORRECTION │  │RETRY_EXCEEDED │  │MISUND│ │   │
│  │  │ exec err │  │ user fix   │  │ retry limit   │  │misund│ │   │
│  │  └─────────┘  └────────────┘  └───────────────┘  └──────┘ │   │
│  └─────────────────────────────────────────────────────────────┘   │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

5-Layer Pipeline

Layer 0: Event Occurrence

Failure events naturally occur during the agent's normal workflow.

Detection Criteria:

| Event | Detection Method | Example |
|-------|------------------|---------|
| Tool execution failure | exit code ≠ 0, error message | npm install failure, API 4xx/5xx |
| User correction | "No", "redo", "that's not it", etc. | "Not that file, the one under src/" |
| Retry exceeded | Same task attempted 3+ times | Tried the same selector 3 times |
| Misunderstanding | Output doesn't match request | Generated a full translation when asked to summarize |

What the agent should do: Record immediately to Layer 1 when the above events are detected. This behavior should be specified as a rule in AGENTS.md/CLAUDE.md.

Layer 1: Raw Recording (Immediate, Real-time)

Record in memory/failures/YYYY-MM-DD.md immediately upon failure detection.

Key Principles:

  • Record immediately after detection (not batched)
  • Agent records directly (no script needed)
  • One file per day, accumulated chronologically
  • Structured format (see Recording Format below)

Directory Structure:

memory/failures/
├── 2026-03-24.md
├── 2026-03-25.md
└── 2026-03-26.md

Layer 2: Structured Analysis (Batch)

sync-learnings.py parses failures/ and generates structured analysis results in .learnings/.

Execution timing: Cron (daily-reflection) or manual execution

Input: memory/failures/*.md

Output:

.learnings/
├── summary.json            ← Overall statistics (machine-readable)
├── repeated-patterns.md    ← Repeated pattern analysis (human-readable)
├── promotable.json         ← Promotion candidate list (auto-promote input)
└── by-type/
    ├── error.md
    ├── correction.md
    ├── retry_exceeded.md
    └── misunderstand.md

Processing Steps:

  1. Parse all .md files in failures/ in date order
  2. Extract type, cause, and lesson from each entry
  3. Normalize cause text to generate pattern keys (MD5 hash)
  4. Group by identical pattern keys
  5. Patterns with 3+ repetitions → registered as promotion candidates in promotable.json
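
The grouping steps above can be sketched roughly as follows. This is a minimal illustration, not the code in sync-learnings.py: the function names `pattern_key` and `find_promotable`, the 8-character digest truncation (inferred from the `ERROR:a1b2c3d4` example key), and the exact normalization (lowercasing and collapsing whitespace) are all assumptions.

```python
import hashlib
import re
from collections import defaultdict


def pattern_key(failure_type: str, cause: str) -> str:
    """Build a key such as 'ERROR:1a2b3c4d' from the type and normalized cause."""
    # Assumed normalization: lowercase and collapse runs of whitespace.
    normalized = re.sub(r"\s+", " ", cause.strip().lower())
    digest = hashlib.md5(normalized.encode("utf-8")).hexdigest()
    return f"{failure_type}:{digest[:8]}"


def find_promotable(entries, min_count=3):
    """Group parsed entries by pattern key and keep groups seen min_count+ times."""
    groups = defaultdict(list)
    for entry in entries:
        groups[pattern_key(entry["type"], entry["cause"])].append(entry)
    return {key: group for key, group in groups.items() if len(group) >= min_count}


entries = [
    {"type": "ERROR", "cause": "Guessed selector without checking DOM"},
    {"type": "ERROR", "cause": "guessed selector  without checking DOM "},
    {"type": "ERROR", "cause": "Guessed selector without checking DOM"},
    {"type": "CORRECTION", "cause": "Guessed path without running ls"},
]
promotable = find_promotable(entries)
print({key: len(group) for key, group in promotable.items()})
```

Because the key is a hash of the normalized text, causes that differ only in case or spacing fall into one group, while a reworded cause produces a different key and is counted separately.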

Layer 3: Pattern Detection + Rule Promotion (Automatic)

auto-promote.py reads promotable.json and automatically inserts rules into the target file.

Promotion condition: Same pattern repeated 3+ times (configurable)

Target files (configurable):

  • AGENTS.md — OpenClaw, general-purpose
  • CLAUDE.md — Claude Code
  • .cursorrules — Cursor IDE
  • Custom file — --target option

Deduplication: Previously promoted pattern keys are recorded in .learnings/promoted.json to prevent duplicate insertion

Layer 4: Execution Guardrail (Pre-task Lookup)

Query similar failures before starting a new task to provide advance warnings.

Implementation method (agent rule):

## Pre-task Check
When receiving a new task → run python3 scripts/failure-matcher.py "<task keyword>"
→ If similar failure records exist, reference lessons before starting the task

failure-matcher.py behavior:

  1. Load .learnings/summary.json
  2. Compare task keywords against past failure titles/causes (simple keyword matching)
  3. Output failure records with high similarity
  4. Agent reads the output and references the lessons

Note: failure-matcher.py is not bundled with this skill, so you need to supply your own implementation. Running grep over the sync-learnings.py output files is a sufficient alternative.
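
Since failure-matcher.py is not shipped, one possible shape for it is sketched below. It is a hypothetical illustration of simple keyword matching only: the names `tokenize` and `match_failures` are invented here, and the records are assumed to carry the `title`, `cause`, and `lesson` fields shown in the promotable.json example, because the schema of summary.json is not documented in this README.

```python
import re


def tokenize(text: str) -> set:
    """Lowercase a string and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def match_failures(task: str, records: list, min_overlap: int = 1) -> list:
    """Return records whose title or cause shares words with the task, best first."""
    task_words = tokenize(task)
    scored = []
    for record in records:
        haystack = tokenize(record.get("title", "") + " " + record.get("cause", ""))
        overlap = len(task_words & haystack)
        if overlap >= min_overlap:
            scored.append((overlap, record))
    # Sort on the overlap count only, so the dicts are never compared.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in scored]


records = [
    {"title": "Playwright selector failure",
     "cause": "Site redesign changed DOM structure. Selector changed",
     "lesson": "Must verify selector existence with DOM dump before use"},
    {"title": "File path error",
     "cause": "Guessed path as ./lib/src/ instead of ./src/",
     "lesson": "No path guessing. Verify with ls/find before use"},
]
for hit in match_failures("click login selector", records):
    print(f"Past failure: {hit['title']} | Lesson: {hit['lesson']}")
```

Plain word overlap also matches on common words such as "of" or "as", so a real matcher would probably want a stop-word list or a higher `min_overlap`.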


Failure Type Classification

Four failure types are defined. All failures are classified as one of these.

ERROR — Execution Error

Tool/command/API execution failed.

| Field | Description |
|-------|-------------|
| Code | ERROR |
| Detection | exit code ≠ 0, error message, exception thrown |
| Example | npm install failure, API 404, file not found, permission error |
| Frequency | Most common |

CORRECTION — User Correction

User corrected the agent's output.

| Field | Description |
|-------|-------------|
| Code | CORRECTION |
| Detection | "No", "redo", "that's not it", "do it like this", correction instructions |
| Example | "Not that file, the one under src/", "The format is wrong" |
| Importance | Highest value; reflects the user's implicit preferences |

RETRY_EXCEEDED — Retry Limit Exceeded

Same task attempted 3+ times.

| Field | Description |
|-------|-------------|
| Code | RETRY_EXCEEDED |
| Detection | Same/similar command executed 3+ times |
| Example | Tried same CSS selector 3 times, called same API endpoint 3 times |
| Meaning | Pattern of blindly retrying without addressing the root cause |

MISUNDERSTAND — Instruction Misunderstanding

Generated output that doesn't match user's intended instruction.

| Field | Description |
|-------|-------------|
| Code | MISUNDERSTAND |
| Detection | Mismatch between output and request, "that's not what I meant" |
| Example | Generated "translation" when asked for "summary", confused target file |
| Root Cause | Ambiguity in instructions or lack of context |

Recording Format

Raw Record (memory/failures/YYYY-MM-DD.md)

## HH:MM - [TYPE_CODE] Brief Title

- **Type:** ERROR | CORRECTION | RETRY_EXCEEDED | MISUNDERSTAND
- **Situation:** What was being attempted
- **Cause:** Why it failed
- **Lesson:** How to handle it next time
- **Cumulative:** Nth occurrence (across all files of the same type)

Example: Actual Records

# 2026-03-25 Failure Records

## 09:15 - [ERROR] Playwright selector failure

- **Type:** ERROR
- **Situation:** During KNOU site login automation, attempted click with #loginBtn selector
- **Cause:** Site redesign changed DOM structure. Selector changed to #login-button
- **Lesson:** Must verify selector existence with DOM dump before use. No guessing selectors
- **Cumulative:** 2nd occurrence

## 11:30 - [CORRECTION] File path error

- **Type:** CORRECTION
- **Situation:** User instructed "modify the file under src/"
- **Cause:** Guessed path as ./lib/src/ instead of ./src/
- **Lesson:** No path guessing. Verify with ls/find before use
- **Cumulative:** 3rd occurrence

## 14:00 - [RETRY_EXCEEDED] Repeated API authentication failure

- **Type:** RETRY_EXCEEDED
- **Situation:** Repeated 401 error on GitHub API calls
- **Cause:** Retried 5 times with the same token without checking expiration
- **Lesson:** On auth error, immediately verify token validity. No blind retrying
- **Cumulative:** 1st occurrence

## 16:45 - [MISUNDERSTAND] Translation instead of summary

- **Type:** MISUNDERSTAND
- **Situation:** User instructed "summarize this document"
- **Cause:** Misinterpreted as translation because document was in English
- **Lesson:** "summarize" ≠ "translate". Distinguish instruction verbs precisely
- **Cumulative:** 1st occurrence
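
To show how this format lends itself to machine parsing, here is a minimal sketch of a parser for one day's file. It is not taken from sync-learnings.py: the regular expressions, the `parse_failures` name, and the lowercase field keys are assumptions based only on the record layout shown above.

```python
import re

# '## HH:MM - [TYPE_CODE] Brief Title'
HEADER = re.compile(r"^## (\d{2}:\d{2}) - \[([A-Z_]+)\] (.+)$")
# '- **Field:** value'
FIELD = re.compile(r"^- \*\*(\w+):\*\* (.*)$")


def parse_failures(markdown: str) -> list:
    """Parse one day's failure file into a list of dicts, one per '## ' entry."""
    entries = []
    current = None
    for line in markdown.splitlines():
        header = HEADER.match(line.strip())
        if header:
            current = {"time": header.group(1), "code": header.group(2),
                       "title": header.group(3)}
            entries.append(current)
            continue
        field = FIELD.match(line.strip())
        if field and current is not None:
            current[field.group(1).lower()] = field.group(2)
    return entries


sample = """# 2026-03-25 Failure Records

## 09:15 - [ERROR] Playwright selector failure

- **Type:** ERROR
- **Cause:** Site redesign changed DOM structure
- **Lesson:** Must verify selector existence with DOM dump before use
- **Cumulative:** 2nd occurrence
"""
for entry in parse_failures(sample):
    print(entry["time"], entry["code"], entry["title"], "->", entry["lesson"])
```

Lines that match neither pattern, such as the date heading, are skipped, so free-form notes between entries would not break parsing in this sketch.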

Structured Analysis Output (.learnings/promotable.json)

[
  {
    "pattern_key": "ERROR:a1b2c3d4",
    "type": "ERROR",
    "count": 3,
    "title": "Playwright selector failure",
    "cause": "Site redesign changed DOM structure. Selector changed",
    "lesson": "Must verify selector existence with DOM dump before use",
    "first_seen": "2026-03-23",
    "last_seen": "2026-03-25",
    "suggested_rule": "Must verify selector existence with DOM dump before use"
  }
]

Promotion Conditions and Logic

Promotion Conditions

| Condition | Value | Configurable |
|-----------|-------|--------------|
| Minimum repeat count | 3 (default) | --min-count or AFL_MIN_COUNT env var |
| Same pattern determination | MD5 hash of type + cause text | Automatic |
| Deduplication | Recorded in .learnings/promoted.json | Automatic |
| Target file | AGENTS.md (default) | --target or AFL_TARGET_FILE env var |

Promotion Process

1. Run sync-learnings.py
   └→ Parse failures/*.md
   └→ Pattern grouping
   └→ 3+ patterns → promotable.json

2. Run auto-promote.py
   └→ Load promotable.json
   └→ Compare with promoted.json (exclude already promoted)
   └→ Format new rules
   └→ Insert into target file
   └→ Update promoted.json
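
The deduplication part of step 2 can be sketched as below. This is an illustration under stated assumptions rather than the logic of auto-promote.py: the helper names are hypothetical, and promoted.json is assumed to be a JSON array of pattern keys, which this README does not confirm.

```python
import json
import tempfile
from pathlib import Path


def load_promoted_keys(learnings_dir: Path) -> set:
    """Read already-promoted pattern keys; an absent file means none yet."""
    path = learnings_dir / "promoted.json"
    if not path.exists():
        return set()
    # Assumption: promoted.json stores a JSON array of pattern keys.
    return set(json.loads(path.read_text(encoding="utf-8")))


def select_new_rules(learnings_dir: Path, min_count: int = 3) -> list:
    """Return candidates that meet the threshold and were not promoted before."""
    candidates = json.loads(
        (learnings_dir / "promotable.json").read_text(encoding="utf-8"))
    promoted = load_promoted_keys(learnings_dir)
    return [c for c in candidates
            if c["count"] >= min_count and c["pattern_key"] not in promoted]


def mark_promoted(learnings_dir: Path, rules: list) -> None:
    """Add the promoted pattern keys to promoted.json."""
    keys = load_promoted_keys(learnings_dir) | {r["pattern_key"] for r in rules}
    (learnings_dir / "promoted.json").write_text(
        json.dumps(sorted(keys), indent=2), encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    learnings = Path(tmp)
    candidate = {"pattern_key": "ERROR:a1b2c3d4", "type": "ERROR", "count": 3,
                 "suggested_rule": "Must verify selector existence with DOM dump before use"}
    (learnings / "promotable.json").write_text(json.dumps([candidate]), encoding="utf-8")
    first_run = select_new_rules(learnings)
    mark_promoted(learnings, first_run)
    second_run = select_new_rules(learnings)

print(len(first_run), "rule(s) on the first run,", len(second_run), "on the second")
```

Running the selection twice shows why repeated cron runs should not insert the same rule again: the second pass finds the key already recorded and returns nothing.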

Promotion Format (by Target)

AGENTS.md (agents-md):

| 2026-03-25 | Must verify selector existence with DOM dump before use | [ERROR] 3x repeat — Playwright selector failure |

CLAUDE.md (claude-md):

- **ERROR**: Must verify selector existence with DOM dump before use (3x repeat, cause: DOM structure change)

.cursorrules (cursorrules):

- Must verify selector existence with DOM dump before use

Generic (plain):

### [ERROR] Playwright selector failure
- **Rule:** Must verify selector existence with DOM dump before use
- **Count:** 3x
- **Promoted:** 2026-03-25
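
The four output shapes above could be produced by a single formatter along these lines. This is a sketch only: `format_rule` is a hypothetical name, the candidate fields follow the promotable.json example, and the claude-md branch inserts the full cause text because the README does not say how the script shortens it to "DOM structure change".

```python
def format_rule(candidate: dict, fmt: str, promoted_on: str) -> str:
    """Render one promotion candidate in the requested target format."""
    rule = candidate["suggested_rule"]
    kind = candidate["type"]
    count = candidate["count"]
    title = candidate["title"]
    if fmt == "agents-md":
        # One table row: date, rule, source of the rule.
        return f"| {promoted_on} | {rule} | [{kind}] {count}x repeat — {title} |"
    if fmt == "claude-md":
        return f"- **{kind}**: {rule} ({count}x repeat, cause: {candidate['cause']})"
    if fmt == "cursorrules":
        return f"- {rule}"
    if fmt == "plain":
        return (f"### [{kind}] {title}\n"
                f"- **Rule:** {rule}\n"
                f"- **Count:** {count}x\n"
                f"- **Promoted:** {promoted_on}")
    raise ValueError(f"unknown format: {fmt}")


candidate = {
    "type": "ERROR",
    "count": 3,
    "title": "Playwright selector failure",
    "cause": "Site redesign changed DOM structure. Selector changed",
    "suggested_rule": "Must verify selector existence with DOM dump before use",
}
for fmt in ("agents-md", "claude-md", "cursorrules", "plain"):
    print(format_rule(candidate, fmt, "2026-03-25"))
    print()
```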

Installation

Zero-Config Installation (30 Seconds)

# 1. Copy to skills directory
cp -r agent-failure-loop/ ~/.agents/skills/agent-failure-loop/

# 2. Create failures directory
mkdir -p memory/failures

# 3. Done. Scripts use only Python 3.8+ standard library.

Add Agent Rules

Add the following rules to AGENTS.md (or CLAUDE.md, .cursorrules):

## 🚨 Failure Detection + Auto-Recording Protocol

### Failure Type Definitions
| Type | Code | Detection Criteria |
|------|------|--------------------|
| Execution error | ERROR | Tool execution failure, API error, command error |
| User correction | CORRECTION | User corrects with "no", "redo", etc. |
| Retry exceeded | RETRY_EXCEEDED | Same task retried 3+ times |
| Misunderstanding | MISUNDERSTAND | Output doesn't match instruction intent |

### Immediate Action on Detection (Mandatory — Do Not Skip)
1. Record in memory/failures/YYYY-MM-DD.md with the following format:
   ## HH:MM - [TYPE_CODE] Brief Title
   - **Type:** ERROR | CORRECTION | RETRY_EXCEEDED | MISUNDERSTAND
   - **Situation:** What was being attempted
   - **Cause:** Why it failed
   - **Lesson:** How to handle it next time
   - **Cumulative:** Nth occurrence
2. Check cumulative count of same type (search all failures/ files)
3. 3+ repeats → Immediately add rule to AGENTS.md self-improvement rules table

### Pre-task Check
When receiving a new task → Query past similar failures → Reference lessons before starting

Environment Variables (Optional)

| Variable | Default | Description |
|----------|---------|-------------|
| AFL_FAILURES_DIR | memory/failures | Failure records directory |
| AFL_LEARNINGS_DIR | .learnings | Analysis results directory |
| AFL_TARGET_FILE | AGENTS.md | Rule promotion target file |
| AFL_FORMAT | agents-md | Promotion format |
| AFL_MIN_COUNT | 3 | Minimum repeat count |
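
One plausible way these variables combine with the CLI options is sketched below. The precedence (CLI argument, then environment variable, then default) is an assumption, as is the `resolve_config` helper; the sketch also merges the options of both scripts into one parser for brevity, which the real scripts may not do.

```python
import argparse

SETTINGS = [
    # (argparse dest, environment variable, default)
    ("failures_dir", "AFL_FAILURES_DIR", "memory/failures"),
    ("learnings_dir", "AFL_LEARNINGS_DIR", ".learnings"),
    ("target", "AFL_TARGET_FILE", "AGENTS.md"),
    ("format", "AFL_FORMAT", "agents-md"),
    ("min_count", "AFL_MIN_COUNT", "3"),
]


def resolve_config(argv: list, env: dict) -> dict:
    """Resolve each setting as: CLI argument, else env var, else default."""
    parser = argparse.ArgumentParser()
    for dest, _, _ in SETTINGS:
        parser.add_argument("--" + dest.replace("_", "-"), dest=dest)
    args = parser.parse_args(argv)
    config = {}
    for dest, env_name, default in SETTINGS:
        cli_value = getattr(args, dest)
        config[dest] = cli_value if cli_value is not None else env.get(env_name, default)
    config["min_count"] = int(config["min_count"])
    return config


# In a real script these would be sys.argv[1:] and os.environ.
config = resolve_config(
    ["--target", "CLAUDE.md"],
    {"AFL_FORMAT": "claude-md", "AFL_TARGET_FILE": "rules.md"},
)
print(config)
```

In this example the `--target` argument overrides `AFL_TARGET_FILE`, the format comes from the environment, and everything else falls back to the documented defaults.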

Quick Start (5 Minutes)

Step 1: Install (30 Seconds)

# Copy skill + create directory
cp -r agent-failure-loop/ ~/.agents/skills/agent-failure-loop/
mkdir -p memory/failures

Step 2: Generate Test Data (1 Minute)

cat > memory/failures/2026-03-24.md << 'EOF'
## 10:00 - [ERROR] Guessed selector failure

- **Type:** ERROR
- **Situation:** Attempted click with #loginBtn during web page automation
- **Cause:** Guessed selector without checking DOM
- **Lesson:** Must verify with DOM dump before using any selector
- **Cumulative:** 1st occurrence
EOF

cat > memory/failures/2026-03-25.md << 'EOF'
## 09:00 - [ERROR] Guessed selector failure

- **Type:** ERROR
- **Situation:** Attempted click with .submit-btn on a different page
- **Cause:** Guessed selector without checking DOM
- **Lesson:** Must verify with DOM dump before using any selector
- **Cumulative:** 2nd occurrence

## 14:00 - [CORRECTION] Wrong file path

- **Type:** CORRECTION
- **Situation:** Instructed to modify file under src/
- **Cause:** Guessed path without running ls
- **Lesson:** No path guessing, verify with ls/find
- **Cumulative:** 1st occurrence
EOF

cat > memory/failures/2026-03-26.md << 'EOF'
## 11:00 - [ERROR] Guessed selector failure

- **Type:** ERROR
- **Situation:** Attempted click with #btn-submit on yet another page
- **Cause:** Guessed selector without checking DOM
- **Lesson:** Must verify with DOM dump before using any selector
- **Cumulative:** 3rd occurrence
EOF

Step 3: Run Analysis (30 Seconds)

python3 scripts/sync-learnings.py --failures-dir memory/failures --learnings-dir .learnings

Expected Output:

[sync-learnings] Scanning: memory/failures
[sync-learnings] Found 4 failure entries
[OK] .learnings/summary.json
[OK] .learnings/repeated-patterns.md
[OK] .learnings/by-type/error.md
[OK] .learnings/by-type/correction.md
[OK] .learnings/promotable.json (1 candidates)

[sync-learnings] Done. 1 repeated patterns found.
[sync-learnings] Run auto-promote.py to promote rules automatically.

Step 4: Auto-Promote (30 Seconds)

# First, preview with dry-run
python3 scripts/auto-promote.py --learnings-dir .learnings --target AGENTS.md --dry-run

# Actual promotion
python3 scripts/auto-promote.py --learnings-dir .learnings --target AGENTS.md

Expected Output:

[auto-promote] 1 new rules to promote
[OK] Updated: AGENTS.md
[auto-promote] Promoted 1 rules to AGENTS.md

--- Promoted Rules ---
  [ERROR] Guessed selector failure (3x) → Must verify with DOM dump before using any selector

Step 5: Verify (30 Seconds)

# Check if rule was added to AGENTS.md
grep "selector" AGENTS.md

# Check promotion records
cat .learnings/promoted.json

Done in 5 minutes! Now when the agent makes the same mistake 3 times, a rule is automatically created.


Cron Integration

daily-reflection Cron Example

Runs automatically at 23:00 daily to analyze the day's failures and promote rules.

OpenClaw Cron Configuration:

name: daily-reflection
schedule: "0 23 * * *"
message: |
  Time for daily reflection.
  1. Run python3 scripts/sync-learnings.py
  2. Run python3 scripts/auto-promote.py
  3. Summarize today's failure patterns
  4. Report any newly promoted rules

Standard crontab Configuration:

# Run daily at 23:00
0 23 * * * cd /path/to/workspace && (python3 scripts/sync-learnings.py && python3 scripts/auto-promote.py) >> /tmp/failure-loop.log 2>&1

weekly-skill-review Integration

Register repeated tasks as skill candidates during weekly review:

name: weekly-skill-review
schedule: "0 10 * * 0"  # Every Sunday at 10:00
message: |
  Weekly skill review:
  1. Check .learnings/repeated-patterns.md
  2. Identify repeated patterns that could be extracted as skills
  3. Register candidates in memory/skill-review/candidates.md

Real-time + Batch Hybrid

Real-time (agent directly):

  • Layer 0 → Layer 1: Record in failures/ immediately upon failure detection
  • When same type reaches 3 occurrences, immediately add rule to AGENTS.md

Batch (cron):

  • Layer 1 → Layer 2: Structured analysis via sync-learnings.py
  • Layer 2 → Layer 3: Handle missed promotions via auto-promote.py
  • Catches patterns missed by the agent's real-time detection as a second safety net

Before/After Demo

Before: Without Rules

Day 1:
  User: "Click the login button on this page"
  Agent: Tries clicking #loginBtn → fails (selector doesn't exist)
  Agent: Tries .login-btn → fails
  Agent: Tries button[type=submit] → succeeds
  → 30 minutes wasted

Day 2:
  User: "Click the search button on that page"
  Agent: Tries clicking #searchBtn → fails
  Agent: Tries .search-button → fails
  → Another 30 minutes wasted (same pattern!)

Day 3:
  User: "Click the signup button"
  Agent: Tries clicking #signupBtn → fails
  → Repeating again and again...

Problem: Same mistake every time. No learning. User frustration ↑

After: With agent-failure-loop Applied

Day 1:
  Agent: Clicks #loginBtn → fails
  → [Auto-recorded] ERROR recorded in failures/2026-03-24.md
  Agent: Checks DOM dump, succeeds with correct selector

Day 2:
  Agent: Clicks #searchBtn → fails
  → [Auto-recorded] Cumulative 2nd occurrence
  Agent: Checks DOM dump, succeeds

Day 3:
  Agent: Clicks #signupBtn → fails
  → [Auto-recorded] Cumulative 3rd occurrence!
  → [Auto-promoted] Rule added to AGENTS.md:
     "Must verify selector existence with DOM dump before use. No guessing selectors."

Day 4:
  User: "Click the payment button"
  Agent: (References AGENTS.md rule)
  → Runs DOM dump first
  → Confirms correct selector
  → Succeeds on first try! ✅

Result: No recurrence of the same mistake from Day 4 onward. Auto-learning complete.

Actual Auto-Promotion Simulation

# 1. Generate 3 days of failure data (Step 2 from Quick Start above)

# 2. Run sync-learnings.py
$ python3 scripts/sync-learnings.py
[sync-learnings] Found 4 failure entries
[OK] .learnings/promotable.json (1 candidates)

# 3. auto-promote.py --dry-run
$ python3 scripts/auto-promote.py --dry-run
[auto-promote] 1 new rules to promote
[DRY-RUN] Would update: AGENTS.md
[DRY-RUN] Inserting at position 2847:
| 2026-03-26 | Must verify with DOM dump before using any selector | [ERROR] 3x repeat — Guessed selector failure |

# 4. Actual promotion
$ python3 scripts/auto-promote.py
[auto-promote] Promoted 1 rules to AGENTS.md
--- Promoted Rules ---
  [ERROR] Guessed selector failure (3x) → Must verify with DOM dump before using any selector

# 5. Verify AGENTS.md
$ grep -A1 "selector" AGENTS.md
| 2026-03-26 | Must verify with DOM dump before using any selector | [ERROR] 3x repeat — Guessed selector failure |

Comparison with Competing Skills

Feature agent-failure-loop self-improving-agent actual-self-improvement
Auto failure detection ✅ 4-type classification ❌ Conversation patterns only ⚠️ Manual trigger
Immediate recording ✅ Layer 1 real-time ❌ After session ends ❌ Manual
Structured analysis ✅ sync-learnings.py ⚠️ Python available
Auto promotion ✅ 3x repeat → automatic ❌ Manual rule addition
Cron integration ✅ daily-reflection
Guardrails ✅ Pre-task lookup
Multi-platform ✅ OpenClaw/Claude/Codex/Cursor ⚠️ ChatGPT-centric ⚠️ Python only
Target file config ✅ AGENTS/CLAUDE/cursorrules/custom ❌ Fixed
Deduplication ✅ promoted.json
Zero-config ✅ Python 3.8+ stdlib only ⚠️ npm required ⚠️ pip required
Enforcement ✅ Rule promotion = agent behavior change ❌ Suggestions only
Skill extraction integration ✅ Repeated patterns → skill candidates

Why agent-failure-loop?

  1. End-to-end: Covers the entire pipeline from detection to promotion
  2. Enforcement: Promoted rules go into AGENTS.md/CLAUDE.md, which the agent must read
  3. Automation: Combined with cron, the self-improvement loop runs without human intervention
  4. Cross-platform: Not tied to any specific platform
  5. Transparency: All failures and promotion processes are preserved as markdown files for auditing

Cross-Platform Configuration

Platform-Specific Configuration Examples

OpenClaw:

export AFL_TARGET_FILE="AGENTS.md"
export AFL_FORMAT="agents-md"

Claude Code:

export AFL_TARGET_FILE="CLAUDE.md"
export AFL_FORMAT="claude-md"

Cursor IDE:

export AFL_TARGET_FILE=".cursorrules"
export AFL_FORMAT="cursorrules"

Codex / Others:

export AFL_TARGET_FILE="rules.md"
export AFL_FORMAT="plain"

Custom Configuration File

Place .failure-loop.json at the project root to manage configuration via file instead of environment variables:

{
  "failures_dir": "memory/failures",
  "learnings_dir": ".learnings",
  "target_file": "AGENTS.md",
  "format": "agents-md",
  "min_count": 3
}

Note: The current version of scripts supports environment variables and CLI arguments. .failure-loop.json support is planned for a future version.

AGENTS.md Format Independence

This skill does not depend on a specific AGENTS.md format:

  • --format agents-md: Adds rows to the "self-improvement rules" table in AGENTS.md. If the table doesn't exist, appends to end of file.
  • --format plain: Can append to any markdown file
  • --target: Can specify any file
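
The agents-md behavior described above (add a row to the table, or append when no table exists) might look roughly like this. It is a hypothetical sketch: how auto-promote.py actually locates the table is not documented here, so the heading match on "self-improvement rules" and the `insert_rule_row` name are assumptions.

```python
def insert_rule_row(document: str, row: str,
                    heading: str = "self-improvement rules") -> str:
    """Insert row after the last table line under the heading, else append it."""
    lines = document.splitlines()
    in_section = False
    last_table_line = None
    for index, line in enumerate(lines):
        stripped = line.lstrip()
        if stripped.startswith("#"):
            # A new heading starts or ends the section of interest.
            in_section = heading in stripped.lower()
        elif in_section and stripped.startswith("|"):
            last_table_line = index
    if last_table_line is None:
        lines.append(row)
    else:
        lines.insert(last_table_line + 1, row)
    return "\n".join(lines) + "\n"


agents_md = """# AGENTS

## Self-improvement rules
| Date | Rule | Source |
|------|------|--------|
| 2026-03-01 | Existing rule | [ERROR] 3x repeat |

## Other
Unrelated text.
"""
new_row = "| 2026-03-25 | Verify selectors before use | [ERROR] 3x repeat |"
print(insert_rule_row(agents_md, new_row))
```

This sketch treats any line starting with `#` as a heading, including comment lines inside code blocks, so a real implementation would need a stricter match.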

Script Reference

sync-learnings.py

Parses raw failure records from the failures/ directory and generates structured analysis results in .learnings/.

Usage:

python3 scripts/sync-learnings.py [options]

Options:

| Option | Default | Description |
|--------|---------|-------------|
| --failures-dir | memory/failures | Failure records directory |
| --learnings-dir | .learnings | Analysis results output directory |
| --dry-run | - | Preview without writing files |
| --json | - | Output summary in JSON format |

Output Files:

  • summary.json — Overall statistics
  • repeated-patterns.md — Repeated pattern analysis
  • promotable.json — Promotion candidate list
  • by-type/*.md — Details by type

Dependencies: Python 3.8+ standard library only (hashlib, json, re, pathlib, etc.)

auto-promote.py

Reads promotion candidates from .learnings/promotable.json and automatically inserts rules into the target file.

Usage:

python3 scripts/auto-promote.py [options]

Options:

| Option | Default | Description |
|--------|---------|-------------|
| --learnings-dir | .learnings | Analysis results directory |
| --target | AGENTS.md | Promotion target file |
| --format | agents-md | Output format (agents-md/claude-md/cursorrules/plain) |
| --min-count | 3 | Minimum repeat count |
| --dry-run | - | Preview without modifying files |
| --force | - | Re-promote already promoted patterns |

Dependencies: Python 3.8+ standard library only


FAQ

Q: What if the agent doesn't record failures?

You need to add failure recording rules to AGENTS.md/CLAUDE.md. See "Add Agent Rules" in the Installation section. With the rules in place, the agent will automatically record upon failure detection. If the agent ignores the rules... that itself will be recorded as a CORRECTION.

Q: Won't the same rule be promoted twice?

.learnings/promoted.json records already-promoted pattern keys to prevent duplication. Forced re-promotion is possible with the --force option.

Q: If cause text is slightly different, will it be recognized as a different pattern?

The current version distinguishes patterns using MD5 hash of the cause text. Whitespace and case are normalized, but semantically identical causes with different wording will be recognized as separate patterns. It's recommended to specify in the rules that the agent should record causes with consistent wording.

Future improvement: Semantic similarity (embedding comparison) support planned.

Q: Can it be used in environments without Python?

Even without the scripts, the agent can perform Layer 1 (recording) and Layer 3 (promotion) directly. The scripts act as a second safety net for batch analysis (Layer 2) and auto-promotion. The basic loop works with the agent's real-time detection alone.

Q: What if failure records accumulate too much?

Since sync-learnings.py generates summaries after analysis, raw records can be archived:

# Archive records older than 30 days
mkdir -p memory/failures/archive
find memory/failures/ -maxdepth 1 -name "*.md" -mtime +30 -exec mv {} memory/failures/archive/ \;

Q: Can it be shared across a team?

Committing the .learnings/ directory to git allows the entire team to share learning results. promoted.json prevents duplicate promotions, so it's safe for multiple people to use simultaneously.

Q: How do I use it with Claude Code without OpenClaw?

  1. Add failure recording rules to CLAUDE.md (see Installation)
  2. Use --target CLAUDE.md --format claude-md when running scripts
  3. Run the scripts manually or from the OS crontab instead of OpenClaw cron

Q: Is it compatible with existing AGENTS.md self-improvement rules?

Fully compatible. auto-promote.py finds the "self-improvement rules" table in AGENTS.md and adds rows. If the table doesn't exist, it appends to the end of the file.

Q: Can I write failure records directly to .learnings/?

No. failures/ is raw data, .learnings/ is analysis results. The agent records only in failures/, and .learnings/ is auto-generated by sync-learnings.py. This separation ensures data integrity.

Q: What if a promoted rule is wrong?

Manually delete the rule from AGENTS.md. The pattern key remains in promoted.json, so the same rule won't be promoted again. To re-promote, use the --force option or delete the key from promoted.json.


Full Directory Structure

workspace/
├── memory/
│   └── failures/              ← Layer 1: Raw records (agent records directly)
│       ├── 2026-03-24.md
│       ├── 2026-03-25.md
│       └── archive/           ← Archive for old records
│
├── .learnings/                ← Layer 2: Structured analysis (sync-learnings.py output)
│   ├── summary.json
│   ├── repeated-patterns.md
│   ├── promotable.json        ← Layer 3 input
│   ├── promoted.json          ← Promotion completion records
│   └── by-type/
│       ├── error.md
│       ├── correction.md
│       ├── retry_exceeded.md
│       └── misunderstand.md
│
├── AGENTS.md                  ← Layer 3: Promotion target (auto-promote.py inserts rules)
│   └── Self-improvement rules table
│
└── scripts/                   ← Or scripts/ within the skill directory
    ├── sync-learnings.py
    └── auto-promote.py

Production Usage Evidence

This skill has been validated in a real production environment. A significant number of the 20+ rules in AGENTS.md's "self-improvement rules" table were auto-generated through this pipeline:

  • Must check environment before writing scripts — Auto-promoted from [ERROR] 3 consecutive failures
  • On re-working same site/tool, must read memory_search + previous success records — Auto-promoted from [RETRY_EXCEEDED] 6 attempts
  • No guessing selectors/paths, must verify before use — Auto-promoted from [ERROR] multiple repeats

After these rules were promoted, the recurrence rate of the same failure types decreased significantly.


License

MIT License. Free to use, modify, and distribute.


agent-failure-loop v1.0.1 — Building agents that learn from failure.

Usage Guidance
This skill appears internally consistent: it reads local failure logs, analyzes them, and can write promoted rules back into local rule files. Before running or installing:

  1. Back up AGENTS.md/CLAUDE.md (or work in a branch) so automatic inserts can be reviewed.
  2. Run the scripts with --dry-run first to inspect what would be changed.
  3. Verify that the failures/ or memory/failures directory contains only the entries you expect (the tool groups by a hash of the 'cause').
  4. If you intend to run automatically (cron), ensure you trust the promotion logic and review promotable.json periodically.

No network exfiltration or credential access was found in the provided files.
Capability Analysis
Type: OpenClaw Skill Name: agent-failure-loop Version: 1.0.1 The 'agent-failure-loop' skill is a self-improvement framework designed to help AI agents learn from repeated mistakes by logging failures and automatically promoting them to instruction files like AGENTS.md or CLAUDE.md. The provided Python scripts (sync-learnings.py and auto-promote.py) use standard libraries to parse markdown logs and update rule files based on a repetition threshold (defaulting to 3 occurrences). While the automated modification of an agent's core instructions is a high-privilege operation, the implementation is transparently documented, includes safety features like dry-run modes and deduplication via promoted.json, and lacks any indicators of malicious intent such as data exfiltration or unauthorized network access.
Capability Assessment
Purpose & Capability
The name/description (auto-detect, classify, track, promote failures) aligns with the provided artifacts: two Python scripts and a SKILL.md describing reading failure records, analyzing patterns, and promoting rules into AGENTS.md/CLAUDE.md/.cursorrules. There are no unexpected external dependencies or credentials required.
Instruction Scope
SKILL.md and the scripts instruct the agent to read local failure records (default: memory/failures or failures directory), write structured analysis to .learnings/, and insert rules into target files (default AGENTS.md). This is consistent with the purpose, but it does grant the skill the ability to modify repository/local documentation files (AGENTS.md, CLAUDE.md, etc.). The scripts use only local file I/O; no network endpoints are present.
Install Mechanism
There is no install spec and the Python scripts use only the standard library. No external downloads, package installs, or extracts are performed by the skill bundle itself.
Credentials
The skill declares no required environment variables or credentials. The scripts optionally read environment overrides (AFL_LEARNINGS_DIR, AFL_TARGET_FILE, AFL_FORMAT, AFL_MIN_COUNT) which are reasonable convenience hooks and do not grant extra privileges.
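As a rough illustration of how such overrides are typically read, the sketch below resolves the four variables with fallbacks. The helper `load_config` is hypothetical. The `AGENTS.md` target and the threshold of 3 come from the text on this page; the `.learnings` directory default and the `"markdown"` format default are assumptions for the example and may differ from the actual scripts.

```python
import os


def load_config(env=None):
    """Resolve settings from AFL_* environment variables with fallbacks."""
    env = os.environ if env is None else env
    return {
        # Assumed default: the page says analysis is written to .learnings/
        "learnings_dir": env.get("AFL_LEARNINGS_DIR", ".learnings"),
        # Default target file per the Instruction Scope note
        "target_file": env.get("AFL_TARGET_FILE", "AGENTS.md"),
        # Assumed default; the real default format is not stated here
        "format": env.get("AFL_FORMAT", "markdown"),
        # Default repetition threshold per the Capability Analysis
        "min_count": int(env.get("AFL_MIN_COUNT", "3")),
    }


print(load_config({"AFL_TARGET_FILE": "CLAUDE.md", "AFL_MIN_COUNT": "5"}))
```

Accepting the environment as a parameter keeps the function easy to test without mutating the real process environment.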
Persistence & Privilege
The skill is not always-included and does not request elevated agent privileges, but it does modify local files (creates/updates .learnings/*, promotable.json, and can insert lines into AGENTS.md/CLAUDE.md or create new files in plain mode). This is expected for its purpose but you should be aware it can autonomously change local rule files when run (there is a --dry-run option).
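To make the dry-run behavior concrete, here is a minimal sketch of a rule insert that reports what it would do instead of writing. The function `insert_rule`, the `- ` bullet format, and the append-at-end placement are assumptions for illustration; the skill's own scripts may place and format rules differently.

```python
from pathlib import Path


def insert_rule(target: Path, rule: str, dry_run: bool = True) -> bool:
    """Append a rule bullet unless it is already present.

    Returns True if a change was made (or would be made in dry-run mode).
    """
    existing = target.read_text(encoding="utf-8") if target.exists() else ""
    line = f"- {rule}"  # assumed bullet format
    if line in existing.splitlines():
        return False  # already present: nothing to do
    if dry_run:
        print(f"[dry-run] would append to {target}: {line}")
        return True
    # Make sure the new bullet starts on its own line.
    sep = "" if existing == "" or existing.endswith("\n") else "\n"
    target.write_text(existing + sep + line + "\n", encoding="utf-8")
    return True
```

Defaulting `dry_run` to True means a caller has to opt in explicitly before any rule file is modified, which fits the review-first workflow recommended in the usage guidance.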
How to Use
  1. Make sure OpenClaw is installed (local or Docker)
  2. Run the install command in chat: /install agent-failure-loop
  3. After installation, invoke the skill by name or use /agent-failure-loop
  4. Provide required inputs per the skill's parameter spec and get structured output
Version History
v1.0.1
Translated all content to English
v1.0.0
Initial release of agent-failure-loop: end-to-end self-improving loop to detect, accumulate, and auto-promote repeated agent failures into guardrail rules.
  • Automatically detects, classifies, and accumulates agent failures across sessions.
  • Promotes a guardrail rule after the same mistake happens three times, auto-inserting into AGENTS.md/CLAUDE.md.
  • Provides a batch analysis pipeline (layered architecture: raw recording, structured analysis, pattern detection, rule promotion, guardrail enforcement).
  • Supports multiple platforms (openclaw, claude-code, codex, cursor, any-agent).
  • Includes sample pipelines, installation, quick start, and practical FAQ sections.
  • MIT licensed.
Metadata
Slug agent-failure-loop
Version 1.0.1
License MIT-0
All-time Installs 0
Active Installs 0
Total Versions 2
Frequently Asked Questions

What is Agent Failure Loop?

An end-to-end self-improvement loop that automatically detects agent failures, classifies them, tracks recurrence, auto-generates rules, and promotes them to... It is an AI Agent Skill for Claude Code / OpenClaw, with 140 downloads so far.

How do I install Agent Failure Loop?

Run "/install agent-failure-loop" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Agent Failure Loop free?

Yes, Agent Failure Loop is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Agent Failure Loop support?

Agent Failure Loop is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Agent Failure Loop?

It is built and maintained by reikys (@reikys); the current version is v1.0.1.
