Description

Tracks follow-ups for every action with a future outcome — deploys, crons, fixes, configs. Maintains a centralized FOLLOWUPS.md with structured items, escala...

README (SKILL.md)

Accountability

Name: Accountability
Author: guifav

You are an operations reliability engineer. Your single obsession: nothing slips through the cracks. Every action with a future outcome — deploys, crons, fixes, config changes — gets tracked until confirmed working or explicitly failed and handled.

This skill exists because of real incidents (2026-03-07/08): crons that never fired for 2 days undetected, export scripts stuck without alerts, S3 jobs failing silently, OOM kills cascading across services. Each would have been caught in under 30 minutes with systematic follow-up tracking.

Core Principle

If it has a "should work" → it needs a follow-up.
If it has a follow-up → it gets checked on time.
If a check fails → Guilherme knows immediately.

File Layout

This skill manages three centralized files, all stored at the root of the workspace or project monorepo, wherever Guilherme keeps his central operations context.

| File | Purpose |
|---|---|
| `ACCOUNTABILITY.md` | System rules (rarely changes) |
| `FOLLOWUPS.md` | Active tracking ledger (changes constantly) |
| `ARCHIVE.md` | Audit trail of resolved items (append-only) |

FOLLOWUPS.md Format

The file is divided into three sections, always in this order. The agent maintains this structure automatically.

# FOLLOWUPS.md

## PENDING

(active items here)

## FAILED

(items that failed checks and need action)

## DONE

(resolved items — auto-removed after 3 days)

Item Structure

Every item follows this exact template. No field is optional except where marked.

### <short-title> (<project>) — <YYYY-MM-DD>
- **Status:** PENDING | CHECKING | FAILED
- **Check:** `<exact command to copy-paste>`
- **Expected:** <what success looks like>
- **Deadline:** <YYYY-MM-DD HH:MM UTC>
- **On failure:** <concrete remediation action>
- **Priority:** P0 (critical) | P1 (important) | P2 (routine)
- **Origin:** <what action created this — deploy hash, cron ID, config change>
- **History:** (optional, appended on each verification)
  - <YYYY-MM-DD HH:MM> — <result of check>

Field-by-field guidance:

short-title: What happened, not what you hope will happen. "Morpheus deploy OOM fix" not "fix OOM".
project: The system or repo this belongs to — Culkin, Morpheus, Senna, Prism, Gallup, etc.
Check: A command that someone (human or cron) can copy-paste and run. No vague instructions like "check if it works". If the check requires auth headers or API keys, use env var references ($CULKIN_API_KEY), never hardcode secrets.
Expected: The concrete success condition. "HTTP 200" or "row count > 10M" or "status=ok in last 2h". This is what gets evaluated when the check runs.
Deadline: When the outcome should be verifiable. For deploys, usually within minutes. For crons, the next scheduled run. For migrations, after the next execution window.
On failure: What to do if the check fails. "Rollback to deploy #249" or "Increase timeout to 3600s and rerun" or "Alert Guilherme — needs manual investigation". Never leave this as "investigate" — be specific about what to investigate and what the likely cause is.
Priority: P0 = production down or data loss risk. P1 = degraded service or missed SLA. P2 = routine verification (most deploys, cron setups).
Origin: The commit hash, deploy number, cron ID, or config change that created this item. This is the audit trail.
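The sketch below makes the field rules above concrete. It is a minimal Python example that parses one item block and lists every rule the block breaks.

It makes two assumptions:
- Items are written exactly as in the template: a `### ` heading followed by `- **Field:** value` lines.
- `parse_item` and `validate_item` are hypothetical helpers. The skill itself ships as instructions only.

```python
from __future__ import annotations

import re
from datetime import datetime

# Heading shape: "### <short-title> (<project>) <em dash> <YYYY-MM-DD>"; \u2014 is the em dash
HEADING = re.compile(r"^### (?P<title>.+) \((?P<project>[^()]+)\) \u2014 (?P<date>\d{4}-\d{2}-\d{2})")
FIELD = re.compile(r"^- \*\*(?P<name>[^*]+):\*\*\s*(?P<value>.*)$")
REQUIRED = ["Status", "Check", "Expected", "Deadline", "On failure", "Priority", "Origin"]


def parse_item(block: str) -> dict[str, str]:
    """Turn one FOLLOWUPS.md item into a dict of heading parts and top-level fields."""
    lines = block.strip().splitlines()
    heading = HEADING.match(lines[0])
    if heading is None:
        raise ValueError(f"heading does not follow the template: {lines[0]!r}")
    item = heading.groupdict()
    for line in lines[1:]:
        field = FIELD.match(line)  # indented History entries do not match and are skipped
        if field:
            item[field["name"]] = field["value"].strip()
    return item


def validate_item(item: dict[str, str]) -> list[str]:
    """Return every rule violation; an empty list means the item is well-formed."""
    problems = [f"missing field: {name}" for name in REQUIRED if not item.get(name)]
    check = item.get("Check", "")
    if check and not (len(check) > 2 and check.startswith("`") and check.endswith("`")):
        problems.append("Check must be one backtick-quoted, copy-pasteable command")
    status = item.get("Status")
    if status and status not in {"PENDING", "CHECKING", "FAILED"}:
        problems.append(f"unknown Status: {status}")
    priority = item.get("Priority")
    if priority and not re.match(r"P[012]\b", priority):
        problems.append(f"Priority must be P0, P1 or P2: {priority}")
    deadline = item.get("Deadline")
    if deadline:
        try:
            datetime.strptime(deadline, "%Y-%m-%d %H:%M UTC")
        except ValueError:
            problems.append(f"Deadline must look like 'YYYY-MM-DD HH:MM UTC': {deadline}")
    return problems
```

Collecting all problems instead of raising on the first one lets the agent report every gap in a single pass when it refuses to register an item.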

What Gets Tracked

Register a follow-up for ANY of these:

| Action | Why it needs tracking |
|---|---|
| Production deploy | Could introduce regressions, break APIs, cause OOM |
| Cron job created or modified | May never fire, may time out, may silently fail |
| Database migration | Could break queries, lose data, lock tables |
| Infrastructure config change | DNS propagation, SSL, rate limits, IAM changes |
| Bug fix deployed | The fix might not actually fix the bug |
| Timeout/resource increase | The new limit might still be insufficient |
| Credential rotation | Services using old creds will break |
| New integration/webhook | The other side might not be configured correctly |
| Data pipeline run | Could produce partial results, wrong counts, stale data |
| Backfill or batch job | Could OOM, time out, or process the wrong date range |

If in doubt, register it. A false-positive follow-up costs 30 seconds to verify and close. A missed failure costs hours of debugging and potential data loss.

What Does NOT Get Tracked

Pure code changes that haven't been deployed yet (track when deployed)
Discussions, plans, decisions (not actions with outcomes)
Items with no verifiable check command (if you can't verify it, rethink the action)

Lifecycle of an Item

ACTION
  |
  v
Register in FOLLOWUPS.md (immediate, same session)
  |
  v
Check runs (manually, at session start, or via external automation)
  |
  +-- PASS → move to DONE with timestamp and evidence
  |
  +-- FAIL → move to FAILED, alert Guilherme, create remediation item
  |
  +-- OVERDUE (>2x deadline, no check) → escalate as P0 alert
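The diagram can be read as code: the sketch below maps a check result, or the lack of one, to the branch taken.

It rests on these assumptions:
- "OVERDUE (>2x deadline)" means more than twice the planned window has elapsed since registration.
- Naive datetimes are taken to be UTC.
- The function names are hypothetical.

```python
from __future__ import annotations

from datetime import datetime


def is_overdue(created: datetime, deadline: datetime, now: datetime) -> bool:
    """ASSUMPTION: '>2x deadline' = more than twice the planned window elapsed since registration."""
    return now - created > 2 * (deadline - created)


def next_step(result: str | None, created: datetime, deadline: datetime, now: datetime) -> str:
    """Map a check result ("PASS", "FAIL", or None if no check ran) to a lifecycle branch."""
    if result == "PASS":
        return "move to DONE with timestamp and evidence"
    if result == "FAIL":
        return "move to FAILED, alert Guilherme, create remediation item"
    if is_overdue(created, deadline, now):
        return "escalate as P0 alert"
    return "keep in PENDING"
```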

1. Register (immediate)

The moment you take an action with a future outcome, add the item to FOLLOWUPS.md under ## PENDING. Do this in the same message/session as the action — never defer registration to "later".

If the action is a deploy:

### Culkin Deploy #251 — Journey Grid v3 (Culkin) — 2026-03-22
- **Status:** PENDING
- **Check:** `curl -sf "https://culkin.mygri.com/api/health" -H "X-API-Key: $CULKIN_API_KEY" | python3 -c "import json,sys; d=json.load(sys.stdin); print(d.get('status','FAIL'))"`
- **Expected:** status=ok
- **Deadline:** 2026-03-22 15:30 UTC
- **On failure:** Check Vercel deploy logs, rollback to #250 if broken
- **Priority:** P1
- **Origin:** commit abc1234, deploy triggered via `git push origin main`

If the action is a cron:

### Google Ads Sync timeout increase (Senna) — 2026-03-22
- **Status:** PENDING
- **Check:** `openclaw cron list 2>&1 | grep ads`
- **Expected:** status=ok after next Sunday run
- **Deadline:** 2026-03-29 10:00 UTC
- **On failure:** Run manually with --timeout 7200 and check for infinite loops in sync script
- **Priority:** P1
- **Origin:** cron timeout changed from 30s to 3600s

If the action is a data pipeline:

### platform_members_matches full sync (Culkin) — 2026-03-22
- **Status:** PENDING
- **Check:** `python3 -c "from google.cloud import bigquery; c=bigquery.Client(project='gri-culkin'); r=list(c.query('SELECT COUNT(*) as n FROM gri_raw.platform_members_matches').result()); print(r[0].n)"`
- **Expected:** ~10.1M rows
- **Deadline:** 2026-03-23 12:00 UTC
- **On failure:** Investigate chunking/timeout in sync script, check for partial writes
- **Priority:** P2
- **Origin:** sync script triggered manually after partial sync (1.92M vs 10.1M)
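Registration can be scripted so that every item comes out in the template's exact shape. This is a hedged sketch: the `FollowUp` class and the `register` helper are hypothetical. `register` inserts the rendered item directly after the `## PENDING` heading of an in-memory ledger string.

```python
from dataclasses import dataclass


@dataclass
class FollowUp:
    """One ledger item; field names mirror the template (this class is hypothetical)."""
    title: str
    project: str
    created: str      # YYYY-MM-DD
    check: str        # exact shell command, without surrounding backticks
    expected: str
    deadline: str     # YYYY-MM-DD HH:MM UTC
    on_failure: str
    priority: str     # P0 | P1 | P2
    origin: str
    status: str = "PENDING"

    def to_markdown(self) -> str:
        if "`" in self.check:
            raise ValueError("backticks inside the check would break the inline code span")
        return "\n".join([
            f"### {self.title} ({self.project}) \u2014 {self.created}",
            f"- **Status:** {self.status}",
            f"- **Check:** `{self.check}`",
            f"- **Expected:** {self.expected}",
            f"- **Deadline:** {self.deadline}",
            f"- **On failure:** {self.on_failure}",
            f"- **Priority:** {self.priority}",
            f"- **Origin:** {self.origin}",
        ])


def register(ledger: str, item: FollowUp) -> str:
    """Return the ledger text with the item added directly under '## PENDING'."""
    marker = "## PENDING\n"
    start = ledger.index(marker) + len(marker)  # raises ValueError if the section is missing
    return ledger[:start] + "\n" + item.to_markdown() + "\n" + ledger[start:]
```

Rendering from a typed record means that omitting a field fails loudly at construction time instead of producing a half-filled item.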

2. Verify

When checking an item (at session start, on request, or when the deadline arrives):

Run the Check command
Compare output against Expected
Update the item:

If PASS:

### Culkin Deploy #251 — Journey Grid v3 (Culkin) — 2026-03-22 — DONE
- **History:**
  - 2026-03-22 15:32 — PASS: HTTP 200, status=ok

Move the item to the ## DONE section.

If FAIL:

### Culkin Deploy #251 — Journey Grid v3 (Culkin) — 2026-03-22
- **Status:** FAILED
- **History:**
  - 2026-03-22 15:32 — FAIL: HTTP 502, Bad Gateway

Move to ## FAILED. Alert Guilherme immediately with the full context. If the On failure action is clear (e.g., rollback), propose executing it.
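The verify step can be sketched in a few lines of Python, assuming checks run locally through `/bin/sh`.

A few notes on the sketch:
- The Expected field is free text, so the pass condition is supplied as a hypothetical `expect` predicate.
- `run_check` and `history_line` are illustrative names, not part of the skill.
- `shell=True` executes whatever the ledger contains, so only run commands you have reviewed.

```python
from __future__ import annotations

import subprocess
from datetime import datetime
from typing import Callable


def run_check(command: str, expect: Callable[[str], bool], timeout_s: int = 120) -> tuple[bool, str]:
    """Run a Check command through the shell and judge its stdout with `expect`.

    WARNING: shell=True runs the ledger text verbatim; only execute reviewed commands.
    """
    try:
        proc = subprocess.run(command, shell=True, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False, f"timeout after {timeout_s}s"
    out = proc.stdout.strip()
    if out:
        evidence = out.splitlines()[-1]  # last stdout line is usually the verdict
    else:
        evidence = proc.stderr.strip()[:200] or f"exit code {proc.returncode}"
    passed = proc.returncode == 0 and expect(proc.stdout)
    return passed, evidence


def history_line(when: datetime, passed: bool, evidence: str) -> str:
    """Format a History entry in the ledger's 'YYYY-MM-DD HH:MM <em dash> PASS: ...' style."""
    return f"{when:%Y-%m-%d %H:%M} \u2014 {'PASS' if passed else 'FAIL'}: {evidence}"
```

For the Culkin health check above, `expect` would be `lambda out: out.strip() == "ok"`, because the command prints only the `status` value.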

3. Escalation Rules

These are invariants, not suggestions:

3 consecutive FAILs on the same item → escalate priority by one level (P2 to P1, P1 to P0).
OVERDUE (deadline passed by >2x with no check run) → escalate to P0 regardless of original priority. Something is wrong with the monitoring itself.
FAILED items are NEVER auto-removed. Only Guilherme can resolve failures — either by fixing the issue and confirming, or by explicitly closing the item with a reason.
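The escalation invariants can be expressed as a pure function of three inputs: the item's original priority, its History entries, and an overdue flag. Reading "3 consecutive FAILs" as "each full run of three trailing FAIL entries raises priority one level" is an assumption. The helper names are also hypothetical.

```python
from __future__ import annotations

import re

LEVELS = ["P2", "P1", "P0"]  # lowest to highest urgency
RESULT = re.compile(r"\u2014 (PASS|FAIL)\b")  # matches the '<em dash> PASS:' / 'FAIL:' part of History


def trailing_failures(history: list[str]) -> int:
    """Count consecutive FAIL entries at the end of an item's History."""
    count = 0
    for entry in reversed(history):
        m = RESULT.search(entry)
        if m is None or m.group(1) != "FAIL":
            break
        count += 1
    return count


def effective_priority(original: str, history: list[str], overdue: bool) -> str:
    """Apply the escalation invariants to an item's original priority."""
    if overdue:
        return "P0"  # the monitoring itself is suspect, regardless of original priority
    # ASSUMPTION: every full run of 3 consecutive FAILs raises priority one more level
    level = LEVELS.index(original) + trailing_failures(history) // 3
    return LEVELS[min(level, len(LEVELS) - 1)]
```

The effective priority is derived from History rather than stored. As a result, re-running the function after every check cannot escalate the same streak twice.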

4. Resolution

When Guilherme resolves a FAILED item (fixes the issue and confirms):

Move to ## DONE with the resolution note in History
If the fix itself needs tracking (e.g., a new deploy to fix the failed one), register a new follow-up

5. Cleanup

The agent handles cleanup during session start or when explicitly asked:

Items in ## DONE older than 3 days → removed from FOLLOWUPS.md
Before removing, append a one-line summary to ARCHIVE.md for the permanent audit trail:
```
2026-03-22 | DONE | Culkin Deploy #251 — Journey Grid v3 | PASS at 15:32 UTC
```
Items in ## DONE are kept in reverse chronological order (newest first)
Items in ## FAILED are NEVER auto-removed
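The cleanup rule can be sketched as follows. The text does not say which timestamp starts the 3-day clock, so this sketch assumes an item's age is measured from its resolution time.

Further notes:
- `DoneItem`, `cleanup`, and `append_archive` are hypothetical names.
- The archive line reproduces the format of the ARCHIVE.md example.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DoneItem:
    title: str             # heading text without the project suffix
    resolved_at: datetime  # naive datetime, interpreted as UTC
    result: str            # e.g. "PASS"


def cleanup(
    done: list[DoneItem], now: datetime, max_age: timedelta = timedelta(days=3)
) -> tuple[list[DoneItem], list[str]]:
    """Split DONE items into (kept items, newest first) and ARCHIVE.md lines for expired ones."""
    expired = sorted((i for i in done if now - i.resolved_at > max_age), key=lambda i: i.resolved_at)
    kept = sorted(
        (i for i in done if now - i.resolved_at <= max_age), key=lambda i: i.resolved_at, reverse=True
    )
    archive = [
        f"{i.resolved_at:%Y-%m-%d} | DONE | {i.title} | {i.result} at {i.resolved_at:%H:%M} UTC"
        for i in expired
    ]
    return kept, archive


def append_archive(path: str, lines: list[str]) -> None:
    """Append-only write; call this BEFORE rewriting FOLLOWUPS.md so nothing is lost mid-way."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.writelines(line + "\n" for line in lines)
```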

Session Start Protocol

Every time a new session starts with Guilherme, before doing anything else:

Read FOLLOWUPS.md
Run cleanup (archive DONE items older than 3 days)
Report any FAILED items first (these are blockers)
Report any PENDING items past their deadline (overdue)
For overdue items, offer to run the check right now
Summarize: "X pending, Y failed, Z resolved since last session"

This takes 30 seconds and prevents the "I forgot about that cron from 3 days ago" problem.
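The reporting half of the protocol lists failures first, then overdue items, then the one-line summary. It could look like the sketch below.

The sketch assumes:
- The ledger follows the documented layout.
- Deadlines and History timestamps are UTC.
- "Resolved since last session" means a DONE item whose latest History entry is newer than the previous session.
- All function names are hypothetical.

```python
from __future__ import annotations

import re
from datetime import datetime

SECTION = re.compile(r"^## (PENDING|FAILED|DONE)[ \t]*$", re.M)
DEADLINE = re.compile(r"^- \*\*Deadline:\*\* (\d{4}-\d{2}-\d{2} \d{2}:\d{2}) UTC", re.M)
STAMP = re.compile(r"^[ \t]*- (\d{4}-\d{2}-\d{2} \d{2}:\d{2}) \u2014", re.M)  # History entries
FMT = "%Y-%m-%d %H:%M"


def split_sections(ledger: str) -> dict[str, list[str]]:
    """Map each section name to its item blocks (each block starts with '### ')."""
    sections: dict[str, list[str]] = {"PENDING": [], "FAILED": [], "DONE": []}
    heads = list(SECTION.finditer(ledger))
    for head, nxt in zip(heads, heads[1:] + [None]):
        body = ledger[head.end(): nxt.start() if nxt else len(ledger)]
        blocks = re.split(r"^(?=### )", body, flags=re.M)
        sections[head.group(1)] = [b.strip() for b in blocks if b.startswith("### ")]
    return sections


def title_of(block: str) -> str:
    return block.splitlines()[0][len("### "):]


def session_summary(ledger: str, now: datetime, last_session: datetime) -> str:
    """FAILED items first, then overdue PENDING items, then the one-line count."""
    s = split_sections(ledger)
    report = [f"FAILED: {title_of(b)}" for b in s["FAILED"]]
    for block in s["PENDING"]:
        due = DEADLINE.search(block)
        if due and datetime.strptime(due.group(1), FMT) < now:
            report.append(f"OVERDUE: {title_of(block)}")
    resolved = 0
    for block in s["DONE"]:
        stamps = [datetime.strptime(t, FMT) for t in STAMP.findall(block)]
        if stamps and max(stamps) > last_session:
            resolved += 1
    report.append(f"{len(s['PENDING'])} pending, {len(s['FAILED'])} failed, "
                  f"{resolved} resolved since last session")
    return "\n".join(report)
```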

Reporting

Daily Summary (generated at first session of the day)

ACCOUNTABILITY — 2026-03-22
========================================
Pending:    12  (P0: 0, P1: 3, P2: 9)
Overdue:    1   (Google Ads Sync — Senna)
Failed:     0
Resolved:   4   (today)
Oldest:     7d  (platform_members_matches — Culkin)

NEEDS ATTENTION:
  [P1] Google Ads Sync timeout — Senna — overdue by 0d (next check: Mar 29)
  [P1] platform_members_matches sync — Culkin — deadline: next Culkin session
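For reference, the counter block of the daily summary can be rendered from plain counts.

The sketch assumes:
- The caller has already collected one priority per PENDING item, plus the overdue labels.
- `daily_header` is a hypothetical name.
- The column widths simply mirror the example above.

```python
from __future__ import annotations

from collections import Counter
from datetime import date


def daily_header(today: date, pending: list[str], overdue: list[str], failed: int, resolved_today: int) -> str:
    """Render the counters; `pending` holds one priority string (P0/P1/P2) per PENDING item."""
    by_prio = Counter(pending)  # missing priorities count as 0
    overdue_note = f" ({'; '.join(overdue)})" if overdue else ""
    return "\n".join([
        f"ACCOUNTABILITY \u2014 {today.isoformat()}",
        "=" * 40,
        f"Pending:    {len(pending):<3} (P0: {by_prio['P0']}, P1: {by_prio['P1']}, P2: {by_prio['P2']})",
        f"Overdue:    {len(overdue):<3}{overdue_note}".rstrip(),
        f"Failed:     {failed}",
        f"Resolved:   {resolved_today:<3} (today)",
    ])
```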

Weekly Summary (Mondays or on request)

Includes:

Total items created vs resolved
Average time-to-resolution by priority
Projects with most failures
Recurring items (same system failing repeatedly → indicates systemic issue)
Items that have been PENDING for >7 days (stale — need deadline review or closure)
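Three of the weekly metrics reduce to small aggregations.

The sketch assumes:
- Each resolved item yields a `(priority, registered_at, resolved_at)` triple.
- Each failure is tagged with its system name.
- The function names and the recurrence threshold of 2 are illustrative choices, not part of the skill.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from datetime import datetime, timedelta
from statistics import mean


def avg_hours_to_resolution(resolved: list[tuple[str, datetime, datetime]]) -> dict[str, float]:
    """(priority, registered_at, resolved_at) triples -> mean hours to resolution per priority."""
    hours: dict[str, list[float]] = defaultdict(list)
    for priority, registered, done in resolved:
        hours[priority].append((done - registered).total_seconds() / 3600)
    return {p: round(mean(h), 1) for p, h in sorted(hours.items())}


def recurring_failures(failed_systems: list[str], threshold: int = 2) -> list[str]:
    """Systems with at least `threshold` FAILs this week, most frequent first."""
    return [name for name, n in Counter(failed_systems).most_common() if n >= threshold]


def stale_items(pending: list[tuple[str, datetime]], now: datetime,
                max_age: timedelta = timedelta(days=7)) -> list[str]:
    """Titles of items that have been PENDING for longer than `max_age`."""
    return [title for title, registered in pending if now - registered > max_age]
```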

Anti-Patterns

These are the exact failure modes from the March 2026 incidents. The skill exists to prevent each one:

"Fire and forget" deploys — deploying and moving on without registering a follow-up. The skill requires registration in the same session as the action.
Vague check commands — "check if it's working" instead of a concrete curl/query. The skill rejects items without copy-pasteable verification commands.
Silent failures — a cron fails but nobody notices for days. Systematic checking at session start and deadline enforcement catch this.
Alert fatigue — too many P0 alerts desensitize Guilherme. The priority system reserves P0 for production-down or data-loss scenarios.
Orphaned items — items registered but never checked because the deadline was unrealistic. The OVERDUE escalation flags these.
Accumulating DONE items — the file grows forever and becomes unreadable. Auto-cleanup with archival keeps it lean.

Rules

Never create a cron, deploy, fix, or config change without registering a follow-up in the same session.
Never silently swallow a failure — always alert Guilherme with the full context.
Never use generic verification commands — every check must be concrete and copy-pasteable.
Never auto-remove FAILED items — only Guilherme can resolve failures.
Never skip the session start protocol — always check FOLLOWUPS.md before starting new work.

Usage Guidance

This skill is mostly coherent for tracking follow-ups, but ask the author to clarify two things before installing. First, the manifest discrepancy: claw.json lists 'curl' as required, while the registry summary shows none. Second, where alerts, heartbeats, and summaries are sent (email, webhook, Slack) and what credentials they need. Because the skill requests filesystem and network permissions, take these precautions:

- Consider running it in a sandboxed workspace.
- Review any FOLLOWUPS.md checks for embedded endpoints or env var names, and don't allow it to use secrets you don't expect.
- Make sure you're comfortable granting network access for health checks.

To be stricter, grant the skill filesystem access only and explicitly approve any network or credential usage per check.

Capability Analysis

Type: OpenClaw Skill
Name: accountability
Version: 0.1.0

The skill implements an operational tracking system that requires the AI agent to execute arbitrary shell commands defined in a markdown file (FOLLOWUPS.md). The stated intent is reliability engineering, and the instructions in SKILL.md include some security best practices, such as using environment variables for secrets. However, the core mechanism of executing commands from a text ledger is a high-risk capability that could be exploited for command injection. The skill also requests broad 'filesystem' and 'network' permissions in claw.json to perform these automated checks.
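One mitigation for the command-injection concern is to gate every Check command through an allowlist before running it. The sketch below is a hypothetical, conservative filter built on Python's `shlex`:

- It rejects command substitution and subshells outright.
- It then requires the program at the start of every pipeline or list segment to be in an approved set.

It does not make an approved program safe; `python3 -c`, for example, can still run arbitrary code. The filter therefore complements reviewing the ledger rather than replacing it.

```python
from __future__ import annotations

import shlex

SEPARATORS = {"|", "||", "&&", ";", "&", "|&"}  # operators after which a new command starts
PUNCTUATION = set("();<>|&")


def command_names(check: str) -> list[str]:
    """Return the program name that starts each segment of a pipeline or command list."""
    lexer = shlex.shlex(check, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    lexer.commenters = ""  # keep '#' literal, e.g. in URLs or '#251'
    names: list[str] = []
    expect_command = True
    for token in lexer:
        if token and set(token) <= PUNCTUATION:  # an operator such as '|', '&&' or '>&'
            if "(" in token or ")" in token:
                raise ValueError(f"subshells are not allowed: {token!r}")
            expect_command = token in SEPARATORS  # redirections like '>&' do not start a command
            continue
        if expect_command:
            names.append(token)
            expect_command = False
    return names


def is_check_allowed(check: str, allowed: set[str]) -> bool:
    """Conservative gate: reject substitutions, unparsable input, and unlisted programs."""
    if "$(" in check or "`" in check:
        return False  # command substitution could hide an arbitrary program
    try:
        names = command_names(check)
    except ValueError:  # unbalanced quotes or subshell operators
        return False
    return bool(names) and all(name in allowed for name in names)
```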

Capability Assessment

ℹ Purpose & Capability

The skill's name, description, and SKILL.md consistently describe maintaining FOLLOWUPS.md, registering checks, and escalating failures — filesystem and network access (to run checks like curl) are plausible for this purpose. However, the registry summary above lists no required binaries while claw.json declares a required binary (curl). That mismatch is unexplained and should be clarified.

ℹ Instruction Scope

SKILL.md is narrowly focused on creating/editing FOLLOWUPS.md and defining explicit 'Check' commands to verify outcomes. That scope is appropriate. It does include vague actions such as 'alert Guilherme', 'heartbeat cron', and 'daily/weekly summary reports' without specifying destinations or channels; that ambiguity could lead an agent to use network endpoints or credentials that are not documented.

✓ Install Mechanism

This is instruction-only (no install spec, no code files). That minimizes install-time risk since nothing is downloaded or written beyond the follow-up files the skill manages.

ℹ Credentials

The skill declares no required environment variables (reasonable). SKILL.md instructs using env var references inside individual Check commands (e.g., $CULKIN_API_KEY) which is acceptable because checks run against external services, but the skill does not request or document any specific credentials. Combined with the claw.json 'network' permission and the earlier mismatch about needing curl, users should expect the agent to potentially reference user env vars and network endpoints — verify which secrets the agent will be allowed to use.

✓ Persistence & Privilege

always:false and user-invocable:true — normal. claw.json requests 'filesystem' and 'network' permissions which are proportionate to editing FOLLOWUPS.md and running checks, but network access increases blast radius if follow-ups contain copyable check commands that contact external services.

Version History

v0.1.0

Initial release: Track, verify, and enforce accountability for all operations actions with future outcomes.
- Adds structured tracking in FOLLOWUPS.md for deploys, crons, fixes, migrations, infra changes, and more.
- Enforces detailed check commands, clear expected outcomes, and concrete remediation steps for each action.
- Escalates failures and missed follow-ups; auto-archives resolved items to an audit trail.
- Designed to ensure nothing slips through the cracks in operational reliability.

Metadata

Slug accountability

Version 0.1.0

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 1

Frequently Asked Questions

What is Accountability?

Tracks follow-ups for every action with a future outcome — deploys, crons, fixes, configs. Maintains a centralized FOLLOWUPS.md with structured items, escala... It is an AI Agent Skill for Claude Code / OpenClaw, with 124 downloads so far.

How do I install Accountability?

Run "/install accountability" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Accountability free?

Yes, Accountability is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Accountability support?

Accountability is cross-platform and runs anywhere OpenClaw / Claude Code is available.

Who created Accountability?

It is built and maintained by Guilherme Favaron (@guifav); the current version is v0.1.0.
