Cron Failure Runbook
/install cron-failure-runbook
Cron Failure Runbook
Use when a scheduled job, LaunchAgent, cron task, heartbeat step, or nightly automation fails, silently no-ops, produces incomplete output, or repeatedly generates dream-cycle failure proposals.
Goal
Turn unattended failures into reproducible evidence and one of three outcomes:
- Fixed and verified.
- Deferred with owner/date/reason.
- Escalated with the exact missing credential, approval, service, or runtime condition.
Procedure
-
Identify the scheduler context.
- Job name, plist/cron entry, command, cwd, shell, user, and expected environment.
- Last successful run and last failed/no-op run.
-
Reproduce in the same runtime lane.
- Run the exact command manually with the same env source where practical.
- Capture stdout, stderr, exit code, cwd, PATH, and relevant env variable presence without printing secret values.
- If the job depends on OpenClaw model calls, verify it uses gateway/Codex routing rather than raw OPENAI_API_KEY.
-
Run preflights before the expensive or external step.
- Auth: prove the running process can read the needed secret and make the smallest live API call.
- Files: prove input paths exist and output directories are writable.
- Network/service: prove target health endpoint or API is reachable.
- Approval: prove an external write has approval or a preapproved workflow flag.
-
Classify the failure.
- auth: missing/expired token, wrong vault, wrong runtime env, insufficient scope.
- runtime: wrong shell, PATH, Python/Node version, cwd, launchd env, permissions.
- input: missing/stale source files, empty queue, unexpected schema.
- external: API outage, 401/403, rate limit, deploy provider issue.
- logic: script exits zero but produces no expected artifact/action.
-
Close the loop.
- Fix code/config if local and reversible.
- Add a dry-run or preflight mode if the job cannot be safely tested live.
- Update the relevant STATUS/runbook/memory with evidence.
- If unresolved, record blocker, owner, next command, and alert threshold.
Verification Evidence
Every cron fix needs at least one of:
- Manual reproduction command with exit code and expected output.
- preflight-only or dry-run output proving dependencies are healthy.
- Scheduler log excerpt showing the next run succeeded.
- A deliberate deferred/blocked entry with owner, reason, and next check date.
Dream-Cycle Specific Checks
For dream-cycle failures:
- bash -n scripts/dream-cycle.sh
- python3 -m py_compile for every Python script touched by the cycle.
- scripts/task-quality-judge.py --since 7 --dry-run
- scripts/skill-evolver.py --since 7 --min-failures 2 --dry-run
- scripts/dream-recurring-issues.py --since 7 --min-count 3 --dry-run
- scripts/dream-cycle-action-summary.py --since-hours 26 --dry-run
Do not mark dream-cycle work complete if proposal files are merely pending. There must be a lifecycle status, a summary, and a next action.
- Make sure OpenClaw is installed (local or Docker)
- Run the install command in chat:
/install cron-failure-runbook - After installation, invoke the skill by name or use
/cron-failure-runbook - Provide required inputs per the skill's parameter spec and get structured output
What is Cron Failure Runbook?
Runbook for diagnosing failed cron jobs, LaunchAgents, heartbeats, and unattended automation by reproducing the scheduler context, preflighting dependencies,... It is an AI Agent Skill for Claude Code / OpenClaw, with 47 downloads so far.
How do I install Cron Failure Runbook?
Run "/install cron-failure-runbook" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.
Is Cron Failure Runbook free?
Yes, Cron Failure Runbook is completely free, licensed under MIT-0. You can download, install and use it at no cost.
Which platforms does Cron Failure Runbook support?
Cron Failure Runbook is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).
Who created Cron Failure Runbook?
It is built and maintained by Nissan Dookeran (@nissan); the current version is v1.0.0.