Description

Systematically review and iteratively refine your response for logic, accuracy, completeness, conciseness, actionability, and consistency before delivering.

README (SKILL.md)

Self-Refine Reflection Skill

Name: Self-Refine Reflection
Author: thomaszhou22

An AI agent skill for systematic self-reflection and iterative output refinement. Based on Madaan et al. (2023), Shinn et al. (2023), and Andrew Ng's Reflection design pattern.

Platform Auto-Detection

At skill load time, detect your runtime environment and adjust capabilities:

Capability	How to Check	Fallback
File system	Can you read `references/reflection-templates.md`?	Use the inline templates below instead
Persistent memory	Can you write to `memory/`?	Store reflection notes in conversation context only
Long context	Is your context window > 32K tokens?	Cap at Level 2 (skip adversarial review)
Tool access	Can you call external tools?	Use mental verification only

Detection rules:

If you can read this file's references/ directory → full mode (all levels + memory)
If you can read files but not write → full levels, in-conversation memory only
If you cannot read files at all → use inline templates (copied below), cap at Level 2
If context is limited (\x3C 8K usable) → default to Level 1, max Level 2

This means every platform gets the best possible experience automatically — no manual configuration needed.

Inline Reflection Templates (for environments without file access)

If you cannot read references/reflection-templates.md, use these directly:

Level 1 Internal Prompt

Review drafted response: (1) Did I answer everything? (2) Logic gaps? (3) Can I cut 20%? (4) User can act on this? Fix → Deliver.

Level 2 Internal Prompt

For each dimension (Logic / Facts / Completeness / Conciseness / Actionability / Consistency):
  - Quote exact sentences with issues
  - Rate: Critical / Minor / Pass
Fix ALL Critical. Fix Minor if straightforward. Re-read once. Deliver.

Level 3 Internal Prompt

Round 1: Level 2 review.
Round 2: "If a domain expert attacked this, what would they target?" Fix valid attacks.
Round 3: Did fixes introduce new issues? If stable → deliver. If not → fix and deliver.

When to Activate

Activate when you are about to deliver a final response to the user, after you've gathered all information and formed your answer. Reflection happens before delivery, not instead of work.

Core Loop: GENERATE → CRITIQUE → REFINE → CHECK

1. GENERATE  — Produce your initial response as normal
2. CRITIQUE  — Apply the reflection framework (depth-dependent)
3. REFINE    — Fix every issue found in CRITIQUE
4. CHECK     — If issues remain AND within budget, loop to step 2
              — If clear OR budget exhausted, deliver

Source: Madaan et al., "Self-Refine: Iterative Refinement with Self-Feedback" (2023) — the same LLM acts as generator, critic, and refiner.

Reflection Dimensions

Every critique examines the response across these dimensions. Not all apply to every response — skip irrelevant ones silently.

#	Dimension	What to Check	Signal Words
1	Logical Completeness	Does the reasoning chain have gaps? Are conclusions supported by premises?	"therefore" without prior evidence, jumps in logic
2	Factual Accuracy	Are there unverified assertions? Claims that could be wrong?	Specific numbers, dates, names, "everyone knows..."
3	Response Completeness	Did I address every part of the user's question? Are there sub-questions I ignored?	Multiple questions in user message, implicit needs
4	Conciseness	Is there redundancy? Can paragraphs be merged? Are there filler phrases?	"It's important to note that", "In conclusion", repeated points
5	Actionability	Can the user act on this immediately? Or do they need to ask follow-ups?	Vague advice without steps, missing specifics
6	Internal Consistency	Do any parts of my response contradict each other?	Conflicting recommendations, contradictory statements

Reflection Depth Levels

The depth is determined by task complexity, not user preference. The agent auto-selects.

Level 0: Quick Scan (Skip formal reflection)

Trigger: Simple factual lookups, greetings, trivial questions, single-sentence answers.

Action: Do nothing extra. Just respond. Cost: 0 additional tokens.

Examples: "What time is it?", "Thanks", simple formatting requests.

Level 1: Standard Review

Trigger: Medium-complexity tasks — explanations, how-to guides, code snippets, multi-paragraph responses.

Action: One pass through all 6 dimensions mentally. Fix issues. Deliver.

Budget: 1 refinement round. ~15-20% overhead on response tokens.

Internal process:

After drafting your response, ask yourself:
- Did I answer everything they asked?
- Any logical gaps or contradictions?
- Can I cut 20% of the words without losing meaning?
- Would the user know exactly what to do next?
Fix → Deliver.

Level 2: Deep Audit

Trigger: Complex tasks — technical architectures, multi-step plans, research summaries, anything with 5+ distinct claims.

Action: Explicit dimension-by-dimension review. One full refinement round with documented issues.

Budget: Up to 2 refinement rounds. ~30% overhead.

Internal process:

Draft response.
For each dimension (1-6):
  - Identify specific issues (quote the exact sentence)
  - Rate severity: Critical / Minor / Pass
If any Critical: refine entire response, then re-check
If only Minor: fix inline, deliver

Level 3: Adversarial Review

Trigger: High-stakes tasks — production code, security decisions, medical/legal adjacent, public-facing content, or user explicitly requests thorough review.

Action: Full audit PLUS an adversarial pass where you actively try to find the weakest point in your response.

Budget: Up to 3 refinement rounds. ~50% overhead.

Internal process:

Draft response.
Round 1: Standard dimension-by-dimension review.
Round 2: Adversarial pass — "If I wanted to prove this response wrong, 
         what would I attack?" Check those attack vectors.
Round 3 (if needed): Verify fixes from Round 2 didn't introduce new issues.

Convergence Rules

Based on the empirical finding from Madaan et al. that "most gains are in the initial iterations":

Maximum 3 refinement rounds regardless of depth level.
Stop early if no Critical or Minor issues found in a round.
Stop early if the changes between rounds are purely stylistic (no substantive improvement).
Diminishing returns rule: If Round N fixes fewer issues than Round N-1, stop after N.

Anti-pattern — DO NOT:

Refine forever trying to reach perfection
Make changes just to justify another round
Re-introduce previously fixed issues

Cost Control Strategy

Depth	Max Rounds	Approx. Token Overhead	When to Use
Level 0	0	0%	Simple Q&A
Level 1	1	~15%	Most conversations
Level 2	2	~30%	Complex technical
Level 3	3	~50%	High-stakes only

Principle: Reflection should cost less than the cost of delivering a wrong answer. For low-stakes responses, skip reflection entirely.

Trigger Conditions Summary

Auto-Trigger (always on)

Response exceeds 3 paragraphs → at least Level 1
Response makes 3+ factual claims → at least Level 1
Response includes code → at least Level 1 (check logic + actionability)

Skip Reflection

User is in a hurry (explicit: "quick", "brief", "just tell me")
Response is under 2 sentences
Pure social/chat exchange

User Manual Trigger

User says "think carefully", "double-check", "make sure this is right" → Level 2+
User says "this is important", "critical", "production" → Level 3
User says "reflect" or "self-refine" → Level 2+

Output Format

Reflection is internal — the user should not see the raw critique. However:

After refinement, you MAY append a subtle note:

Level 1: No note (keep it invisible). Level 2: Optionally: _(response refined via self-review)_ Level 3: Optionally: _(refined through [N]-round adversarial self-review)_

NEVER:

Show the actual critique/feedback text to the user
Make the note prominent or distracting
Add notes for Level 0 or Level 1 responses

Reflexion Memory (Cross-Session Learning)

Inspired by Shinn et al. (2023) — Reflexion stores verbal self-reflections in persistent memory for future tasks.

When to use: After completing a Level 2 or Level 3 reflection where you discovered a significant pattern in your own errors (e.g., "I tend to forget edge cases in X", "I consistently over-explain Y").

How to use: Write a brief note to memory/ capturing the pattern:

Self-reflection: When writing about [topic], I tend to [pattern]. 
Fix: [specific behavior change]. 
Date: [today].

This creates a growing corpus of self-corrections that improve future responses.

Relationship to Other Reflection Techniques

This skill synthesizes multiple academic approaches:

Technique	Source	What We Borrow
Self-Refine	Madaan et al. 2023	Core GENERATE→CRITIQUE→REFINE loop
Reflexion	Shinn et al. 2023	Persistent memory of self-reflections
Chain-of-Verification	Dhuliawala et al. 2023	Verification questions for factual claims (Level 2-3)
Self-Calibration	Kadavath et al. 2022	Confidence assessment of own outputs
Andrew Ng's Reflection	Ng 2024	Design pattern: agent critiques and improves its own output iteratively
CRITIC	Gou et al. 2023	Tool-interactive critiquing (use tools to verify when possible)

Quick Reference Card

REFLECT? ──→ Simple/trivial? ──→ NO → Just respond
    │
    YES
    │
    ├─ Medium complexity → LEVEL 1 (1 round, mental check)
    ├─ Complex / multi-claim → LEVEL 2 (2 rounds, explicit review)
    └─ High-stakes / user requested → LEVEL 3 (3 rounds, adversarial)
    
For each round:
  1. Check 6 dimensions
  2. Fix all issues
  3. Converged? → Deliver
  4. Budget left? → Next round
  5. Budget exhausted → Deliver current version

Usage Guidance

Treat this review as inconclusive and rerun it in an environment where metadata.json and artifact/ can be read before installing.

Capability Tags

cryptocan-make-purchasesrequires-sensitive-credentials

Capability Assessment

ℹ Purpose & Capability

Workspace inspection failed, so purpose and capability coherence could not be confirmed from the supplied files.

ℹ Instruction Scope

SKILL.md and related artifacts were not readable due to the execution environment failure.

ℹ Install Mechanism

Install metadata and package contents could not be inspected.

ℹ Credentials

No artifact-backed environment access could be assessed.

ℹ Persistence & Privilege

No artifact-backed persistence or privilege behavior could be assessed.

Version History

v2.0.1

Added .claude-plugin registry + GEMINI.md for multi-platform support

v2.0.0

Self-Refine Reflection Skill 2.0.0 – Major update with dynamic platform adaptation and refined self-reflection process. - Introduces automatic platform detection for feature availability (file access, memory, context, tools) with graceful fallbacks. - Defines multiple reflection depth levels based on response complexity, with distinct review processes and clear trigger rules. - Includes inline reflection templates for environments without file system access. - Details six systematic critique dimensions for self-review: logical completeness, factual accuracy, completeness, conciseness, actionability, and consistency. - Adds explicit cost control strategies and early stopping rules to optimize effectiveness without waste. - Enables both auto- and user-triggered reflection, providing the best possible review for each context without manual setup.

Metadata

Slug self-refine-reflection

Version 2.0.1

License MIT-0

All-time Installs 0

Active Installs 0

Total Versions 2

Frequently Asked Questions

What is Self-Refine Reflection?

Systematically review and iteratively refine your response for logic, accuracy, completeness, conciseness, actionability, and consistency before delivering. It is an AI Agent Skill for Claude Code / OpenClaw, with 154 downloads so far.

How do I install Self-Refine Reflection?

Run "/install self-refine-reflection" in the OpenClaw or Claude Code chat to install it in one step — no extra setup required.

Is Self-Refine Reflection free?

Yes, Self-Refine Reflection is completely free, licensed under MIT-0. You can download, install and use it at no cost.

Which platforms does Self-Refine Reflection support?

Self-Refine Reflection is cross-platform and runs anywhere OpenClaw / Claude Code is available (cross-platform).

Who created Self-Refine Reflection?

It is built and maintained by Thomaszhou (@thomaszhou22); the current version is v2.0.1.

More Skills

Self-Refine Reflection

Self-Refine Reflection Skill

Platform Auto-Detection

Inline Reflection Templates (for environments without file access)

Level 1 Internal Prompt

Level 2 Internal Prompt

Level 3 Internal Prompt

When to Activate

Core Loop: GENERATE → CRITIQUE → REFINE → CHECK

Reflection Dimensions

Reflection Depth Levels

Level 0: Quick Scan (Skip formal reflection)

Level 1: Standard Review

Level 2: Deep Audit

Level 3: Adversarial Review

Convergence Rules

Cost Control Strategy

Trigger Conditions Summary

Auto-Trigger (always on)

Skip Reflection

User Manual Trigger

Output Format

After refinement, you MAY append a subtle note:

NEVER:

Reflexion Memory (Cross-Session Learning)

Relationship to Other Reflection Techniques

Quick Reference Card

What is Self-Refine Reflection?

How do I install Self-Refine Reflection?

Is Self-Refine Reflection free?

Which platforms does Self-Refine Reflection support?

Who created Self-Refine Reflection?

💬 Comments