Chapter 27

Compaction Algorithm: Trigger Formula, Pre-Flush Mechanism and Long-Session Information Preservation


"Compaction is not forgetting — it is a controlled knowledge handoff." — OpenClaw Engineering Log


27.1 Why Is Compaction Necessary?

The Context Window is an LLM's "workbench" — all currently visible information must fit within this finite space. Even a large 200K-token Context Window can fill up during deep work sessions lasting several hours.

Consider a typical scenario:

- 9:00 AM: Start a complex code refactoring task
- Each conversation turn consumes ~2K tokens (question + code + answer)
- By 2:00 PM, conversation history alone has consumed ~60K tokens
- Add tool call results (each file read: ~5-20K tokens): now over 150K tokens

Without intervention, the Context Window eventually overflows, causing subsequent inference calls to fail or truncate.

Compaction is OpenClaw's systematic answer to this problem: by summarizing old conversation turns and preserving recent context, it frees up space for new information without interrupting the workflow.

The key challenge: Compaction is a lossy operation — the details of old messages cannot be fully preserved in a summary. OpenClaw's innovation lies in executing a Memory Pre-flush before compression, actively persisting the most valuable information to disk. This transforms "lossy compression" into a "controlled knowledge handoff."


27.2 The Trigger Formula in Detail

27.2.1 Core Trigger Formula

$$ \text{Trigger condition:} \quad currentTokens \geq contextWindow - reserveTokensFloor - softThresholdTokens $$

Using default parameters:

$$ \text{Activation threshold} = 200{,}000 - 20{,}000 - 4{,}000 = 176{,}000 \text{ tokens} $$

When currentTokens >= 176,000, the Compaction process begins.
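The formula can be sketched as a small helper. This is an illustrative reimplementation, not OpenClaw's actual code; the names mirror the parameters above.

```typescript
// Hypothetical sketch of the trigger check from the formula above.
interface CompactionConfig {
  contextWindow: number;       // model's Context Window limit
  reserveTokensFloor: number;  // hard reserve for summary generation
  softThresholdTokens: number; // soft buffer for the Pre-flush
}

// threshold = contextWindow - reserveTokensFloor - softThresholdTokens
function activationThreshold(cfg: CompactionConfig): number {
  return cfg.contextWindow - cfg.reserveTokensFloor - cfg.softThresholdTokens;
}

function shouldCompact(currentTokens: number, cfg: CompactionConfig): boolean {
  return currentTokens >= activationThreshold(cfg);
}

const defaults: CompactionConfig = {
  contextWindow: 200_000,
  reserveTokensFloor: 20_000,
  softThresholdTokens: 4_000,
};
// activationThreshold(defaults) === 176_000
```

With the defaults, `shouldCompact(175_999, defaults)` is false and `shouldCompact(176_000, defaults)` is true, matching the threshold stated above.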

27.2.2 Parameter Meanings

| Parameter | Default | Meaning |
|---|---|---|
| contextWindow | 200,000 | Model's Context Window limit (depends on model) |
| reserveTokensFloor | 20,000 | Hard reserve: space for Compaction output (summary generation), ensuring the LLM has room to generate responses |
| softThresholdTokens | 4,000 | Soft buffer: advance headroom before triggering Compaction, giving time for the Pre-flush process |

27.2.3 Parameter Configuration

# ~/.openclaw/config.yaml
compaction:
  contextWindow: 200000        # Model context limit
  reserveTokensFloor: 20000    # Hard reserve floor
  softThresholdTokens: 4000    # Soft trigger advance
  # Result: activation threshold = 176,000 tokens

For deployments using different models, adjust contextWindow accordingly:

# Example: a model with a 100K-token Context Window
compaction:
  contextWindow: 100000
  reserveTokensFloor: 15000
  softThresholdTokens: 3000
  # Activation threshold = 82,000 tokens

27.2.4 Token Counting Method

Token counting includes all content in the Context Window:

currentTokens = 
    system_prompt_tokens        // AGENTS.md + SOUL.md + USER.md
  + memory_tokens               // MEMORY.md (if primary session)
  + daily_log_tokens            // today + yesterday logs
  + session_history_tokens      // conversation history
  + tool_result_tokens          // tool call results (already injected)
  + retrieved_chunks_tokens     // vector retrieval results (already injected)
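The summation above can be expressed as a small sketch. The field names are assumptions chosen for illustration, not OpenClaw's real data model.

```typescript
// Illustrative breakdown of currentTokens; fields mirror the components above.
interface TokenBreakdown {
  systemPrompt: number;    // AGENTS.md + SOUL.md + USER.md
  memory: number;          // MEMORY.md (primary session only)
  dailyLogs: number;       // today + yesterday logs
  sessionHistory: number;  // conversation history
  toolResults: number;     // tool call results already injected
  retrievedChunks: number; // vector retrieval results already injected
}

function currentTokens(b: TokenBreakdown): number {
  return b.systemPrompt + b.memory + b.dailyLogs +
         b.sessionHistory + b.toolResults + b.retrievedChunks;
}

// Hypothetical mid-afternoon session, dominated by history and tool results:
const breakdown: TokenBreakdown = {
  systemPrompt: 8_000,
  memory: 5_000,
  dailyLogs: 3_000,
  sessionHistory: 90_000,
  toolResults: 60_000,
  retrievedChunks: 10_000,
};
// currentTokens(breakdown) === 176_000, exactly at the default threshold
```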

27.3 Pre-compaction Memory Flush: The Core Innovation

27.3.1 The Problem with Traditional Compaction

The Compaction flow in traditional LLM applications typically looks like:

Context full detected → Generate summary → Replace old messages with summary → Continue working

This process is purely lossy: when the LLM generates the summary, it decides what to keep based on its in-context judgment at that moment. Many details, intermediate decisions, and important tool call results are permanently lost in this process.

27.3.2 OpenClaw's Pre-flush Innovation

OpenClaw inserts an additional step before Compaction:

Soft threshold detected (currentTokens >= 176K)
    ↓
Send Memory Flush Prompt (silent agentic turn, invisible to user)
    ↓
Agent decides what content is worth persisting
    ↓
Write to memory/YYYY-MM-DD.md (Daily Logs)
    ↓
Compaction proceeds: old messages summarized, recent messages retained

The core value of this flow: letting the agent itself decide what information is worth saving, rather than leaving it to a mechanical summarization algorithm.

27.3.3 Memory Flush Prompt Content

The system sends the agent a special internal prompt (invisible to the user):

[SYSTEM - INTERNAL FLUSH PROMPT]
The Context Window is about to trigger Compaction. Before compression,
please review the current conversation history and identify and persist
the following types of information:

1. Preferences or constraints explicitly stated by the user
2. Important decisions made and the reasons behind them
3. Important facts discovered (about the codebase, project, user environment, etc.)
4. Current task progress (what has been done, what still needs doing)
5. Any information that may be useful in future sessions

Use the write_file tool to append this information to today's log.
If there is no new information worth saving, output "NOTHING_TO_FLUSH".

27.3.4 Example of Agent Flush Decision

# Example of content written by the agent during the Flush phase

## 14:23 — Compaction Pre-flush

**Task Progress:**
- Refactoring JWT validation logic in auth-service
- Completed: token generation function (generateToken.ts), validation function (verifyToken.ts)
- Remaining: refresh token logic (refreshToken.ts), test cases

**Important Decisions:**
- Decided to use RS256 rather than HS256: supports public key verification, appropriate for microservices
- JWT expiration set to 15 minutes (access token), 7 days (refresh token)

**Issues Found:**
- Old code has `secret` variable hardcoded in auth.js (line 42); needs migration to env vars

**User Preferences:**
- User wants JSDoc comments on all new functions

27.3.5 How the Silent Agentic Turn Works

A "silent agentic turn" means this Flush process is completely invisible to the user:

User's perspective (chat interface):
[User message] → [Agent reply] → [User message] → [Agent reply]
                                                   ↑ Flush quietly happens here

Internal system perspective:
[User message] → [Flush Prompt (internal)] → [Agent Flush operation (write file)]
               → [Compaction]
               → [Agent reply (responding to original user message)]

The user only notices a slightly longer response delay; no Flush-related output is visible.


27.4 memoryFlushCompactionCount: The Anti-duplication Mechanism

27.4.1 The Problem

Without an anti-duplication mechanism, the following scenario creates issues:

currentTokens = 176,001 (exceeds threshold)
→ Trigger Flush + Compaction
→ After Compaction, currentTokens drops to 120,000
→ Conversation continues...
→ currentTokens grows again to 176,001
→ Trigger Flush + Compaction again ← normal behavior

But if token count remains high after Compaction:

currentTokens = 176,001
→ Trigger Flush (write to log)
→ Compaction in progress... (may fail or be delayed)
→ Next request: currentTokens = 176,005
→ Trigger Flush again → same content written twice!

27.4.2 The Solution

memoryFlushCompactionCount is a counter stored in session metadata that records which Compaction epochs have already had a Flush executed:

// session metadata
{"type":"meta","memoryFlushCompactionCount":3,"lastCompactionAt":"2026-04-26T14:23:00Z"}

The Flush trigger logic:

const currentEpoch = compactionCount;  // current Compaction epoch

if (currentTokens >= activationThreshold) {
    if (memoryFlushCompactionCount < currentEpoch) {
        // Current epoch has not been flushed yet; execute Flush
        await performMemoryFlush();
        memoryFlushCompactionCount = currentEpoch;
    }
    // Execute Compaction regardless of whether Flush occurred
    await performCompaction();
}

This ensures that within the same Compaction epoch, no matter how many times the trigger condition is detected, Flush executes only once.
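The epoch guard can be made runnable as a self-contained sketch. `SessionMeta` mirrors the metadata record shown earlier; `maybeFlushAndCompact` and the `actions` trace are hypothetical names, and the real implementation performs actual Flush and Compaction work rather than recording strings.

```typescript
// Runnable sketch of the once-per-epoch Flush guard described above.
interface SessionMeta {
  compactionCount: number;            // current Compaction epoch
  memoryFlushCompactionCount: number; // last epoch that was flushed
}

function maybeFlushAndCompact(
  currentTokens: number,
  activationThreshold: number,
  meta: SessionMeta,
  actions: string[],
): void {
  if (currentTokens < activationThreshold) return;
  if (meta.memoryFlushCompactionCount < meta.compactionCount) {
    actions.push("flush");            // Flush runs at most once per epoch
    meta.memoryFlushCompactionCount = meta.compactionCount;
  }
  actions.push("compact");            // Compaction runs on every trigger
}

// Two triggers in the same epoch: one flush, two compactions.
const meta: SessionMeta = { compactionCount: 3, memoryFlushCompactionCount: 2 };
const actions: string[] = [];
maybeFlushAndCompact(180_000, 176_000, meta, actions);
maybeFlushAndCompact(180_000, 176_000, meta, actions);
// actions === ["flush", "compact", "compact"]
```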


27.5 Dreaming: The Background Consolidation Process

27.5.1 What Dreaming Does

Daily Logs may accumulate a large volume of fragmented records over time. Dreaming is a background process that periodically (or on trigger) reviews Daily Logs, distills information with long-term value, and promotes it to MEMORY.md.

Analogy: if Daily Logs are the "field notebook," MEMORY.md is the "knowledge map," and Dreaming is the process of organizing notes each evening.

27.5.2 Dreaming Trigger Timing

dreaming:
  triggers:
    - type: schedule
      cron: "0 3 * * *"     # Runs daily at 3:00 AM
    - type: session_idle
      idle_minutes: 30       # Runs after 30 minutes of session inactivity
    - type: daily_log_size
      threshold_kb: 50       # Triggers when today's log exceeds 50KB

27.5.3 Dreaming Workflow

1. Read Daily Logs from the past N days (default: 7 days)
2. Read current MEMORY.md
3. Generate consolidation prompt:
   "Below are recent work logs. Please identify information with long-term value,
    supplement or update MEMORY.md, and avoid duplicating existing content."
4. Agent generates MEMORY.md updates
5. Write to MEMORY.md
6. Update vector index
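Step 1 of the workflow can be sketched as a small date helper that builds the Daily Log paths for the past N days, following the `memory/YYYY-MM-DD.md` layout used earlier in the chapter. The function name is hypothetical.

```typescript
// Compute Daily Log paths for the past N days (most recent first).
function recentLogPaths(today: Date, days: number): string[] {
  const paths: string[] = [];
  for (let i = 0; i < days; i++) {
    const d = new Date(today.getTime() - i * 86_400_000); // i days back
    const iso = d.toISOString().slice(0, 10);             // "YYYY-MM-DD"
    paths.push(`memory/${iso}.md`);
  }
  return paths;
}

// e.g. recentLogPaths(new Date("2026-04-26T12:00:00Z"), 3)
// → ["memory/2026-04-26.md", "memory/2026-04-25.md", "memory/2026-04-24.md"]
```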

27.5.4 Grounded Backfill and DREAMS.md

In an extended Dreaming mode, the agent can replay historical session records to extract important information that was missed in the past:

Historical JSONL files → Replay analysis → Identify missed important info → Stage in DREAMS.md
                                                                              ↓
                                                                    User review/confirmation
                                                                              ↓
                                                                    Promote to MEMORY.md

DREAMS.md is a staging file for "candidate long-term memories." Unlike writing directly to MEMORY.md, content in DREAMS.md requires review before promotion. This reduces the risk of accidentally writing noise into long-term memory.

# DREAMS.md Example

## Candidate Memory Items (Pending Review)

### [2026-04-15 Session Replay]
Finding: User mentioned that their team has an API spec document at /docs/api-spec.yaml
Suggested action: Add this path to the project resources index in MEMORY.md

### [2026-04-20 Session Replay]
Finding: User dislikes emoji in code comments
Suggested action: Update the user preferences section in MEMORY.md

27.6 Compaction vs. Pruning: Differences and Collaboration

27.6.1 Definitions of the Two Mechanisms

Pruning: a lightweight, per-request operation that hides stale tool call results from the Context sent to the model. It runs purely in memory; the on-disk JSONL is untouched, so nothing is permanently lost.

Compaction: a heavyweight, persistent operation that replaces old conversation turns with an LLM-generated summary and writes the result back to the session JSONL. It is lossy, which is why the Pre-flush exists.

27.6.2 Execution Order and Collaboration

Before each API request:
├── Step 1: Pruning (in-memory operation, runs first)
│   └── Remove tool results older than N turns (affects only this request's Context)
├── Step 2: Token counting
│   └── Calculate currentTokens after Pruning
└── Step 3: Determine whether to trigger Compaction
    ├── if currentTokens >= activationThreshold:
    │   ├── Pre-flush (if current epoch not yet flushed)
    │   └── Compaction (summary + write to JSONL)
    └── else: proceed with normal inference request
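The three steps above can be sketched as follows. `Turn`, `prune`, `countTokens`, and `needsCompaction` are simplified stand-ins for illustration, not OpenClaw's actual data model.

```typescript
// Simplified conversation turn; `age` counts how many rounds ago it occurred.
interface Turn {
  role: string;
  tokens: number;
  isToolResult: boolean;
  age: number;
}

// Step 1: Pruning — drop tool results older than N rounds. This affects
// only the Context built for this request; the on-disk JSONL is untouched.
function prune(history: Turn[], pruneAfterRounds: number): Turn[] {
  return history.filter(t => !(t.isToolResult && t.age > pruneAfterRounds));
}

// Step 2: token counting, performed on the pruned history.
function countTokens(history: Turn[]): number {
  return history.reduce((sum, t) => sum + t.tokens, 0);
}

// Step 3: Compaction triggers only if the pruned Context still exceeds
// the activation threshold.
function needsCompaction(
  history: Turn[],
  threshold: number,
  pruneAfterRounds = 10,
): boolean {
  return countTokens(prune(history, pruneAfterRounds)) >= threshold;
}

const history: Turn[] = [
  { role: "tool", tokens: 20_000, isToolResult: true, age: 12 }, // pruned
  { role: "tool", tokens: 5_000, isToolResult: true, age: 2 },   // kept
  { role: "user", tokens: 1_000, isToolResult: false, age: 15 }, // kept
];
// countTokens(prune(history, 10)) === 6_000, so no Compaction at 176K
```

The ordering matters: because Pruning runs first, a session whose pressure comes mostly from bulky tool results may never reach the Compaction threshold at all.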

27.6.3 Comparison Table

| Dimension | Pruning | Compaction |
|---|---|---|
| Scope | Memory (current request only) | Disk (permanent) |
| Target | Tool call results | Old conversation turns |
| Reversibility | Reversible (original JSONL unchanged) | Irreversible (JSONL is modified) |
| Trigger frequency | Before every request | Only when threshold is triggered |
| Information loss | None (only hidden) | Yes (details replaced by summary) |
| Pre-flush involved | No | Yes |

27.7 Compaction Behavior in Sandbox Read-only Mode

When an agent runs in Docker sandbox read-only mode, Compaction behavior changes:

27.7.1 Read-only Mode Restrictions

In a read-only sandbox, the agent cannot write any files. This means:

- Pre-flush cannot persist anything to Daily Logs (memory/YYYY-MM-DD.md)
- Compaction cannot rewrite the session JSONL
- Only Pruning, a purely in-memory operation, remains available

27.7.2 Degraded Behavior

if (sandbox.isReadOnly) {
    // Skip Pre-flush
    logger.info("Sandbox read-only mode: skipping memory flush");
    
    // Skip Compaction (cannot write JSONL)
    logger.info("Sandbox read-only mode: skipping compaction");
    
    // Execute Pruning only
    await performPruning();
    
    // If still above threshold after Pruning, emit warning
    if (currentTokens >= activationThreshold) {
        logger.warn("Context Window pressure high in read-only sandbox; consider increasing contextWindow");
    }
}

27.7.3 Practical Recommendations

When running long sessions in a read-only sandbox:

  1. Increase the contextWindow config value (if the model supports it)
  2. Reduce task granularity (split long tasks into multiple shorter tasks)
  3. Enable more aggressive Pruning of tool results (lower pruneAfterRounds)

sandbox:
  readOnly: true
  compaction:
    # Special config for read-only mode
    pruneAfterRounds: 3          # More aggressive tool result pruning (default: 10)
    warnAtTokenThreshold: 150000  # Warn earlier

27.8 Full Compaction Flow Sequence Diagram

User message arrives
    │
    ▼
Token counter updated
    │
    ▼
currentTokens >= activationThreshold?
    │
    ├── No → Normal inference request ──────────────────────────→ Return reply
    │
    └── Yes
            │
            ▼
        Read-only sandbox?
            │
            ├── Yes → Pruning only → Inference request → Return reply
            │
            └── No
                    │
                    ▼
                memoryFlushCompactionCount < currentEpoch?
                    │
                    ├── No (already flushed) → Skip Pre-flush
                    │
                    └── Yes (not yet flushed)
                                │
                                ▼
                            Send Flush Prompt (silent agentic turn)
                                │
                                ▼
                            Agent writes to Daily Logs
                                │
                                ▼
                            memoryFlushCompactionCount++
                    │
                    ▼
                Execute Compaction
                    │
                    ├── Generate old message summary (LLM call)
                    ├── Replace old messages with summary
                    └── Write updated JSONL
                    │
                    ▼
                Update vector index (async)
                    │
                    ▼
                Inference request (with compacted Context)
                    │
                    ▼
                Return reply to user

27.9 Evaluating Compaction Quality

27.9.1 How to Judge Compaction Quality?

A high-quality Compaction summary should satisfy:

| Quality Indicator | Assessment Method |
|---|---|
| Task continuity | After Compaction, can the agent continue completing unfinished tasks? |
| Decision preservation | Are important design decisions reflected in the summary? |
| Context awareness | Does the agent "know" what operations have already been performed and avoid repeating them? |
| Conciseness | Is the summary significantly smaller than the original conversation (typically 5:1 compression ratio or higher)? |
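The conciseness criterion is the one indicator that is trivially mechanical to check. A hypothetical helper (the function name and default are illustrative, keyed to the 5:1 target mentioned above):

```typescript
// Does the summary meet the target compression ratio (default 5:1)?
function meetsCompressionTarget(
  originalTokens: number,
  summaryTokens: number,
  targetRatio = 5,
): boolean {
  return originalTokens / summaryTokens >= targetRatio;
}

// A 30K-token conversation compressed to a 2K-token summary is 15:1:
// meetsCompressionTarget(30_000, 2_000) === true
// A 30K → 10K summary is only 3:1 and fails the target:
// meetsCompressionTarget(30_000, 10_000) === false
```

The other three indicators require behavioral evaluation (can the agent actually continue the task?), which is why they are phrased as questions rather than formulas.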

27.9.2 Compaction Summary Example

Original conversation (~30K tokens) → Compressed summary (~2K tokens):

[COMPACTED SUMMARY - as of 14:22]

**Task context:** Refactoring JWT authentication in auth-service
**Work completed:**
- Implemented generateToken(payload, expiresIn) — uses RS256, returns {token, expiresAt}
- Implemented verifyToken(token) — validates signature, checks expiration, returns payload or null
- Found and logged: hardcoded secret in old auth.js line 42 (user has been notified)

**Current state:** Currently implementing refreshToken logic
**Remaining:** refreshToken.ts implementation, complete test suite

**Technical decisions:** RS256 (asymmetric encryption), 15-min access token, 7-day refresh token

27.10 Chapter Summary

OpenClaw's Compaction mechanism transforms the fundamental constraint of a finite Context Window into a manageable engineering problem:

- A precise trigger formula (contextWindow - reserveTokensFloor - softThresholdTokens) determines when to act
- A Pre-flush lets the agent persist valuable context to Daily Logs before lossy summarization
- memoryFlushCompactionCount guarantees at most one Flush per Compaction epoch
- Dreaming consolidates Daily Logs into long-term memory in MEMORY.md
- Pruning relieves per-request pressure cheaply, reserving Compaction for genuine overflow
- In read-only sandboxes, the system degrades gracefully to Pruning plus warnings

Understanding the Compaction mechanism is an essential foundation for designing OpenClaw agents that run stably over the long term.


Next: Chapter 28 — Vector Retrieval Implementation: SQLite + BM25 Hybrid Search, 0.7/0.3 Weight Fusion, and the Embedding Fallback Chain
