Chapter 27 Compaction Algorithm: Trigger Formula, Pre-flush Mechanism, and Long-session Information Preservation
"Compaction is not forgetting — it is a controlled knowledge handoff." — OpenClaw Engineering Log
27.1 Why Is Compaction Necessary?
The Context Window is an LLM's "workbench" — all currently visible information must fit within this finite space. Even a large 200K-token Context Window can fill up during deep work sessions lasting several hours.
Consider a typical scenario:
9:00 AM: Start a complex code refactoring task
Each conversation turn consumes ~2K tokens (question + code + answer)
By 2:00 PM, conversation history alone has consumed ~60K tokens
Add tool call results (each file read: ~5-20K tokens): now over 150K tokens
Without intervention, the Context Window eventually overflows, causing subsequent inference calls to fail or truncate.
Compaction is OpenClaw's systematic answer to this problem: by summarizing old conversation turns and preserving recent context, it frees up space for new information without interrupting the workflow.
The key challenge: Compaction is a lossy operation — the details of old messages cannot be fully preserved in a summary. OpenClaw's innovation lies in executing a Memory Pre-flush before compression, actively persisting the most valuable information to disk. This transforms "lossy compression" into a "controlled knowledge handoff."
27.2 The Trigger Formula in Detail
27.2.1 Core Trigger Formula
$$ \text{Trigger condition:} \quad currentTokens \geq contextWindow - reserveTokensFloor - softThresholdTokens $$
Using default parameters:
$$ \text{Activation threshold} = 200{,}000 - 20{,}000 - 4{,}000 = 176{,}000 \text{ tokens} $$
When currentTokens >= 176,000, the Compaction process begins.
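The threshold arithmetic can be expressed as a small helper. This is an illustrative sketch, not OpenClaw's actual API; the type and function names are assumptions that mirror the config keys above.

```typescript
// Sketch of the trigger arithmetic. Field names mirror the config keys
// described in this section; the types themselves are illustrative.
interface CompactionConfig {
  contextWindow: number;
  reserveTokensFloor: number;
  softThresholdTokens: number;
}

function activationThreshold(cfg: CompactionConfig): number {
  return cfg.contextWindow - cfg.reserveTokensFloor - cfg.softThresholdTokens;
}

function shouldCompact(currentTokens: number, cfg: CompactionConfig): boolean {
  return currentTokens >= activationThreshold(cfg);
}

// Default parameters: 200,000 - 20,000 - 4,000 = 176,000
const defaults: CompactionConfig = {
  contextWindow: 200_000,
  reserveTokensFloor: 20_000,
  softThresholdTokens: 4_000,
};
console.log(activationThreshold(defaults)); // 176000
```

Note that both subtrahends matter: `reserveTokensFloor` guarantees room for the summary the Compaction LLM call must generate, while `softThresholdTokens` buys time for the Pre-flush turn described in 27.3.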
27.2.2 Parameter Meanings
| Parameter | Default | Meaning |
|---|---|---|
| contextWindow | 200,000 | Model's Context Window limit (depends on model) |
| reserveTokensFloor | 20,000 | Hard reserve: space for Compaction output (summary generation), ensuring the LLM has room to generate responses |
| softThresholdTokens | 4,000 | Soft buffer: advance headroom before triggering Compaction, giving time for the Pre-flush process |
27.2.3 Parameter Configuration
# ~/.openclaw/config.yaml
compaction:
  contextWindow: 200000       # Model context limit
  reserveTokensFloor: 20000   # Hard reserve floor
  softThresholdTokens: 4000   # Soft trigger advance
  # Result: activation threshold = 176,000 tokens
For deployments using different models, adjust contextWindow accordingly:
# Using a model with a 100K-token context window
compaction:
  contextWindow: 100000
  reserveTokensFloor: 15000
  softThresholdTokens: 3000
  # Activation threshold = 100,000 - 15,000 - 3,000 = 82,000 tokens
27.2.4 Token Counting Method
Token counting includes all content in the Context Window:
currentTokens =
system_prompt_tokens // AGENTS.md + SOUL.md + USER.md
+ memory_tokens // MEMORY.md (if primary session)
+ daily_log_tokens // today + yesterday logs
+ session_history_tokens // conversation history
+ tool_result_tokens // tool call results (already injected)
+ retrieved_chunks_tokens // vector retrieval results (already injected)
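The pseudocode above is a straight sum over every component that actually lands in the Context Window. A literal sketch of that sum (the field names follow the pseudocode and are not OpenClaw's real type definitions):

```typescript
// Illustrative breakdown of currentTokens. Each field corresponds to one
// line of the pseudocode above; names are assumptions for this sketch.
interface TokenBreakdown {
  systemPrompt: number;    // AGENTS.md + SOUL.md + USER.md
  memory: number;          // MEMORY.md (primary session only)
  dailyLogs: number;       // today + yesterday logs
  sessionHistory: number;  // conversation history
  toolResults: number;     // tool call results already injected
  retrievedChunks: number; // vector retrieval results already injected
}

function currentTokens(b: TokenBreakdown): number {
  return b.systemPrompt + b.memory + b.dailyLogs +
         b.sessionHistory + b.toolResults + b.retrievedChunks;
}
```

The important property is that nothing is exempt: system prompts and memory files count against the same budget as conversation turns, which is why long-running sessions hit the threshold even when the user has typed relatively little.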
27.3 Pre-compaction Memory Flush: The Core Innovation
27.3.1 The Problem with Traditional Compaction
The Compaction flow in traditional LLM applications typically looks like:
Context full detected → Generate summary → Replace old messages with summary → Continue working
This process is purely lossy: when the LLM generates the summary, it decides what to keep based on its in-context judgment at that moment. Many details, intermediate decisions, and important tool call results are permanently lost in this process.
27.3.2 OpenClaw's Pre-flush Innovation
OpenClaw inserts an additional step before Compaction:
Soft threshold detected (currentTokens >= 176K)
↓
Send Memory Flush Prompt (silent agentic turn, invisible to user)
↓
Agent decides what content is worth persisting
↓
Write to memory/YYYY-MM-DD.md (Daily Logs)
↓
Compaction proceeds: old messages summarized, recent messages retained
The core value of this flow: letting the agent itself decide what information is worth saving, rather than leaving it to a mechanical summarization algorithm.
27.3.3 Memory Flush Prompt Content
The system sends the agent a special internal prompt (invisible to the user):
[SYSTEM - INTERNAL FLUSH PROMPT]
The Context Window is about to trigger Compaction. Before compression,
please review the current conversation history and identify and persist
the following types of information:
1. Preferences or constraints explicitly stated by the user
2. Important decisions made and the reasons behind them
3. Important facts discovered (about the codebase, project, user environment, etc.)
4. Current task progress (what has been done, what still needs doing)
5. Any information that may be useful in future sessions
Use the write_file tool to append this information to today's log.
If there is no new information worth saving, output "NOTHING_TO_FLUSH".
27.3.4 Example of Agent Flush Decision
# Example of content written by the agent during the Flush phase
## 14:23 — Compaction Pre-flush
**Task Progress:**
- Refactoring JWT validation logic in auth-service
- Completed: token generation function (generateToken.ts), validation function (verifyToken.ts)
- Remaining: refresh token logic (refreshToken.ts), test cases
**Important Decisions:**
- Decided to use RS256 rather than HS256: supports public key verification, appropriate for microservices
- JWT expiration set to 15 minutes (access token), 7 days (refresh token)
**Issues Found:**
- Old code has `secret` variable hardcoded in auth.js (line 42); needs migration to env vars
**User Preferences:**
- User wants JSDoc comments on all new functions
27.3.5 How the Silent Agentic Turn Works
A "silent agentic turn" means this Flush process is completely invisible to the user:
User's perspective (chat interface):
[User message] → [Agent reply] → [User message] → [Agent reply]
↑ Flush quietly happens here
Internal system perspective:
[User message] → [Flush Prompt (internal)] → [Agent Flush operation (write file)]
→ [Compaction]
→ [Agent reply (responding to original user message)]
The user only notices a slightly longer response delay; no Flush-related output is visible.
27.4 memoryFlushCompactionCount: The Anti-duplication Mechanism
27.4.1 The Problem
Without an anti-duplication mechanism, the following scenario creates issues:
currentTokens = 176,001 (exceeds threshold)
→ Trigger Flush + Compaction
→ After Compaction, currentTokens drops to 120,000
→ Conversation continues...
→ currentTokens grows again to 176,001
→ Trigger Flush + Compaction again ← normal behavior
But if token count remains high after Compaction:
currentTokens = 176,001
→ Trigger Flush (write to log)
→ Compaction in progress... (may fail or be delayed)
→ Next request: currentTokens = 176,005
→ Trigger Flush again → same content written twice!
27.4.2 The Solution
memoryFlushCompactionCount is a counter stored in session metadata that records which Compaction epochs have already had a Flush executed:
// session metadata
{"type":"meta","memoryFlushCompactionCount":3,"lastCompactionAt":"2026-04-26T14:23:00Z"}
The Flush trigger logic:
const currentEpoch = compactionCount; // current Compaction epoch

if (currentTokens >= activationThreshold) {
  if (memoryFlushCompactionCount < currentEpoch) {
    // Current epoch has not been flushed yet; execute Flush
    await performMemoryFlush();
    memoryFlushCompactionCount = currentEpoch;
  }
  // Execute Compaction regardless of whether Flush occurred
  await performCompaction();
}
This ensures that within the same Compaction epoch, no matter how many times the trigger condition is detected, Flush executes only once.
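The guard can be exercised in isolation. The following is a minimal simulation under assumed names: the state shape mirrors the session metadata shown above, and the flush/compaction bodies are stubs, since the real operations involve LLM calls and disk writes.

```typescript
// Minimal simulation of the once-per-epoch flush guard (illustrative).
interface SessionMeta {
  compactionCount: number;            // current Compaction epoch
  memoryFlushCompactionCount: number; // last epoch that was flushed
}

function maybeFlushAndCompact(meta: SessionMeta, log: string[]): void {
  if (meta.memoryFlushCompactionCount < meta.compactionCount) {
    log.push(`flush@epoch${meta.compactionCount}`);  // stands in for performMemoryFlush()
    meta.memoryFlushCompactionCount = meta.compactionCount;
  }
  log.push(`compact@epoch${meta.compactionCount}`);  // stands in for performCompaction()
}

// Two trigger detections inside the same epoch → exactly one flush.
const meta: SessionMeta = { compactionCount: 3, memoryFlushCompactionCount: 2 };
const log: string[] = [];
maybeFlushAndCompact(meta, log); // first detection: flush + compact
maybeFlushAndCompact(meta, log); // retry in same epoch: compact only
console.log(log); // ["flush@epoch3", "compact@epoch3", "compact@epoch3"]
```

Because the counter is persisted in session metadata rather than held in memory, the guard survives process restarts: a crashed Compaction does not reset it, so the retry path in 27.4.1 cannot double-write the log.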
27.5 Dreaming: The Background Consolidation Process
27.5.1 What Dreaming Does
Daily Logs may accumulate a large volume of fragmented records over time. Dreaming is a background process that periodically (or on trigger) reviews Daily Logs, distills information with long-term value, and promotes it to MEMORY.md.
Analogy: if Daily Logs are the "field notebook," MEMORY.md is the "knowledge map," and Dreaming is the process of organizing notes each evening.
27.5.2 Dreaming Trigger Timing
dreaming:
  triggers:
    - type: schedule
      cron: "0 3 * * *"   # Runs daily at 3:00 AM
    - type: session_idle
      idle_minutes: 30    # Runs after 30 minutes of session inactivity
    - type: daily_log_size
      threshold_kb: 50    # Triggers when today's log exceeds 50KB
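The two non-cron triggers reduce to simple threshold checks. A sketch under assumed names (the cron case is omitted here, since evaluating it would require a cron-expression parser):

```typescript
// Illustrative check for the session_idle and daily_log_size triggers
// shown in the config above. Types and names are assumptions.
type DreamTrigger =
  | { type: "session_idle"; idle_minutes: number }
  | { type: "daily_log_size"; threshold_kb: number };

interface SessionState {
  idleMinutes: number; // minutes since the last user message
  todayLogKb: number;  // size of today's memory/YYYY-MM-DD.md in KB
}

function shouldDream(triggers: DreamTrigger[], s: SessionState): boolean {
  return triggers.some((t) =>
    t.type === "session_idle"
      ? s.idleMinutes >= t.idle_minutes
      : s.todayLogKb >= t.threshold_kb
  );
}
```

Any single satisfied trigger is enough; the triggers are OR-ed, so a fast-growing Daily Log can start Dreaming long before the nightly schedule fires.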
27.5.3 Dreaming Workflow
1. Read Daily Logs from the past N days (default: 7 days)
2. Read current MEMORY.md
3. Generate consolidation prompt:
"Below are recent work logs. Please identify information with long-term value,
supplement or update MEMORY.md, and avoid duplicating existing content."
4. Agent generates MEMORY.md updates
5. Write to MEMORY.md
6. Update vector index
27.5.4 Grounded Backfill and DREAMS.md
In an extended Dreaming mode, the agent can replay historical session records to extract important information that was missed in the past:
Historical JSONL files → Replay analysis → Identify missed important info → Stage in DREAMS.md
↓
User review/confirmation
↓
Promote to MEMORY.md
DREAMS.md is a staging file for "candidate long-term memories." Unlike writing directly to MEMORY.md, content in DREAMS.md requires review before promotion. This reduces the risk of accidentally writing noise into long-term memory.
# DREAMS.md Example
## Candidate Memory Items (Pending Review)
### [2026-04-15 Session Replay]
Finding: User mentioned that their team has an API spec document at /docs/api-spec.yaml
Suggested action: Add this path to the project resources index in MEMORY.md
### [2026-04-20 Session Replay]
Finding: User dislikes emoji in code comments
Suggested action: Update the user preferences section in MEMORY.md
27.6 Compaction vs. Pruning: Differences and Collaboration
27.6.1 Definitions of the Two Mechanisms
Pruning:
- Trims tool call results in memory only
- Before each request, removes tool results older than N turns from Context
- Does not write to disk — after the session ends, pruned content still exists in the original JSONL
- Purpose: reduce token consumption while preserving the conversation messages themselves
Compaction:
- Writes to disk (updates the JSONL file)
- Replaces old conversation turns with a summary
- Permanently changes the stored structure of session history
- Purpose: fundamentally reduce the space occupied by history
27.6.2 Execution Order and Collaboration
Before each API request:
├── Step 1: Pruning (in-memory operation, runs first)
│ └── Remove tool results older than N turns (affects only this request's Context)
├── Step 2: Token counting
│ └── Calculate currentTokens after Pruning
└── Step 3: Determine whether to trigger Compaction
├── if currentTokens >= activationThreshold:
│ ├── Pre-flush (if current epoch not yet flushed)
│ └── Compaction (summary + write to JSONL)
└── else: proceed with normal inference request
27.6.3 Comparison Table
| Dimension | Pruning | Compaction |
|---|---|---|
| Scope | Memory (current request only) | Disk (permanent) |
| Target | Tool call results | Old conversation turns |
| Reversibility | Reversible (original JSONL unchanged) | Irreversible (JSONL is modified) |
| Trigger frequency | Before every request | Only when threshold is triggered |
| Information loss | None (only hidden) | Yes (details replaced by summary) |
| Pre-flush involved | No | Yes |
27.7 Compaction Behavior in Sandbox Read-only Mode
When an agent runs in Docker sandbox read-only mode, Compaction behavior changes:
27.7.1 Read-only Mode Restrictions
In a read-only sandbox, the agent cannot write any files. This means:
- Memory Pre-flush cannot execute (cannot write to Daily Logs)
- Compaction's JSONL write step is skipped
- Only Pruning (in-memory operation) can execute normally
27.7.2 Degraded Behavior
if (sandbox.isReadOnly) {
  // Skip Pre-flush
  logger.info("Sandbox read-only mode: skipping memory flush");

  // Skip Compaction (cannot write JSONL)
  logger.info("Sandbox read-only mode: skipping compaction");

  // Execute Pruning only
  await performPruning();

  // If still above threshold after Pruning, emit warning
  if (currentTokens >= activationThreshold) {
    logger.warn("Context Window pressure high in read-only sandbox; consider increasing contextWindow");
  }
}
27.7.3 Practical Recommendations
When running long sessions in a read-only sandbox:
- Increase the `contextWindow` config value (if the model supports it)
- Reduce task granularity (split long tasks into multiple shorter tasks)
- Enable more aggressive Pruning of tool results (lower `pruneAfterRounds`)
sandbox:
  readOnly: true
compaction:
  # Special config for read-only mode
  pruneAfterRounds: 3          # More aggressive tool result pruning (default: 10)
  warnAtTokenThreshold: 150000 # Warn earlier
27.8 Full Compaction Flow Sequence Diagram
User message arrives
│
▼
Token counter updated
│
▼
currentTokens >= activationThreshold?
│
├── No → Normal inference request ──────────────────────────→ Return reply
│
└── Yes
│
▼
Read-only sandbox?
│
├── Yes → Pruning only → Inference request → Return reply
│
└── No
│
▼
memoryFlushCompactionCount < currentEpoch?
│
├── No (already flushed) → Skip Pre-flush
│
└── Yes (not yet flushed)
│
▼
Send Flush Prompt (silent agentic turn)
│
▼
Agent writes to Daily Logs
│
▼
memoryFlushCompactionCount++
│
▼
Execute Compaction
│
├── Generate old message summary (LLM call)
├── Replace old messages with summary
└── Write updated JSONL
│
▼
Update vector index (async)
│
▼
Inference request (with compacted Context)
│
▼
Return reply to user
27.9 Evaluating Compaction Quality
27.9.1 How to Judge Compaction Quality?
A high-quality Compaction summary should satisfy:
| Quality Indicator | Assessment Method |
|---|---|
| Task continuity | After Compaction, can the agent continue completing unfinished tasks? |
| Decision preservation | Are important design decisions reflected in the summary? |
| Context awareness | Does the agent "know" what operations have already been performed and avoid repeating them? |
| Conciseness | Is the summary significantly smaller than the original conversation (typically 5:1 compression ratio or higher)? |
27.9.2 Compaction Summary Example
Original conversation (~30K tokens) → Compressed summary (~2K tokens):
[COMPACTED SUMMARY - as of 14:22]
**Task context:** Refactoring JWT authentication in auth-service
**Work completed:**
- Implemented generateToken(payload, expiresIn) — uses RS256, returns {token, expiresAt}
- Implemented verifyToken(token) — validates signature, checks expiration, returns payload or null
- Found and logged: hardcoded secret in old auth.js line 42 (user has been notified)
**Current state:** Currently implementing refreshToken logic
**Remaining:** refreshToken.ts implementation, complete test suite
**Technical decisions:** RS256 (asymmetric encryption), 15-min access token, 7-day refresh token
27.10 Chapter Summary
OpenClaw's Compaction mechanism transforms the fundamental constraint of a finite Context Window into a manageable engineering problem:
- The trigger formula ensures timely intervention before Context is exhausted: $currentTokens \geq contextWindow - 20K - 4K$
- The Pre-flush mechanism performs a knowledge handoff before compression, turning a lossy operation into a controlled one
- memoryFlushCompactionCount prevents duplicate flushes within the same epoch
- The Dreaming process consolidates fragmented Daily Log knowledge into long-term MEMORY.md entries
- Pruning and Compaction collaborate, handling "hiding" and "compressing" as distinct space optimization strategies
- In read-only sandboxes, the system degrades gracefully, executing only in-memory Pruning
Understanding the Compaction mechanism is an essential foundation for designing OpenClaw agents that run stably over the long term.
Next: Chapter 28 — Vector Retrieval Implementation: SQLite + BM25 Hybrid Search, 0.7/0.3 Weight Fusion, and the Embedding Fallback Chain