How OpenClaw Memory Works: A Technical Explanation

OpenClaw memory is the technical system that allows an OpenClaw agent to store, retrieve, and use information from past interactions. It operates through Markdown files, a context window, and an optional retrieval index — all within the agent's local workspace.

TL;DR

  • OpenClaw is stateless by default — every session starts with a fresh context window.
  • Memory has four layers: bootstrap files, session transcript, LLM context window, and retrieval index.
  • The memory flush (disabled by default) is the critical mechanism for persisting context across sessions.

Understanding how this system works is essential for diagnosing memory problems and deciding whether you need a plugin.


Why Is OpenClaw Stateless by Default?

OpenClaw agents run on LLM API calls. Each API call is independent — it receives a prompt, generates a response, and forgets everything. There is no built-in state between calls.

A "session" in OpenClaw is a sequence of these API calls sharing a context window. When the session ends, the context window is discarded. When a new session starts, a new context window is created from scratch.

This is by design. Statelessness keeps the architecture simple, predictable, and easy to reason about. But it means the agent has no inherent mechanism to remember anything from yesterday.
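The statelessness described above can be sketched in a few lines. This is an illustrative model, not OpenClaw's actual implementation: `call_llm` stands in for any stateless LLM API, and the `Session` class shows that the "memory" within a session is nothing more than a client-side message list re-sent on every call.

```python
def call_llm(messages):
    # A real call would hit an HTTP API; the key point is that the model
    # sees only what is in `messages` -- nothing from previous calls.
    return f"(reply to {len(messages)} messages)"

class Session:
    def __init__(self):
        self.messages = []  # the context window lives here, client-side

    def send(self, user_text):
        self.messages.append({"role": "user", "content": user_text})
        reply = call_llm(self.messages)  # full history re-sent every turn
        self.messages.append({"role": "assistant", "content": reply})
        return reply

s = Session()
s.send("remember X")
s.send("what did I say?")  # works: the history is still in s.messages

s2 = Session()  # new session -> fresh list -> nothing remembered
```

When `s` is discarded, so is everything it "knew" — which is exactly why the layers below exist.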


What Are the Four Memory Layers?

OpenClaw's memory system has four distinct layers, each operating at a different level of persistence and automation.

Layer 1: Bootstrap Files (Permanent, Manual)

Files: SOUL.md, AGENTS.md, USER.md

These Markdown files live in your workspace root and are loaded into the context window at the start of every session. They define the agent's personality (SOUL.md), operational rules (AGENTS.md), and user information (USER.md).

Persistence: Permanent — they survive compaction and session boundaries because they're reloaded from disk at every session start.

Limitation: Manual maintenance. You write and edit these files yourself. The agent doesn't update them automatically. They consume tokens from the context window.
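Layer 1 amounts to re-reading a few Markdown files from disk at session start and prepending them to the prompt. The file names below match the article; the loader itself is an illustrative sketch, not OpenClaw's real code.

```python
import tempfile
from pathlib import Path

BOOTSTRAP_FILES = ["SOUL.md", "AGENTS.md", "USER.md"]

def load_bootstrap(workspace):
    """Build the bootstrap prompt text from whichever files exist on disk."""
    parts = []
    for name in BOOTSTRAP_FILES:
        path = Path(workspace) / name
        if path.exists():
            parts.append(f"# {name}\n{path.read_text()}")
    return "\n\n".join(parts)

# Demo: a workspace with only SOUL.md present.
ws = tempfile.mkdtemp()
(Path(ws) / "SOUL.md").write_text("Be concise.")
prompt = load_bootstrap(ws)  # contains SOUL.md, skips the missing files
```

Because the files are re-read at every session start, edits you make on disk take effect in the next session — the persistence lives in the filesystem, not the agent.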

Layer 2: Session Transcript (Temporary, Automatic)

Format: JSONL on disk

Every exchange in a session is recorded as a JSONL transcript. This is the raw log of what was said.

Persistence: Survives within a session. Lost at session end unless the memory flush is enabled.

Limitation: Not searchable by the agent unless indexed. Raw transcripts are too large to inject into the context window directly.
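The transcript layer is append-only logging: one JSON object per line. The path and record shape here are illustrative, not OpenClaw's actual schema, but they show why the raw file is easy to write yet useless for retrieval until indexed.

```python
import json
import tempfile
from pathlib import Path

transcript = Path(tempfile.mkdtemp()) / "session.jsonl"

def log_turn(role, content):
    # One JSON object per line: cheap to append, trivially streamable.
    with transcript.open("a") as f:
        f.write(json.dumps({"role": role, "content": content}) + "\n")

log_turn("user", "deploy the staging build")
log_turn("assistant", "done -- staging is live")

# Reading it back is trivial for tooling, but nothing here is ranked,
# summarized, or indexed -- it is just the raw log of what was said.
turns = [json.loads(line) for line in transcript.read_text().splitlines()]
```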

Layer 3: LLM Context Window (Ephemeral, Automatic)

This is the active working memory of the agent — the tokens currently loaded for the current API call. It includes bootstrap files, recent conversation turns, and any injected context.

Persistence: None. Resets with each API call. Compaction can summarize and reduce it mid-session.

Limitation: Fixed token budget. Everything in the context window competes for space. More memory files = less room for conversation.

Layer 4: Retrieval Index (Persistent, Semi-Automatic)

Tools: memory_search, QMD (hybrid BM25 + semantic search)

A searchable index over workspace files. The agent can query this index to find relevant information from past sessions — if instructed to.

Persistence: Persistent on disk. Survives sessions.

Limitation: The agent doesn't search automatically. You must add a directive to AGENTS.md like "search memory before responding" to trigger retrieval. Without this directive, the index exists but goes unused.
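To make the shape of a `memory_search` call concrete, here is a deliberately tiny stand-in. QMD's hybrid BM25 + semantic scoring is far more sophisticated; this toy version ranks memory files by plain term overlap, and every name in it is illustrative.

```python
def memory_search(query, documents, top_k=2):
    """Rank `documents` ({name: text}) by how many query terms they share."""
    terms = set(query.lower().split())
    scored = []
    for name, text in documents.items():
        overlap = len(terms & set(text.lower().split()))
        if overlap:
            scored.append((overlap, name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]

docs = {
    "memory/deploys.md": "staging deploy uses the blue-green strategy",
    "memory/prefs.md": "user prefers concise answers",
}
hits = memory_search("how do we deploy staging", docs)  # -> deploys.md first
```

The limitation in the text applies regardless of scoring quality: however good the index, nothing calls `memory_search` unless a directive tells the agent to.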


How Does Compaction Work?

When the context window approaches its token limit, OpenClaw runs compaction. This process:

  1. Takes the current context window contents
  2. Generates a compressed summary of the conversation so far
  3. Replaces the full context with the summary + bootstrap files
  4. Continues the session with the compressed context

Compaction is a lossy operation. Details, nuances, and specific decisions are often lost in summarization. The agent continues the session but with a degraded understanding of what was discussed.
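The four steps above can be sketched as follows, with a stub summarizer standing in for the LLM call. The threshold and message shapes are assumptions for illustration.

```python
def summarize(messages):
    # A real implementation would ask the LLM for a summary; the stub just
    # records how many turns were compressed -- the detail is gone either way.
    return {"role": "system", "content": f"[summary of {len(messages)} turns]"}

def compact(context, bootstrap, limit=8):
    """If the context exceeds `limit` entries, replace the conversation
    with bootstrap files + a summary (steps 1-4 above)."""
    if len(context) <= limit:
        return context
    summary = summarize(context)   # steps 1-2: take contents, compress
    return bootstrap + [summary]   # steps 3-4: lossy replacement, continue

bootstrap = [{"role": "system", "content": "SOUL.md + AGENTS.md"}]
context = bootstrap + [{"role": "user", "content": f"turn {i}"} for i in range(20)]
compacted = compact(context, bootstrap)  # 21 entries collapse to 2
```

Note what survives: the bootstrap files (reloaded verbatim) and one summary entry. Everything else exists only in whatever the summarizer chose to keep.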


What Is the Memory Flush and Why Is It Disabled by Default?

The memory flush is a pre-compaction feature that triggers a silent "agentic turn" before compaction runs. During this turn, the agent saves important context to memory files before the context window is compressed.

When enabled: The agent writes key facts, decisions, and context to MEMORY.md or the memory/ folder before compaction erases the details. This is the most important native mechanism for persisting information across compaction cycles.

Why disabled by default: The flush adds an extra LLM call before every compaction, which increases latency and cost. Anthropic/OpenClaw shipped it off by default, most likely as a cost/performance trade-off.

How to enable:

memoryFlush.enabled: true
softThresholdTokens: 4000

Set softThresholdTokens to at least 4000 to give the flush enough room to work before compaction takes over.
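The soft threshold works as a guard band: the flush turn should fire once the context gets within `softThresholdTokens` of the hard compaction limit. The config keys below mirror the snippet above, but the check itself is an assumption about how such a threshold would be evaluated, not OpenClaw's actual logic.

```python
def should_flush(used_tokens, context_limit, soft_threshold=4000, enabled=True):
    """True when a pre-compaction memory flush should run."""
    return enabled and used_tokens >= context_limit - soft_threshold

# With a 200K window and the recommended 4000-token threshold:
assert should_flush(197_000, 200_000) is True    # inside the soft zone
assert should_flush(150_000, 200_000) is False   # plenty of headroom
assert should_flush(197_000, 200_000, enabled=False) is False  # flush off
```

Set the threshold too low and the flush turn may itself be cut short by compaction; 4000 tokens leaves it room to write complete memory files.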


Where Do Memory Plugins Hook In?

Memory plugins like Contexto, Mem0, and Supermemory hook into OpenClaw's event lifecycle at two key points:

`before_prompt_build` (session start): The plugin queries its own memory store and injects relevant memories into the context window before the agent's first response. This is auto-recall.

`agent_end` (session end): The plugin captures important context from the session and writes it to its own storage. This is auto-capture.

This lifecycle hook architecture means plugins operate outside the context window's token budget. They store memories externally (in SQLite, cloud, or vector databases) and inject only the relevant subset at session start. This avoids the core limitation of native memory — competing for context window space.
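The two-hook shape described above can be sketched like this. The hook names match the article (`before_prompt_build`, `agent_end`), but the class layout and the in-memory store are illustrative — a real plugin would register with OpenClaw's plugin API and persist to SQLite, cloud, or a vector database.

```python
class MemoryPlugin:
    def __init__(self):
        self.store = []  # stands in for SQLite / cloud / vector DB

    def before_prompt_build(self, context, query):
        # Auto-recall: inject only the relevant subset, never the whole store,
        # so stored memories stay outside the context window's token budget.
        relevant = [m for m in self.store if query.lower() in m.lower()]
        return relevant + context

    def agent_end(self, transcript):
        # Auto-capture: persist important context at session end.
        self.store.extend(transcript)

plugin = MemoryPlugin()

# Session 1 ends: capture.
plugin.agent_end(["user prefers staging deploys on Fridays"])

# Session 2 starts: recall -- the memory is injected before the first response.
context = plugin.before_prompt_build([], query="staging")
```

The design choice worth noticing: the store can grow without bound, because only the query-relevant slice ever touches the context window.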


What Does the Full Memory Flow Look Like?

Here's the complete sequence for a session with a memory plugin installed:

Session Start
  → Load bootstrap files (SOUL.md, AGENTS.md, USER.md)
  → Plugin: query external memory store (before_prompt_build)
  → Plugin: inject relevant memories into context
  → Build initial prompt with bootstrap + memories + user message
  → Generate response
  → [conversation continues]
  → Context window approaches limit
  → Memory flush fires (if enabled) — saves context to memory files
  → Compaction runs — compresses context window
  → [conversation continues with compressed context]
  → Session end
  → Plugin: capture important context (agent_end)
  → Plugin: write to external memory store
  → Session closed

Without a plugin, the `before_prompt_build` and `agent_end` hooks are empty. No external memories are injected. No context is captured at session end. The next session starts cold.


Frequently Asked Questions

What happens to my conversation when the context window fills up?

OpenClaw runs compaction — a process that summarizes the conversation so far and replaces the full context with a compressed version. This is lossy. Details, decisions, and nuances are often lost in the summarization.

Why doesn't my agent remember what I told it yesterday?

Because OpenClaw sessions are stateless by default. Each new session starts with a fresh context window. Unless you've enabled the memory flush, added a retrieve-before-act directive, or installed a memory plugin, the agent has no mechanism to recall past sessions.

How do I check if the memory flush is enabled?

Look in your OpenClaw configuration for memoryFlush.enabled. If it's not present or set to false, the flush is off. Enable it and set softThresholdTokens: 4000.

Can I see what's in my agent's context window?

Enable debug logging in your OpenClaw config to see the full prompt being sent to the LLM at each turn. This shows exactly what's in the context window, including bootstrap files, injected memories (if using a plugin), and conversation history.

How much context window space do bootstrap files consume?

It depends on their length. A typical SOUL.md + AGENTS.md + USER.md setup consumes 1,000–3,000 tokens. Adding MEMORY.md can push this to 5,000+ tokens. With a 200K token context window, this is a small percentage — but during compaction, every token counts.
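The arithmetic above can be checked with the common rough heuristic of ~4 characters per token (an approximation, not OpenClaw's real tokenizer).

```python
def estimate_tokens(text):
    # Rough heuristic: ~4 characters per token for English prose.
    return len(text) // 4

bootstrap_chars = 8_000          # e.g. a typical SOUL.md + AGENTS.md + USER.md
tokens = estimate_tokens("x" * bootstrap_chars)
share = tokens / 200_000         # fraction of a 200K-token window

assert tokens == 2_000           # within the article's 1,000-3,000 range
assert share == 0.01             # ~1% of the window
```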

Where do plugins store memories if not in the context window?

Each plugin uses its own storage: Contexto uses local SQLite, Mem0 uses cloud servers (or self-hosted vector DB), Supermemory uses cloud servers. At session start, the plugin injects only the relevant memories from its store — not the entire database. This avoids the token competition problem of native memory.


Built by [Ekai Labs](https://ekailabs.xyz). Questions: [Discord](https://discord.com/invite/5VsUUEfbJk) · om@ekailabs.xyz · [getcontexto.com](https://getcontexto.com)

Install Contexto: openclaw plugins install @ekai/contexto

Related: [Contexto Docs](/docs) · [What Is OpenClaw Memory?](/blog/what-is-openclaw-memory) · [The Cold Start Problem](/blog/cold-start-problem-ai-agents)