Token Usage Analysis
Why does OpenClaw consume so many tokens, and where are the optimization opportunities?
Token budget per LLM call
Each LLM call sends the full assembled context; every consumer listed below draws from the same budget.
Default context window: 200,000 tokens (src/agents/defaults.ts:6 — DEFAULT_CONTEXT_TOKENS = 200_000).
Token consumers ranked by impact
1. Session history (largest and growing)
Source: ~/.openclaw/agents/<agentId>/sessions/<sessionId>.jsonl
Each turn (user message + assistant response + tool calls + tool results) is appended to the JSONL transcript. On the next turn, the entire history is loaded and sent to the LLM.
In a typical conversation:
- Each tool call round-trip adds ~500-2000 tokens (tool input + output)
- An agent might make 5-10 tool calls per user message
- After 10 user messages with tool use, history can easily reach 50,000-100,000 tokens
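These per-turn figures compound quickly. A back-of-envelope sketch using midpoints of the ranges above (all constants are illustrative assumptions, not measurements):

```typescript
// Back-of-envelope history growth, using midpoints of the estimates above.
const TOKENS_PER_TOOL_ROUNDTRIP = 1_250; // midpoint of ~500-2,000
const TOOL_CALLS_PER_MESSAGE = 7;        // midpoint of 5-10
const MESSAGE_OVERHEAD = 500;            // assumed user text + assistant prose

function historyTokensAfter(userMessages: number): number {
  return userMessages * (MESSAGE_OVERHEAD + TOOL_CALLS_PER_MESSAGE * TOKENS_PER_TOOL_ROUNDTRIP);
}

// After 10 user messages: 10 * (500 + 7 * 1250) = 92,500 tokens,
// squarely inside the 50,000-100,000 range quoted above.
```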
Compaction trigger (src/agents/compaction.ts):
- When history exceeds contextWindow - reserveTokensFloor, compaction runs
- Default reserve: configurable via agents.defaults.compaction.reserveTokensFloor
- Compaction splits history into chunks (default 2 parts), summarizes each, merges summaries
- Summarization overhead: 4,096 tokens reserved (SUMMARIZATION_OVERHEAD_TOKENS)
- Uses a 1.2x safety margin because estimateTokens() uses a chars/4 heuristic that underestimates
Key insight: Before compaction triggers, every turn pays the full history cost. A 20-turn conversation with tool use can easily burn 100k+ input tokens per call.
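The trigger described above can be sketched as follows. This is a simplification under stated assumptions: DEFAULT_CONTEXT_TOKENS, SUMMARIZATION_OVERHEAD_TOKENS, and the chars/4 heuristic come from the source; the function shapes are assumed, not the actual compaction.ts code.

```typescript
// Simplified sketch of the compaction trigger, not the real implementation.
const DEFAULT_CONTEXT_TOKENS = 200_000;
const SUMMARIZATION_OVERHEAD_TOKENS = 4_096; // reserved for the summarizer call
const SAFETY_MARGIN = 1.2; // compensates for the heuristic's underestimation

function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4); // chars/4 heuristic
}

function shouldCompact(history: string[], reserveTokensFloor: number): boolean {
  const estimated = history.reduce((sum, msg) => sum + estimateTokens(msg), 0);
  // Apply the safety margin before comparing against the usable window.
  return estimated * SAFETY_MARGIN > DEFAULT_CONTEXT_TOKENS - reserveTokensFloor;
}
```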
2. System prompt (~2,000-5,000 tokens, fixed per call)
Source: src/agents/system-prompt.ts — buildAgentSystemPrompt()
The system prompt includes these sections (in order):
Total system prompt: ~2,000 tokens (minimal) to ~5,000+ tokens (full with skills + context files).
Subagent optimization (PromptMode):
- "full": All sections (~5,000 tokens)
- "minimal": Reduced sections — skips Memory, Messaging, Voice, Reply Tags, Silent Replies, Heartbeats (~1,500 tokens)
- "none": Just identity line (~15 tokens)
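The three modes trade sections for tokens roughly as sketched below. The skipped section names come from the list above; the function shape is an assumption, not the actual system-prompt.ts code.

```typescript
type PromptMode = "full" | "minimal" | "none";

// Sections skipped in "minimal" mode, per the list above.
const MINIMAL_SKIPS = new Set([
  "Memory", "Messaging", "Voice", "Reply Tags", "Silent Replies", "Heartbeats",
]);

function sectionsFor(mode: PromptMode, allSections: string[]): string[] {
  if (mode === "none") return [];  // identity line only (~15 tokens)
  if (mode === "minimal") return allSections.filter((s) => !MINIMAL_SKIPS.has(s));
  return allSections;              // "full": everything (~5,000 tokens)
}
```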
3. Skills prompt (~1,000-5,000 tokens, fixed per call)
Source: src/agents/skills/workspace.ts
Skills are loaded from three sources:
- Bundled skills (skills/) — 52 skills
- Workspace skills (~/.openclaw/agents/<id>/workspace/skills/)
- Plugin skills (from extensions)
Hard limits (workspace.ts:95-98):
The skills prompt does NOT include full SKILL.md content. It includes a catalog: name + description + file location for each skill. The agent reads the full SKILL.md only when a skill is selected.
Optimization already present: compactSkillPaths() replaces home directory paths with ~, saving ~400-600 tokens total (comment in source: "Saves ~5-6 tokens per skill path × N skills").
But: With 52+ skills at ~20-50 tokens each (name + description + path), the catalog alone is ~1,500-3,000 tokens.
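To make the catalog math concrete, here is a hypothetical catalog entry and its per-entry cost under the chars/4 heuristic. The entry format is assumed for illustration, not the actual output of workspace.ts.

```typescript
// Hypothetical shape of one catalog entry (name + description + path).
interface SkillEntry {
  name: string;
  description: string;
  path: string; // already compacted: home directory replaced with ~
}

function catalogLine(s: SkillEntry): string {
  return `${s.name}: ${s.description} (${s.path})`;
}

function catalogTokens(skills: SkillEntry[]): number {
  // chars/4 heuristic, matching the estimateTokens() description above
  const chars = skills.map(catalogLine).join("\n").length;
  return Math.ceil(chars / 4);
}

// 52 entries at ~120 chars each is ~6,240 chars, i.e. ~1,560 tokens,
// consistent with the ~1,500-3,000 token range above.
```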
4. Tool definitions (~2,000-4,000 tokens, fixed per call)
Source: Tool JSON schemas sent alongside each LLM call
The Pi Agent SDK sends tool definitions as JSON schema objects. Each tool includes:
- Name
- Description
- Parameter schema (properties, types, descriptions, required fields)
With 24 core tools + plugin tools + skill tools, typical total is ~2,000-4,000 tokens of JSON schema.
Not yet confirmed: whether tool summaries in the system prompt are deduplicated against the tool definitions sent in the API call. If both are sent in full, that is a redundancy worth measuring.
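For scale, a single hypothetical tool definition in the JSON-schema shape described above might look like this; serializing it and applying the chars/4 heuristic gives a rough per-tool cost. The read_file tool and its fields are illustrative, not taken from the SDK.

```typescript
// A hypothetical tool definition, for sizing purposes only.
const readFileTool = {
  name: "read_file",
  description: "Read a file from the workspace and return its contents.",
  input_schema: {
    type: "object",
    properties: {
      path: { type: "string", description: "Path to the file, relative to the workspace root." },
      maxBytes: { type: "number", description: "Optional byte cap on the returned content." },
    },
    required: ["path"],
  },
};

// Rough cost of one definition via the chars/4 heuristic:
// on the order of 100 tokens, so 24 such tools land near the low end
// of the 2,000-4,000 range above, before plugin and skill tools.
const toolTokens = Math.ceil(JSON.stringify(readFileTool).length / 4);
```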
5. Memory/RAG context (~500-2,000 tokens per turn)
Source: src/memory/manager.ts
Memory search results are injected into the conversation. The hybrid search (BM25 + vector + MMR re-ranking) returns top matches from the SQLite database.
Not yet confirmed: Exact limits on how many chunks are injected and maximum token count per injection. Need to read src/memory/manager-search.ts more deeply.
6. Context files (~0-5,000+ tokens, fixed per call)
Source: buildAgentSystemPrompt(), lines 588-608
User-editable files (like SOUL.md, RULES.md, workspace docs) are loaded and injected verbatim into the system prompt under "Project Context". There is no truncation — the full file content is included.
A large SOUL.md or workspace doc can easily add 2,000-5,000 tokens per call.
7. Heartbeat overhead (~full context per heartbeat)
Source: src/cron/service/timer.ts
Default: every 600 seconds (10 minutes) when cron.heartbeat.enabled: true.
Each heartbeat sends the full system prompt + session history for a single LLM call, just to get "HEARTBEAT_OK" back. If the session has a long history, this is expensive.
Cost: At 200k context × $3/M input tokens (Claude) × 6 calls/hour × 24h = ~$86/day if the context is full. In practice, sessions are shorter, but heartbeat cost scales with session size.
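The worst-case figure can be reproduced directly (prices and call rate as stated above; this assumes a full context window on every heartbeat and no prompt caching):

```typescript
// Worst-case daily heartbeat cost, reproducing the ~$86/day figure above.
const contextTokens = 200_000;
const dollarsPerMillionInput = 3; // Claude input pricing assumed above
const callsPerHour = 6;           // one heartbeat every 10 minutes

const costPerCall = (contextTokens / 1_000_000) * dollarsPerMillionInput; // $0.60
const costPerDay = costPerCall * callsPerHour * 24;                       // $86.40
```

In practice the cost scales with the actual session history length, so shorter sessions pay proportionally less per heartbeat.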
Summary: where tokens go
After 30 turns with heavy tool use: easily 100,000-150,000 tokens per call.
Optimization opportunities
High impact
Medium impact
Low impact (already optimized)
Key files for contributors
My blind spots
- Exact token count for tool JSON schemas — need to serialize and measure
- Memory injection limits (maxChunks, token cap) — need a deeper read of manager-search.ts
- Whether prompt caching (Anthropic) is being used — this would dramatically reduce the effective cost of the fixed system prompt
- How estimateTokens() in the Pi Agent SDK works — chars/4 heuristic mentioned but accuracy unknown
- Whether there's any batching or caching of repeated tool calls across turns