Token Usage Analysis
Why does OpenClaw consume so many tokens, and where are the optimization opportunities?
Token budget per LLM call
Each LLM call sends the full assembled context; every consumer listed below draws from the same budget.
Default context window: 200,000 tokens (src/agents/defaults.ts:6 — DEFAULT_CONTEXT_TOKENS = 200_000).
Token consumers ranked by impact
1. Session history (largest and growing)
Source: ~/.openclaw/agents/<agentId>/sessions/<sessionId>.jsonl
Each turn (user message + assistant response + tool calls + tool results) is appended to the JSONL transcript. On the next turn, the entire history is loaded and sent to the LLM.
In a typical conversation:
- Each tool call round-trip adds ~500-2000 tokens (tool input + output)
- An agent might make 5-10 tool calls per user message
- After 10 user messages with tool use, history can easily reach 50,000-100,000 tokens
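These per-turn figures compound quickly. A back-of-envelope sketch using midpoints of the ranges above (all constants are illustrative assumptions, not measurements):

```typescript
// Back-of-envelope history growth, using midpoints of the estimates above.
const TOKENS_PER_TOOL_ROUNDTRIP = 1_250; // midpoint of ~500-2,000
const TOOL_CALLS_PER_MESSAGE = 7;        // midpoint of 5-10
const MESSAGE_OVERHEAD = 500;            // assumed user text + assistant prose

function historyTokensAfter(userMessages: number): number {
  return userMessages * (MESSAGE_OVERHEAD + TOOL_CALLS_PER_MESSAGE * TOKENS_PER_TOOL_ROUNDTRIP);
}

// After 10 user messages: 10 * (500 + 7 * 1250) = 92,500 tokens,
// squarely inside the 50,000-100,000 range quoted above.
```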
Compaction trigger (src/agents/compaction.ts):
- When history exceeds contextWindow - reserveTokensFloor, compaction runs
- Default reserve: configurable via agents.defaults.compaction.reserveTokensFloor
- Compaction splits history into chunks (default 2 parts), summarizes each, merges summaries
- Summarization overhead: 4,096 tokens reserved (SUMMARIZATION_OVERHEAD_TOKENS)
- Uses a 1.2x safety margin because estimateTokens() uses a chars/4 heuristic that underestimates
Key insight: Before compaction triggers, every turn pays the full history cost. A 20-turn conversation with tool use can easily burn 100k+ input tokens per call.
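The trigger described above can be sketched as follows. This is a simplification under stated assumptions: DEFAULT_CONTEXT_TOKENS, SUMMARIZATION_OVERHEAD_TOKENS, and the chars/4 heuristic come from the source; the function shapes are assumed, not the actual compaction.ts code.

```typescript
// Simplified sketch of the compaction trigger, not the real implementation.
const DEFAULT_CONTEXT_TOKENS = 200_000;
const SUMMARIZATION_OVERHEAD_TOKENS = 4_096; // reserved for the summarizer call
const SAFETY_MARGIN = 1.2; // compensates for the heuristic's underestimation

function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4); // chars/4 heuristic
}

function shouldCompact(history: string[], reserveTokensFloor: number): boolean {
  const estimated = history.reduce((sum, msg) => sum + estimateTokens(msg), 0);
  // Apply the safety margin before comparing against the usable window.
  return estimated * SAFETY_MARGIN > DEFAULT_CONTEXT_TOKENS - reserveTokensFloor;
}
```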
2. System prompt (~2,000-5,000 tokens, fixed per call)
Source: src/agents/system-prompt.ts — buildAgentSystemPrompt()
The system prompt includes these sections (in order):
Total system prompt: ~2,000 tokens (minimal) to ~5,000+ tokens (full with skills + context files).
Subagent optimization (PromptMode):
- "full": All sections (~5,000 tokens)
- "minimal": Reduced sections — skips Memory, Messaging, Voice, Reply Tags, Silent Replies, Heartbeats (~1,500 tokens)
- "none": Just identity line (~15 tokens)
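The three modes trade sections for tokens roughly as sketched below. The skipped section names come from the list above; the function shape is an assumption, not the actual system-prompt.ts code.

```typescript
type PromptMode = "full" | "minimal" | "none";

// Sections skipped in "minimal" mode, per the list above.
const MINIMAL_SKIPS = new Set([
  "Memory", "Messaging", "Voice", "Reply Tags", "Silent Replies", "Heartbeats",
]);

function sectionsFor(mode: PromptMode, allSections: string[]): string[] {
  if (mode === "none") return [];  // identity line only (~15 tokens)
  if (mode === "minimal") return allSections.filter((s) => !MINIMAL_SKIPS.has(s));
  return allSections;              // "full": everything (~5,000 tokens)
}
```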
3. Skills prompt (~1,000-5,000 tokens, fixed per call)
Source: src/agents/skills/workspace.ts
Skills are loaded from three sources:
- Bundled skills (skills/) — 52 skills
- Workspace skills (~/.openclaw/agents/<id>/workspace/skills/)
- Plugin skills (from extensions)
Hard limits (workspace.ts:95-98):
The skills prompt does NOT include full SKILL.md content. It includes a catalog: name + description + file location for each skill. The agent reads the full SKILL.md only when a skill is selected.
Optimization already present: compactSkillPaths() replaces home directory paths with ~, saving ~400-600 tokens total (comment in source: "Saves ~5-6 tokens per skill path × N skills").
But: With 52+ skills at ~20-50 tokens each (name + description + path), the catalog alone is ~1,500-3,000 tokens.
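To make the catalog math concrete, here is a hypothetical catalog entry and its per-entry cost under the chars/4 heuristic. The entry format is assumed for illustration, not the actual output of workspace.ts.

```typescript
// Hypothetical shape of one catalog entry (name + description + path).
interface SkillEntry {
  name: string;
  description: string;
  path: string; // already compacted: home directory replaced with ~
}

function catalogLine(s: SkillEntry): string {
  return `${s.name}: ${s.description} (${s.path})`;
}

function catalogTokens(skills: SkillEntry[]): number {
  // chars/4 heuristic, matching the estimateTokens() description above
  const chars = skills.map(catalogLine).join("\n").length;
  return Math.ceil(chars / 4);
}

// 52 entries at ~120 chars each is ~6,240 chars, i.e. ~1,560 tokens,
// consistent with the ~1,500-3,000 token range above.
```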
4. Tool definitions (~2,000-4,000 tokens, fixed per call)
Source: Tool JSON schemas sent alongside each LLM call
The Pi Agent SDK sends tool definitions as JSON schema objects. Each tool includes:
- Name
- Description
- Parameter schema (properties, types, descriptions, required fields)
With 24 core tools + plugin tools + skill tools, typical total is ~2,000-4,000 tokens of JSON schema.
Not yet confirmed: whether tool summaries in the system prompt are deduplicated against the tool definitions sent in the API call. If both are sent in full, that is a redundancy worth measuring.
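For scale, a single hypothetical tool definition in the JSON-schema shape described above might look like this; serializing it and applying the chars/4 heuristic gives a rough per-tool cost. The read_file tool and its fields are illustrative, not taken from the SDK.

```typescript
// A hypothetical tool definition, for sizing purposes only.
const readFileTool = {
  name: "read_file",
  description: "Read a file from the workspace and return its contents.",
  input_schema: {
    type: "object",
    properties: {
      path: { type: "string", description: "Path to the file, relative to the workspace root." },
      maxBytes: { type: "number", description: "Optional byte cap on the returned content." },
    },
    required: ["path"],
  },
};

// Rough cost of one definition via the chars/4 heuristic:
// on the order of 100 tokens, so 24 such tools land near the low end
// of the 2,000-4,000 range above, before plugin and skill tools.
const toolTokens = Math.ceil(JSON.stringify(readFileTool).length / 4);
```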
5. Memory/RAG context (~500-2,000 tokens per turn)
Source: src/memory/manager.ts
Memory search results are injected into the conversation. The hybrid search (BM25 + vector + MMR re-ranking) returns top matches from the SQLite database.
Not yet confirmed: Exact limits on how many chunks are injected and maximum token count per injection. Need to read src/memory/manager-search.ts more deeply.
6. Context files (~0-5,000+ tokens, fixed per call)
Source: buildAgentSystemPrompt(), lines 588-608
User-editable files (like SOUL.md, RULES.md, workspace docs) are loaded and injected verbatim into the system prompt under "Project Context". There is no truncation — the full file content is included.
A large SOUL.md or workspace doc can easily add 2,000-5,000 tokens per call.
7. Heartbeat overhead (~full context per heartbeat)
Source: src/cron/service/timer.ts
Default: every 600 seconds (10 minutes) when cron.heartbeat.enabled: true.
Each heartbeat sends the full system prompt + session history for a single LLM call, just to get "HEARTBEAT_OK" back. If the session has a long history, this is expensive.
Cost: At 200k context × $3/M input tokens (Claude) × 6 calls/hour × 24h = ~$86/day if the context is full. In practice, sessions are shorter, but heartbeat cost scales with session size.
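The worst-case figure can be reproduced directly (prices and call rate as stated above; this assumes a full context window on every heartbeat and no prompt caching):

```typescript
// Worst-case daily heartbeat cost, reproducing the ~$86/day figure above.
const contextTokens = 200_000;
const dollarsPerMillionInput = 3; // Claude input pricing assumed above
const callsPerHour = 6;           // one heartbeat every 10 minutes

const costPerCall = (contextTokens / 1_000_000) * dollarsPerMillionInput; // $0.60
const costPerDay = costPerCall * callsPerHour * 24;                       // $86.40
```

In practice the cost scales with the actual session history length, so shorter sessions pay proportionally less per heartbeat.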
Summary: where tokens go
After 30 turns with heavy tool use: easily 100,000-150,000 tokens per call.
Optimization opportunities
High impact
Medium impact
Low impact (already optimized)
Key files for contributors
My blind spots
- Exact token count for tool JSON schemas — need to serialize and measure
- Memory injection limits (maxChunks, token cap) — need a deeper read of manager-search.ts
- Whether prompt caching (Anthropic) is being used — this would dramatically reduce the effective cost of the fixed system prompt
- How estimateTokens() in the Pi Agent SDK works — chars/4 heuristic mentioned but accuracy unknown
- Whether there's any batching or caching of repeated tool calls across turns