Token Usage Analysis

Why does OpenClaw consume so many tokens, and where are the optimization opportunities?

Token budget per LLM call

Each LLM call sends:

Input tokens = System prompt
             + Tool definitions (JSON schema for every enabled tool)
             + Session history (all previous turns, or compacted summary + recent turns)
             + Memory/RAG context (injected search results)
             + Current user message

Default context window: 200,000 tokens (src/agents/defaults.ts:6, DEFAULT_CONTEXT_TOKENS = 200_000).
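
The per-call composition above can be sketched as a simple sum. This is illustrative only: the interface and field names are my assumptions, not OpenClaw's actual types; the numbers are the mid-conversation estimates used later in this doc.

```typescript
// Hypothetical breakdown of one LLM call's input budget (illustrative).
interface InputBudget {
  systemPrompt: number;
  toolDefinitions: number;
  sessionHistory: number;
  memoryContext: number;
  currentMessage: number;
}

const DEFAULT_CONTEXT_TOKENS = 200_000; // src/agents/defaults.ts

function totalInputTokens(b: InputBudget): number {
  return (
    b.systemPrompt +
    b.toolDefinitions +
    b.sessionHistory +
    b.memoryContext +
    b.currentMessage
  );
}

function remainingForOutput(b: InputBudget): number {
  return DEFAULT_CONTEXT_TOKENS - totalInputTokens(b);
}

// Mid-conversation example:
const turn: InputBudget = {
  systemPrompt: 3_000,
  toolDefinitions: 3_000,
  sessionHistory: 40_000,
  memoryContext: 1_000,
  currentMessage: 100,
};
// totalInputTokens(turn) → 47,100; remainingForOutput(turn) → 152,900
```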

Token consumers ranked by impact

1. Session history (largest and growing)

Source: ~/.openclaw/agents/<agentId>/sessions/<sessionId>.jsonl

Each turn (user message + assistant response + tool calls + tool results) is appended to the JSONL transcript. On the next turn, the entire history is loaded and sent to the LLM.

In a typical conversation:

  • Each tool call round-trip adds ~500-2000 tokens (tool input + output)
  • An agent might make 5-10 tool calls per user message
  • After 10 user messages with tool use, history can easily reach 50,000-100,000 tokens
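
Plugging in midpoints from the bullets above gives a rough growth model (the constants are my assumptions, not measured values):

```typescript
// Rough history-growth arithmetic using midpoints of the ranges above.
const tokensPerToolCall = 1_250;  // midpoint of ~500-2,000
const toolCallsPerMessage = 7;    // midpoint of 5-10
const userMessages = 10;

// Tool traffic alone, ignoring the user/assistant text itself:
const historyTokens = userMessages * toolCallsPerMessage * tokensPerToolCall;
// → 87,500 tokens, consistent with the 50,000-100,000 range above
```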

Compaction trigger (src/agents/compaction.ts):

  • When history exceeds contextWindow - reserveTokensFloor, compaction runs
  • Default reserve: configurable via agents.defaults.compaction.reserveTokensFloor
  • Compaction splits history into chunks (default 2 parts), summarizes each, merges summaries
  • Summarization overhead: 4,096 tokens reserved (SUMMARIZATION_OVERHEAD_TOKENS)
  • Uses a 1.2x safety margin because estimateTokens() uses a chars/4 heuristic that underestimates
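
A minimal sketch of that trigger condition, assuming the chars/4 heuristic and 1.2x margin described above (the real compaction.ts will differ in detail):

```typescript
const CHARS_PER_TOKEN = 4;   // estimateTokens() heuristic
const SAFETY_MARGIN = 1.2;   // compensates for the heuristic underestimating

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

function shouldCompact(
  historyText: string,
  contextWindow: number,      // e.g. 200_000
  reserveTokensFloor: number, // agents.defaults.compaction.reserveTokensFloor
): boolean {
  const estimated = estimateTokens(historyText) * SAFETY_MARGIN;
  return estimated > contextWindow - reserveTokensFloor;
}
```

For example, an 800,000-character history estimates to 200,000 tokens, which the margin bumps to 240,000; with a 20,000-token reserve floor that exceeds the 180,000-token budget, so compaction runs.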

Key insight: Before compaction triggers, every turn pays the full history cost. A 20-turn conversation with tool use can easily burn 100k+ input tokens per call.

2. System prompt (~2,000-5,000 tokens, fixed per call)

Source: src/agents/system-prompt.ts, buildAgentSystemPrompt()

The system prompt includes these sections (in order):

Section                        Condition                   Estimated tokens
Identity line                  Always                      ~15
Tooling (tool summaries)       Always                      ~500-800 (24 core tools × ~30 tokens each)
Tool Call Style                Always                      ~80
Safety                         Always                      ~100
CLI Quick Reference            Always                      ~80
Skills                         If skills loaded            ~1,000-5,000 (see below)
Memory Recall                  If memory tools available   ~60
Self-Update                    If gateway tool available   ~80
Model Aliases                  If aliases configured       ~100-300
Workspace                      Always                      ~40
Documentation                  If docs path set            ~60
Sandbox                        If sandboxed                ~200
Authorized Senders             If owner configured         ~30
Time                           If timezone set             ~20
Workspace Files (injected)     Always (header)             ~20
Reply Tags                     Full mode only              ~80
Messaging                      Full mode only              ~200
Voice (TTS)                    If TTS configured           ~30
Context Files (SOUL.md etc.)   If context files exist      ~500-5,000 (depends on file size)
Silent Replies                 Full mode only              ~100
Heartbeats                     Full mode only              ~60
Runtime                        Always                      ~40

Total system prompt: ~2,000 tokens (minimal) to ~5,000+ tokens (full with skills + context files).

Subagent optimization (PromptMode):

  • "full": All sections (~5,000 tokens)
  • "minimal": Reduced sections — skips Memory, Messaging, Voice, Reply Tags, Silent Replies, Heartbeats (~1,500 tokens)
  • "none": Just identity line (~15 tokens)

3. Skills prompt (~1,000-5,000 tokens, fixed per call)

Source: src/agents/skills/workspace.ts

Skills are loaded from three sources:

  1. Bundled skills (skills/) — 52 skills
  2. Workspace skills (~/.openclaw/agents/<id>/workspace/skills/)
  3. Plugin skills (from extensions)

Hard limits (workspace.ts:95-98):

DEFAULT_MAX_CANDIDATES_PER_ROOT = 300
DEFAULT_MAX_SKILLS_LOADED_PER_SOURCE = 200
DEFAULT_MAX_SKILLS_IN_PROMPT = 150
DEFAULT_MAX_SKILLS_PROMPT_CHARS = 30,000  ← ~7,500 tokens
DEFAULT_MAX_SKILL_FILE_BYTES = 256,000

The skills prompt does NOT include full SKILL.md content. It includes a catalog: name + description + file location for each skill. The agent reads the full SKILL.md only when a skill is selected.

Optimization already present: compactSkillPaths() replaces home directory paths with ~, saving ~400-600 tokens total (comment in source: "Saves ~5-6 tokens per skill path × N skills").
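
The substitution is essentially a prefix replace. A sketch of the idea (the real compactSkillPaths() operates on the whole catalog and may differ):

```typescript
import * as os from "node:os";

// Replace the home-directory prefix with "~" in a single skill path.
function compactPath(p: string, home: string = os.homedir()): string {
  return p.startsWith(home) ? "~" + p.slice(home.length) : p;
}

// compactPath("/home/alice/.openclaw/skills/foo/SKILL.md", "/home/alice")
//   → "~/.openclaw/skills/foo/SKILL.md"
```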

But: With 52+ skills at ~20-50 tokens each (name + description + path), the catalog alone is ~1,500-3,000 tokens.

4. Tool definitions (~2,000-4,000 tokens, fixed per call)

Source: Tool JSON schemas sent alongside each LLM call

The Pi Agent SDK sends tool definitions as JSON schema objects. Each tool includes:

  • Name
  • Description
  • Parameter schema (properties, types, descriptions, required fields)

With 24 core tools + plugin tools + skill tools, typical total is ~2,000-4,000 tokens of JSON schema.
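
To make the cost concrete, here is a hypothetical tool definition in the common JSON-schema shape (field names follow typical LLM tool-call APIs; this is not OpenClaw's actual read tool):

```typescript
const readFileTool = {
  name: "read_file",
  description: "Read a file from the workspace and return its contents.",
  parameters: {
    type: "object",
    properties: {
      path: { type: "string", description: "Workspace-relative file path" },
      maxBytes: { type: "number", description: "Truncate after this many bytes" },
    },
    required: ["path"],
  },
};

// chars/4 heuristic: even this tiny schema costs roughly 75-100 tokens once
// serialized; 24+ tools of realistic size add up fast.
const approxTokens = Math.ceil(JSON.stringify(readFileTool).length / 4);
```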

Not yet confirmed: Whether there's deduplication between tool summaries in system prompt and tool definitions in the API call. This could be a redundancy.

5. Memory/RAG context (~500-2,000 tokens per turn)

Source: src/memory/manager.ts

Memory search results are injected into the conversation. The hybrid search (BM25 + vector + MMR re-ranking) returns top matches from the SQLite database.

Not yet confirmed: Exact limits on how many chunks are injected and maximum token count per injection. Need to read src/memory/manager-search.ts more deeply.

6. Context files (~0-5,000+ tokens, fixed per call)

Source: buildAgentSystemPrompt(), lines 588-608

User-editable files (like SOUL.md, RULES.md, workspace docs) are loaded and injected verbatim into the system prompt under "Project Context". There is no truncation — the full file content is included.

A large SOUL.md or workspace doc can easily add 2,000-5,000 tokens per call.

7. Heartbeat overhead (~full context per heartbeat)

Source: src/cron/service/timer.ts

Default: every 600 seconds (10 minutes) when cron.heartbeat.enabled: true.

Each heartbeat sends the full system prompt + session history for a single LLM call, just to get "HEARTBEAT_OK" back. If the session has a long history, this is expensive.

Cost: At 200k context × $3/M input tokens (Claude) × 6 calls/hour × 24h = ~$86/day if the context is full. In practice, sessions are shorter, but heartbeat cost scales with session size.
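
The worst-case figure works out as follows (the context size and $3/M price are the assumptions stated above, not current pricing):

```typescript
const contextTokens = 200_000;     // full context window
const dollarsPerMTokensIn = 3;     // Claude-class input pricing (assumed)
const callsPerHour = 6;            // one heartbeat per 10 minutes

const dailyCost =
  (contextTokens / 1_000_000) * dollarsPerMTokensIn * callsPerHour * 24;
// → 86.4 dollars/day, matching the ~$86/day estimate above
```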

Summary: where tokens go

Typical single turn (mid-conversation, 15 turns in):

System prompt:         ~3,000 tokens  (fixed)
Tool definitions:      ~3,000 tokens  (fixed)
Skills catalog:        ~2,000 tokens  (fixed)
Context files:         ~1,000 tokens  (fixed, depends on SOUL.md)
Session history:      ~40,000 tokens  (growing)
Memory/RAG context:    ~1,000 tokens  (per turn)
Current message:         ~100 tokens
                      ─────────────
Total input:          ~50,000 tokens  per LLM call

After 30 turns with heavy tool use: easily 100,000-150,000 tokens per call.

Optimization opportunities

High impact

Opportunity                              Estimated savings          Where to look
Earlier/more aggressive compaction       30-50% of history tokens   src/agents/compaction.ts — lower the threshold or compact more frequently
Sliding window instead of full history   40-60% of history tokens   Only send last N turns + summary, not everything since last compaction
Heartbeat context reduction              90% of heartbeat cost      Send minimal context with heartbeats instead of full session
Tool result truncation                   10-30% of history tokens   Tool results (especially file reads, web fetches) can be huge; truncate old tool results in history
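
The sliding-window idea in the table can be sketched as follows (the Turn shape and the summary-as-assistant-message convention are my assumptions, not OpenClaw's wire format):

```typescript
interface Turn {
  role: "user" | "assistant" | "tool";
  content: string;
}

// Send a compact summary of older turns plus only the last `keepLast` turns,
// instead of the full transcript since the last compaction.
function windowedHistory(summary: string, turns: Turn[], keepLast: number): Turn[] {
  const recent = turns.slice(-keepLast);
  return [
    { role: "assistant", content: `Summary of earlier turns: ${summary}` },
    ...recent,
  ];
}
```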

Medium impact

Opportunity                       Estimated savings          Where to look
Lazy skill catalog                ~1,500-2,500 tokens/call   Only include skills relevant to current context instead of full catalog
Context file summarization        ~1,000-3,000 tokens/call   Summarize large SOUL.md / docs instead of verbatim inclusion
Tool schema pruning               ~500-1,000 tokens/call     Only send tool schemas for tools likely needed this turn
Dedup tool summaries vs schemas   ~500 tokens/call           System prompt already summarizes tools; JSON schema descriptions may be redundant

Low impact (already optimized)

Already done                                 Source
Path compaction (~ substitution)             compactSkillPaths() in workspace.ts
Subagent prompt reduction                    PromptMode = "minimal" skips many sections
Tool result detail stripping in compaction   stripToolResultDetails() in compaction

Key files for contributors

File                                 What it controls
src/agents/system-prompt.ts          System prompt construction
src/agents/skills/workspace.ts       Skills catalog injection + limits
src/agents/compaction.ts             Session compaction logic + thresholds
src/agents/defaults.ts               DEFAULT_CONTEXT_TOKENS = 200_000
src/agents/pi-settings.ts            reserveTokensFloor resolution
src/agents/tool-policy-pipeline.ts   Tool filtering before LLM call
src/memory/manager.ts                Memory context injection
src/cron/service/timer.ts            Heartbeat frequency and payload

My blind spots

  • Exact token count for tool JSON schemas — need to serialize and measure
  • Memory injection limits (maxChunks, token cap) — need deeper read of manager-search.ts
  • Whether prompt caching (Anthropic) is being used — this would dramatically reduce effective cost of fixed system prompt
  • How estimateTokens() in Pi Agent SDK works — chars/4 heuristic mentioned but accuracy unknown
  • Whether there's any batching or caching of repeated tool calls across turns