Pi Agent Runtime
One-line summary
The embedded Pi Agent runtime is the execution engine that orchestrates every LLM call — it assembles prompts, manages auth profiles, handles retries, triggers compaction, and tracks token usage.
Responsibilities
- Execute the LLM call loop: build system prompt → send to model → process response → handle tool calls → repeat
- Manage multi-profile authentication with automatic rotation on failures (rate limits, auth errors, billing issues)
- Detect and recover from context overflow via auto-compaction and tool result truncation
- Track token usage across retries with accurate context-size reporting (avoiding accumulated cache inflation)
- Coordinate thinking level resolution and fallback when models don't support requested levels
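The responsibilities above center on one loop: build prompt, call model, run tools, repeat. A minimal sketch of that shape follows; every name here is illustrative, not the actual OpenClaw or Pi SDK API.

```typescript
// Illustrative sketch of the execute loop; types and names are assumptions.
type ToolCall = { name: string; args: unknown };
type ModelResponse = { text: string; toolCalls: ToolCall[] };

async function runLoop(
  buildSystemPrompt: () => string,
  callModel: (prompt: string, history: string[]) => Promise<ModelResponse>,
  executeTool: (call: ToolCall) => Promise<string>,
  maxTurns = 10,
): Promise<string> {
  const history: string[] = [];
  for (let turn = 0; turn < maxTurns; turn++) {
    const response = await callModel(buildSystemPrompt(), history);
    history.push(response.text);
    if (response.toolCalls.length === 0) return response.text; // final answer
    // Execute each requested tool and feed results back for the next turn.
    for (const call of response.toolCalls) {
      history.push(await executeTool(call));
    }
  }
  return history[history.length - 1] ?? "";
}
```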
Architecture diagram
Key source files
Data flow
Inbound
Internal flow (per LLM call)
Outbound
Token-critical mechanisms
1. Usage accumulation strategy (run.ts:83-179)
The `UsageAccumulator` tracks tokens across retries with a critical subtlety: the reported context size must reflect the largest single call, not the sum across calls, otherwise cached prompt tokens get re-counted on every attempt. This is documented in GitHub issue #13698. Without this fix, a 5-tool-call turn would report ~1M tokens instead of ~200k.
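A minimal sketch of the idea (the class name comes from the text; field names and shapes are assumptions): cumulative counters may sum across calls, but context size is reported as the peak of any single call.

```typescript
// Sketch of the accumulation strategy, not the actual run.ts implementation.
interface CallUsage { input: number; output: number; cacheRead: number }

class UsageAccumulator {
  totalInput = 0;
  totalOutput = 0;
  private peakContext = 0;

  add(u: CallUsage): void {
    this.totalInput += u.input;
    this.totalOutput += u.output;
    // Context of *this* call: fresh input + cached prefix + output.
    this.peakContext = Math.max(this.peakContext, u.input + u.cacheRead + u.output);
  }

  /** Report the peak single-call context, never the cross-call sum. */
  contextSize(): number {
    return this.peakContext;
  }
}
```

With five tool-call turns of roughly 200k context each, summing per-call context would report ~1M tokens, while the peak stays at ~200k, which matches the failure mode described in issue #13698.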
2. Context overflow recovery (run.ts:666-846)
Three-tier recovery strategy:
Key constraint: overflowCompactionAttempts is never reset across tiers (prevents unbounded compaction cycles, ref: OC-65).
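The tiers themselves are not enumerated here, so the sketch below is a plausible shape only (tier contents and limits are assumptions, drawing on the compaction and tool-result truncation mechanisms named under Responsibilities). The documented invariant is the counter: `overflowCompactionAttempts` only counts up, across all tiers.

```typescript
// Hypothetical tiered recovery; the real invariant (per the text) is that the
// compaction attempt counter is shared across tiers and never reset (OC-65).
type Recovery = "compact" | "truncate-tool-results" | "give-up";

let overflowCompactionAttempts = 0; // never reset, even when the tier changes

function recoverFromOverflow(): Recovery {
  overflowCompactionAttempts += 1;
  if (overflowCompactionAttempts <= 2) return "compact";                // tier 1
  if (overflowCompactionAttempts === 3) return "truncate-tool-results"; // tier 2
  return "give-up";                                                     // tier 3
}
```

Because the counter persists across tiers, a turn that compacts, overflows again, and compacts again cannot loop forever: it burns through the shared budget and falls into the cheaper tiers.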
3. Compaction internals (compaction.ts)
4. Compaction settings (pi-settings.ts)
5. Tool policy pipeline (tool-policy-pipeline.ts)
7-step cascade that filters available tools before each LLM call:
Fewer tools = fewer JSON schemas in the API call = lower fixed token cost per call.
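The seven steps are not listed here, so this sketch shows only the cascade shape (all names are assumptions): each stage narrows the tool set, and the surviving tools determine how many JSON schemas ship with the API call, i.e. the fixed token cost.

```typescript
// Generic cascade sketch; stage contents are illustrative, not the real policies.
type Tool = { name: string; schemaTokens: number };
type PolicyStage = (tools: Tool[]) => Tool[];

function applyPolicyPipeline(tools: Tool[], stages: PolicyStage[]): Tool[] {
  // Each stage receives only what the previous stage allowed through.
  return stages.reduce((remaining, stage) => stage(remaining), tools);
}

function fixedSchemaCost(tools: Tool[]): number {
  // Sum of schema sizes = fixed per-call token cost of advertising these tools.
  return tools.reduce((sum, t) => sum + t.schemaTokens, 0);
}
```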
6. System prompt construction (system-prompt.ts)
Three modes control prompt size:
Key token-heavy sections in "full" mode:
- Skills catalog (if skills loaded): ~1,000-5,000 tokens
- Context files (SOUL.md, RULES.md): 0-5,000+ tokens, no truncation
- Messaging section: ~200-500 tokens (detailed routing + message tool instructions)
- Tool summaries: ~500-800 tokens (24 core tools × ~30 tokens)
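A sketch of how mode-gated prompt assembly might look. Only "full" is named in the text; the other two mode names and the section shapes are assumptions.

```typescript
// Illustrative mode-gated builder, not the actual system-prompt.ts API.
type PromptMode = "full" | "minimal" | "none"; // only "full" is confirmed

interface PromptSections {
  core: string;
  skillsCatalog?: string; // ~1,000-5,000 tokens when skills are loaded
  contextFiles?: string;  // SOUL.md / RULES.md, appended without truncation
  messaging?: string;     // routing + message-tool instructions
  toolSummaries?: string; // roughly 30 tokens per core tool
}

function buildSystemPrompt(mode: PromptMode, s: PromptSections): string {
  if (mode === "none") return "";
  if (mode === "minimal") return s.core;
  // "full": every optional section that exists is appended, untruncated.
  return [s.core, s.skillsCatalog, s.contextFiles, s.messaging, s.toolSummaries]
    .filter((x): x is string => Boolean(x))
    .join("\n\n");
}
```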
How it connects to other modules
- Depends on:
  - `auto-reply/` — calls `runEmbeddedPiAgent()` as the execution engine
  - `config/` — reads `OpenClawConfig` for all settings
  - `skills/` — receives skills snapshot for prompt injection
  - `memory/` — memory context injected via Pi SDK hooks
  - `@mariozechner/pi-agent-core` — the actual LLM interaction SDK
  - `@mariozechner/pi-coding-agent` — `estimateTokens()`, `generateSummary()`
- Depended on by:
  - `auto-reply/get-reply-run.ts` — primary caller
  - `cron/service/timer.ts` — heartbeats call the same runtime
  - Any sub-agent spawn goes through this runtime
Retry logic summary
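Based on the multi-profile rotation described under Responsibilities (rotate on rate limits, auth errors, and billing issues), a plausible sketch of the retry shape; error classification and profile handling here are assumptions.

```typescript
// Hypothetical rotation helper, not the actual run.ts retry implementation.
type FailureKind = "rate-limit" | "auth" | "billing" | "other";

function isRotatable(kind: FailureKind): boolean {
  // Only these failure classes trigger rotation to the next auth profile.
  return kind === "rate-limit" || kind === "auth" || kind === "billing";
}

async function callWithRotation<T>(
  profiles: string[],
  attempt: (profile: string) => Promise<T>,
  classify: (err: unknown) => FailureKind,
): Promise<T> {
  let lastError: unknown;
  for (const profile of profiles) {
    try {
      return await attempt(profile);
    } catch (err) {
      lastError = err;
      if (!isRotatable(classify(err))) throw err; // non-retryable: surface it
      // Otherwise fall through and try the next auth profile.
    }
  }
  throw lastError; // every profile exhausted
}
```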
Token optimization impact
This module is the central control point for token consumption:
My blind spots
- Exact flow inside `runEmbeddedAttempt()` for Pi SDK session lifecycle — `session.run()` internals are in the SDK, not OpenClaw source
- How `compactEmbeddedPiSessionDirect()` in `compact.ts` differs from the `compaction.ts` functions — need to read the bridge code
- Plugin hooks (`before_agent_start`, `before_model_resolve`) — how much they can modify runtime behavior and the token budget
- `streamParams` and how streaming affects token counting
- Whether `estimateTokens()` from the Pi SDK uses tiktoken or the chars/4 heuristic — accuracy matters for compaction trigger timing
- `tool-result-truncation.ts` — exact truncation strategy and size thresholds
Related contributions
- None yet
Change frequency
- `run.ts`: High — error handling and retry logic change frequently as new edge cases are discovered (e.g., OC-65 for compaction cycles, #13698 for usage inflation)
- `system-prompt.ts`: Medium — new sections added as features ship (reactions, sandbox, model aliases)
- `compaction.ts`: Medium — compaction strategy evolves as models and context windows change
- `tool-policy-pipeline.ts`: Low — the 7-step cascade is stable; changes are usually in policy definitions, not the pipeline itself
- `defaults.ts`: Low — constants rarely change (the 200k context is tied to Claude model limits)