Pi Agent Runtime

One-line summary

The embedded Pi Agent runtime is the execution engine that orchestrates every LLM call — it assembles prompts, manages auth profiles, handles retries, triggers compaction, and tracks token usage.

Responsibilities

  • Execute the LLM call loop: build system prompt → send to model → process response → handle tool calls → repeat
  • Manage multi-profile authentication with automatic rotation on failures (rate limits, auth errors, billing issues)
  • Detect and recover from context overflow via auto-compaction and tool result truncation
  • Track token usage across retries with accurate context-size reporting (avoiding accumulated cache inflation)
  • Coordinate thinking level resolution and fallback when models don't support requested levels

Key source files

| File | Role |
| --- | --- |
| src/agents/pi-embedded-runner/run.ts (1159 lines) | Main orchestrator: runEmbeddedPiAgent() entry point, retry loop, context overflow recovery, auth profile rotation, usage accumulation |
| src/agents/pi-embedded-runner/run/attempt.ts (~1400 lines) | Single attempt: runEmbeddedAttempt() — session init, tool setup, system prompt build, Pi SDK call, hook execution, compaction handling |
| src/agents/pi-embedded-runner/system-prompt.ts | Thin wrapper that delegates to buildAgentSystemPrompt() |
| src/agents/system-prompt.ts (688 lines) | System prompt construction: 20+ conditional sections, PromptMode support, context file injection |
| src/agents/compaction.ts | Session history compaction: chunking, summarization, fallback strategies |
| src/agents/tool-policy-pipeline.ts (108 lines) | Tool filtering: 7-step policy pipeline (profile → provider → global → agent → group) |
| src/agents/pi-embedded-runner/compact.ts | compactEmbeddedPiSessionDirect() — explicit overflow-triggered compaction |
| src/agents/pi-embedded-runner/tool-result-truncation.ts | Last-resort truncation of oversized tool results in session history |
| src/agents/pi-embedded-subscribe.ts | Event subscription: accumulates usage, tracks tool metadata, messaging results |
| src/agents/defaults.ts | Constants: DEFAULT_CONTEXT_TOKENS = 200_000, DEFAULT_MODEL = "claude-opus-4-6" |
| src/agents/pi-settings.ts | Compaction settings: DEFAULT_PI_COMPACTION_RESERVE_TOKENS_FLOOR = 20_000 |
| src/agents/usage.ts | Token usage normalization across providers |
| src/agents/session-transcript-repair.ts | Transcript repair: orphaned tool results, tool call input validation, stripToolResultDetails() |

Data flow

Inbound

Auto-Reply Pipeline (src/auto-reply/get-reply-run.ts)

  ├── prompt: string (user message, scrubbed for Anthropic magic strings)
  ├── images: any[] (if vision model)
  ├── sessionFile: string (path to .jsonl transcript)
  ├── config: OpenClawConfig
  ├── skillsSnapshot: loaded skills catalog
  ├── thinkLevel: "off" | "minimal" | "low" | "medium" | "high" | "xhigh"
  ├── provider + modelId (resolved by model selection)
  └── abortSignal, timeoutMs, streaming callbacks

Internal flow (per LLM call)

1. Resolve workspace directory (agent-specific or shared)
2. Resolve auth profile (from profile store, with cooldown checks)
3. Validate context window (hard min: CONTEXT_WINDOW_HARD_MIN_TOKENS)
4. Build system prompt (buildEmbeddedSystemPrompt → buildAgentSystemPrompt)
5. Filter tools (applyToolPolicyPipeline: 7-step cascade)
6. Load session history from JSONL file
7. Send to Pi Agent SDK (agent.setSystemPrompt + session.run)
8. SDK executes tool calls in loop until assistant stops
9. Subscribe to events → accumulate usage (UsageAccumulator)
10. Return or retry based on result classification
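The ordering above can be condensed into a sketch in which every helper is a stub standing in for the real OpenClaw function at that step (the stub names are illustrative, not the actual exports):

```typescript
// Sketch of the per-call sequence; each stub stands in for the real helper
// named in the numbered steps above. Only the ordering is modeled here.
function perCallSequence(): string[] {
  const trace: string[] = [];
  const run = (step: string) => trace.push(step);

  run("resolveWorkspace");        // 1. agent-specific or shared dir
  run("resolveAuthProfile");      // 2. profile store + cooldown checks
  run("validateContextWindow");   // 3. CONTEXT_WINDOW_HARD_MIN_TOKENS
  run("buildSystemPrompt");       // 4. buildEmbeddedSystemPrompt → buildAgentSystemPrompt
  run("applyToolPolicyPipeline"); // 5. 7-step cascade
  run("loadSessionHistory");      // 6. JSONL transcript
  run("runSdkSession");           // 7-8. agent.setSystemPrompt + session.run (tool loop)
  run("accumulateUsage");         // 9. UsageAccumulator via event subscription
  return trace;                   // 10. caller classifies result → return or retry
}
```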

Outbound

EmbeddedPiRunResult

  ├── payloads: Array<{ text, isError?, channel?, ... }>
  ├── meta:
  │     ├── durationMs
  │     ├── agentMeta: { sessionId, provider, model }
  │     ├── systemPromptReport (token counts)
  │     └── error?: { kind, message }
  ├── usage?: { input, output, cacheRead, cacheWrite, total }
  ├── messagingToolResults?: any[]
  └── autoCompactionCount: number
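The outline above can be reconstructed as a TypeScript shape; the field names come from the tree, but the exact types, optionality, and the elided fields are assumptions:

```typescript
// Hypothetical reconstruction of EmbeddedPiRunResult from the outline above.
interface EmbeddedPiRunResult {
  payloads: Array<{ text: string; isError?: boolean; channel?: string }>;
  meta: {
    durationMs: number;
    agentMeta: { sessionId: string; provider: string; model: string };
    systemPromptReport?: { tokens: number };
    error?: { kind: string; message: string };
  };
  usage?: { input: number; output: number; cacheRead: number; cacheWrite: number; total: number };
  messagingToolResults?: unknown[];
  autoCompactionCount: number;
}

// Minimal well-formed example:
const example: EmbeddedPiRunResult = {
  payloads: [{ text: "done" }],
  meta: {
    durationMs: 1200,
    agentMeta: { sessionId: "s1", provider: "anthropic", model: "claude" },
  },
  autoCompactionCount: 0,
};
```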

Token-critical mechanisms

1. Usage accumulation strategy (run.ts:83-179)

The UsageAccumulator tracks tokens across retries with a critical subtlety:

Problem: Each tool-call round-trip reports cacheRead ≈ current_context_size.
         Summing N round-trips gives N × context_size (inflated).

Solution: Track "last" cache fields separately from accumulated totals.
          - output: accumulated (total generated text)
          - input/cacheRead/cacheWrite: from LAST API call only
          - total: lastPromptTokens + accumulated output

This is documented in GitHub issue #13698. Without this fix, a 5-tool-call turn would report ~1M tokens instead of ~200k.
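A minimal sketch of the accumulation rule (the real UsageAccumulator in run.ts:83-179 tracks more state; this models only the output-accumulates / cache-takes-last split):

```typescript
// Sketch of the "accumulate output, keep last cache fields" rule.
// Field names follow the outbound usage shape; internals are assumptions.
interface UsageChunk { input: number; output: number; cacheRead: number; cacheWrite: number; }

class UsageAccumulatorSketch {
  private outputTotal = 0;
  private last: UsageChunk = { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 };

  add(chunk: UsageChunk): void {
    this.outputTotal += chunk.output; // generated text genuinely accumulates
    this.last = chunk;                // cache/input reflect the LAST call only
  }

  snapshot() {
    const { input, cacheRead, cacheWrite } = this.last;
    const lastPromptTokens = input + cacheRead + cacheWrite;
    return { input, output: this.outputTotal, cacheRead, cacheWrite,
             total: lastPromptTokens + this.outputTotal };
  }
}

// Five tool-call round-trips, each re-reading ~190k of cached context:
const acc = new UsageAccumulatorSketch();
for (let i = 0; i < 5; i++) {
  acc.add({ input: 2_000, output: 500, cacheRead: 190_000, cacheWrite: 0 });
}
// Naive summing would report 5 × 192,500 ≈ 960k prompt tokens;
// the snapshot reports one context's worth plus all generated output.
```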

2. Context overflow recovery (run.ts:666-846)

Three-tier recovery strategy:

Tier 1: Auto-compaction (max 3 attempts)
  ├── If SDK already compacted this attempt → just retry
  └── If not → call compactEmbeddedPiSessionDirect() → retry

Tier 2: Tool result truncation (once)
  ├── Check sessionLikelyHasOversizedToolResults()
  └── If found → truncateOversizedToolResultsInSession() → retry

Tier 3: Give up
  └── Return error: "Context overflow: prompt too large for the model"

Key constraint: overflowCompactionAttempts is never reset across tiers (prevents unbounded compaction cycles, ref: OC-65).
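The tiers can be sketched as a single loop. The helper names mirror those cited above, but the exact control flow is an assumption reconstructed from this description:

```typescript
// Sketch of the three-tier overflow recovery. Injected callbacks stand in
// for compactEmbeddedPiSessionDirect() and the truncation helpers.
interface RecoveryDeps {
  attempt: () => "ok" | "overflow";
  compact: () => void;                    // compactEmbeddedPiSessionDirect()
  hasOversizedToolResults: () => boolean; // sessionLikelyHasOversizedToolResults()
  truncate: () => void;                   // truncateOversizedToolResultsInSession()
}

function recoverFromOverflow(deps: RecoveryDeps): string {
  let overflowCompactionAttempts = 0;     // never reset across tiers (OC-65)
  let truncated = false;
  for (;;) {
    if (deps.attempt() === "ok") return "ok";
    if (overflowCompactionAttempts < 3) {              // Tier 1: compact (max 3)
      overflowCompactionAttempts++;
      deps.compact();
      continue;
    }
    if (!truncated && deps.hasOversizedToolResults()) { // Tier 2: truncate (once)
      truncated = true;
      deps.truncate();
      continue;
    }
    return "Context overflow: prompt too large for the model"; // Tier 3: give up
  }
}
```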

3. Compaction internals (compaction.ts)

Constants:
  BASE_CHUNK_RATIO = 0.4        (default chunk = 40% of context)
  MIN_CHUNK_RATIO = 0.15        (minimum chunk = 15%)
  SAFETY_MARGIN = 1.2           (20% buffer for estimateTokens() inaccuracy)
  SUMMARIZATION_OVERHEAD_TOKENS = 4096

Compaction algorithm:
  1. Strip toolResult.details (SECURITY: untrusted/verbose payloads)
  2. Split messages by token share (default 2 parts)
  3. For each chunk: generate summary via LLM (retry 3x, reasoning: "high")
  4. If multiple summaries: merge with "Merge these partial summaries" instruction
  5. Fallback: if full summarization fails, exclude oversized messages (>50% context)
  6. Final fallback: "Context contained N messages, summary unavailable"

Adaptive chunk ratio:
  If average message > 10% of context window → reduce chunk ratio
  Reduction = min(avgRatio × 2, BASE - MIN)
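Under these constants the adaptive ratio works out as follows (a sketch; the exact formula in compaction.ts may differ in details):

```typescript
// Sketch of the adaptive chunk ratio using the constants above.
const BASE_CHUNK_RATIO = 0.4;
const MIN_CHUNK_RATIO = 0.15;

function chunkRatio(avgMessageTokens: number, contextWindow: number): number {
  const avgRatio = avgMessageTokens / contextWindow;
  if (avgRatio <= 0.1) return BASE_CHUNK_RATIO; // small messages: default 40%
  const reduction = Math.min(avgRatio * 2, BASE_CHUNK_RATIO - MIN_CHUNK_RATIO);
  return BASE_CHUNK_RATIO - reduction;          // never below MIN_CHUNK_RATIO
}
```

For example, with a 200k window, 5k-average messages keep the 40% default, while 30k-average messages (15% of context) pull the ratio down to the 15% floor.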

4. Compaction settings (pi-settings.ts)

DEFAULT_PI_COMPACTION_RESERVE_TOKENS_FLOOR = 20,000

Resolution order:
  1. agents.defaults.compaction.reserveTokens (from config)
  2. agents.defaults.compaction.reserveTokensFloor (floor guarantee)
  3. DEFAULT_PI_COMPACTION_RESERVE_TOKENS_FLOOR (20,000)

Trigger: history_tokens > contextWindow - reserveTokens
  With 200k context and 20k reserve → compaction triggers at ~180k history
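The resolution and trigger check can be sketched as below; the config shape is assumed from the dotted paths above, and treating the floor as a lower bound on the configured reserve is an interpretation of "floor guarantee":

```typescript
// Sketch of reserve-token resolution and the compaction trigger.
const DEFAULT_PI_COMPACTION_RESERVE_TOKENS_FLOOR = 20_000;

interface CompactionConfig { reserveTokens?: number; reserveTokensFloor?: number; }

function resolveReserveTokens(cfg: CompactionConfig = {}): number {
  const floor = cfg.reserveTokensFloor ?? DEFAULT_PI_COMPACTION_RESERVE_TOKENS_FLOOR;
  return Math.max(cfg.reserveTokens ?? 0, floor); // configured value, floored
}

function shouldCompact(historyTokens: number, contextWindow: number,
                       cfg?: CompactionConfig): boolean {
  return historyTokens > contextWindow - resolveReserveTokens(cfg);
}
```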

5. Tool policy pipeline (tool-policy-pipeline.ts)

7-step cascade that filters available tools before each LLM call:

Step 1: tools.profile (per-profile allowlist)
Step 2: tools.byProvider.profile (provider-specific profile)
Step 3: tools.allow (global allowlist)
Step 4: tools.byProvider.allow (provider-specific global)
Step 5: agents.<id>.tools.allow (agent-specific)
Step 6: agents.<id>.tools.byProvider.allow (agent + provider)
Step 7: group tools.allow (group-level)

Each step:
  - Strips plugin-only allowlists (warns about unknown entries)
  - Expands plugin tool groups
  - Applies filterToolsByPolicy()

Fewer tools = fewer JSON schemas in the API call = lower fixed token cost per call.
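The cascade reduces to repeated allowlist intersection, sketched below; the exact-name matching and the "undefined means step unset" semantics are assumptions, and plugin-group expansion is elided:

```typescript
// Sketch of an allowlist cascade: each configured step narrows the
// surviving tool set; unset steps pass everything through.
function applyCascade(tools: string[], steps: Array<string[] | undefined>): string[] {
  return steps.reduce<string[]>(
    (remaining, allow) => (allow ? remaining.filter((t) => allow.includes(t)) : remaining),
    tools,
  );
}

const surviving = applyCascade(
  ["read", "write", "exec", "message"],
  [
    ["read", "write", "message"],               // Step 1: tools.profile
    undefined,                                  // Step 2: no provider-specific profile
    ["read", "message"],                        // Step 3: tools.allow (global)
    undefined, undefined, undefined, undefined, // Steps 4-7 unset
  ],
);
```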

6. System prompt construction (system-prompt.ts)

Three modes control prompt size:

| Mode | Sections included | Estimated tokens |
| --- | --- | --- |
| "full" | All 20+ sections | ~3,000-5,000 |
| "minimal" | Tooling, Safety, Skills, Workspace, Runtime + subset | ~1,500 |
| "none" | Single identity line | ~15 |

Key token-heavy sections in "full" mode:

  • Skills catalog (if skills loaded): ~1,000-5,000 tokens
  • Context files (SOUL.md, RULES.md): 0-5,000+ tokens, no truncation
  • Messaging section: ~200-500 tokens (detailed routing + message tool instructions)
  • Tool summaries: ~500-800 tokens (24 core tools × ~30 tokens)
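The mode gating can be sketched as a filter over section names; the "minimal" subset follows the table above, and section identifiers beyond those listed are illustrative:

```typescript
// Sketch of PromptMode gating over prompt sections.
type PromptMode = "full" | "minimal" | "none";

const MINIMAL_SECTIONS = ["identity", "tooling", "safety", "skills", "workspace", "runtime"];

function sectionsFor(mode: PromptMode, allSections: string[]): string[] {
  if (mode === "none") return ["identity"];    // single identity line
  if (mode === "minimal")
    return allSections.filter((s) => MINIMAL_SECTIONS.includes(s));
  return allSections;                          // "full": all 20+ sections
}
```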

How it connects to other modules

  • Depends on:

    • auto-reply/ — calls runEmbeddedPiAgent() as the execution engine
    • config/ — reads OpenClawConfig for all settings
    • skills/ — receives skills snapshot for prompt injection
    • memory/ — memory context injected via Pi SDK hooks
    • @mariozechner/pi-agent-core — the actual LLM interaction SDK
    • @mariozechner/pi-coding-agent — estimateTokens(), generateSummary()
  • Depended by:

    • auto-reply/get-reply-run.ts — primary caller
    • cron/service/timer.ts — heartbeats call the same runtime
    • Any sub-agent spawn goes through this runtime

Retry logic summary

| Error type | Recovery action | Max attempts |
| --- | --- | --- |
| Context overflow | Auto-compact → tool truncation → give up | 3 compactions + 1 truncation |
| Auth failure | Rotate auth profile | All profiles exhausted |
| Rate limit | Rotate auth profile | All profiles exhausted |
| Billing error | Format error message, rotate or fail | All profiles exhausted |
| Thinking unsupported | Downgrade thinking level | All lower levels tried |
| Timeout | Rotate auth profile (not cooldown) | All profiles exhausted |
| Role ordering | Return user-friendly error | No retry |
| Image too large | Return user-friendly error | No retry |
| All retries exhausted | Return "request failed" error | MAX_RUN_LOOP_ITERATIONS (24-160) |
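Thinking-level downgrade can be sketched against the level ordering from the inbound parameters; stepping down one level at a time is an assumption inferred from "all lower levels tried":

```typescript
// Sketch of thinking-level fallback: step down one level at a time until
// the model accepts one, or "off" is reached.
const THINK_LEVELS = ["off", "minimal", "low", "medium", "high", "xhigh"] as const;
type ThinkLevel = (typeof THINK_LEVELS)[number];

function downgradeThinkLevel(level: ThinkLevel): ThinkLevel | undefined {
  const idx = THINK_LEVELS.indexOf(level);
  return idx > 0 ? THINK_LEVELS[idx - 1] : undefined; // nothing below "off"
}
```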

Token optimization impact

This module is the central control point for token consumption:

| Mechanism | Token impact | Controllable here? |
| --- | --- | --- |
| System prompt size | 2,000-5,000/call (fixed) | Yes — PromptMode selection |
| Tool definitions | 2,000-4,000/call (fixed) | Yes — tool policy pipeline |
| Session history | 40,000-150,000/call (growing) | Yes — compaction thresholds |
| Context overflow recovery | Prevents wasted full-context calls | Yes — retry logic |
| Usage reporting accuracy | Prevents inflated reporting | Yes — UsageAccumulator |

My blind spots

  • Exact flow inside runEmbeddedAttempt() for Pi SDK session lifecycle — session.run() internals are in the SDK, not OpenClaw source
  • How compactEmbeddedPiSessionDirect() in compact.ts differs from the compaction.ts functions — need to read the bridge code
  • Plugin hooks (before_agent_start, before_model_resolve) — how much they can modify the runtime behavior and token budget
  • The streamParams and how streaming affects token counting
  • Whether estimateTokens() from Pi SDK uses tiktoken or the chars/4 heuristic — accuracy matters for compaction trigger timing
  • tool-result-truncation.ts — exact truncation strategy and size thresholds

Change frequency

  • run.ts: High — error handling and retry logic change frequently as new edge cases are discovered (e.g., OC-65 for compaction cycles, #13698 for usage inflation)
  • system-prompt.ts: Medium — new sections added as features ship (reactions, sandbox, model aliases)
  • compaction.ts: Medium — compaction strategy evolves as models and context windows change
  • tool-policy-pipeline.ts: Low — the 7-step cascade is stable; changes are usually in policy definitions, not the pipeline itself
  • defaults.ts: Low — constants rarely change (200k context is tied to Claude model limits)