Pi Agent Runtime

One-line summary

The embedded Pi Agent runtime is the execution engine that orchestrates every LLM call — it assembles prompts, manages auth profiles, handles retries, triggers compaction, and tracks token usage.

Responsibilities

  • Execute the LLM call loop: build system prompt → send to model → process response → handle tool calls → repeat
  • Manage multi-profile authentication with automatic rotation on failures (rate limits, auth errors, billing issues)
  • Detect and recover from context overflow via auto-compaction and tool result truncation
  • Track token usage across retries with accurate context-size reporting (avoiding accumulated cache inflation)
  • Coordinate thinking level resolution and fallback when models don't support requested levels

Key source files

| File | Role |
| --- | --- |
| src/agents/pi-embedded-runner/run.ts (1159 lines) | Main orchestrator: runEmbeddedPiAgent() entry point, retry loop, context overflow recovery, auth profile rotation, usage accumulation |
| src/agents/pi-embedded-runner/run/attempt.ts (~1400 lines) | Single attempt: runEmbeddedAttempt() — session init, tool setup, system prompt build, Pi SDK call, hook execution, compaction handling |
| src/agents/pi-embedded-runner/system-prompt.ts | Thin wrapper that delegates to buildAgentSystemPrompt() |
| src/agents/system-prompt.ts (688 lines) | System prompt construction: 20+ conditional sections, PromptMode support, context file injection |
| src/agents/compaction.ts | Session history compaction: chunking, summarization, fallback strategies |
| src/agents/tool-policy-pipeline.ts (108 lines) | Tool filtering: 7-step policy pipeline (profile → provider → global → agent → group) |
| src/agents/pi-embedded-runner/compact.ts | compactEmbeddedPiSessionDirect() — explicit overflow-triggered compaction |
| src/agents/pi-embedded-runner/tool-result-truncation.ts | Last-resort truncation of oversized tool results in session history |
| src/agents/pi-embedded-subscribe.ts | Event subscription: accumulates usage, tracks tool metadata, messaging results |
| src/agents/defaults.ts | Constants: DEFAULT_CONTEXT_TOKENS = 200_000, DEFAULT_MODEL = "claude-opus-4-6" |
| src/agents/pi-settings.ts | Compaction settings: DEFAULT_PI_COMPACTION_RESERVE_TOKENS_FLOOR = 20_000 |
| src/agents/usage.ts | Token usage normalization across providers |
| src/agents/session-transcript-repair.ts | Transcript repair: orphaned tool results, tool call input validation, stripToolResultDetails() |

Data flow

Inbound

Auto-Reply Pipeline (src/auto-reply/get-reply-run.ts)

  ├── prompt: string (user message, scrubbed for Anthropic magic strings)
  ├── images: any[] (if vision model)
  ├── sessionFile: string (path to .jsonl transcript)
  ├── config: OpenClawConfig
  ├── skillsSnapshot: loaded skills catalog
  ├── thinkLevel: "off" | "minimal" | "low" | "medium" | "high" | "xhigh"
  ├── provider + modelId (resolved by model selection)
  └── abortSignal, timeoutMs, streaming callbacks

Internal flow (per LLM call)

1. Resolve workspace directory (agent-specific or shared)
2. Resolve auth profile (from profile store, with cooldown checks)
3. Validate context window (hard min: CONTEXT_WINDOW_HARD_MIN_TOKENS)
4. Build system prompt (buildEmbeddedSystemPrompt → buildAgentSystemPrompt)
5. Filter tools (applyToolPolicyPipeline: 7-step cascade)
6. Load session history from JSONL file
7. Send to Pi Agent SDK (agent.setSystemPrompt + session.run)
8. SDK executes tool calls in loop until assistant stops
9. Subscribe to events → accumulate usage (UsageAccumulator)
10. Return or retry based on result classification
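The ordering above can be condensed into a sketch in which every helper is a stub standing in for the real OpenClaw function at that step (the stub names are illustrative, not the actual exports):

```typescript
// Sketch of the per-call sequence; each stub stands in for the real helper
// named in the numbered steps above. Only the ordering is modeled here.
function perCallSequence(): string[] {
  const trace: string[] = [];
  const run = (step: string) => trace.push(step);

  run("resolveWorkspace");        // 1. agent-specific or shared dir
  run("resolveAuthProfile");      // 2. profile store + cooldown checks
  run("validateContextWindow");   // 3. CONTEXT_WINDOW_HARD_MIN_TOKENS
  run("buildSystemPrompt");       // 4. buildEmbeddedSystemPrompt → buildAgentSystemPrompt
  run("applyToolPolicyPipeline"); // 5. 7-step cascade
  run("loadSessionHistory");      // 6. JSONL transcript
  run("runSdkSession");           // 7-8. agent.setSystemPrompt + session.run (tool loop)
  run("accumulateUsage");         // 9. UsageAccumulator via event subscription
  return trace;                   // 10. caller classifies result → return or retry
}
```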

Outbound

EmbeddedPiRunResult

  ├── payloads: Array<{ text, isError?, channel?, ... }>
  ├── meta:
  │     ├── durationMs
  │     ├── agentMeta: { sessionId, provider, model }
  │     ├── systemPromptReport (token counts)
  │     └── error?: { kind, message }
  ├── usage?: { input, output, cacheRead, cacheWrite, total }
  ├── messagingToolResults?: any[]
  └── autoCompactionCount: number
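The outline above can be reconstructed as a TypeScript shape; the field names come from the tree, but the exact types, optionality, and the elided fields are assumptions:

```typescript
// Hypothetical reconstruction of EmbeddedPiRunResult from the outline above.
interface EmbeddedPiRunResult {
  payloads: Array<{ text: string; isError?: boolean; channel?: string }>;
  meta: {
    durationMs: number;
    agentMeta: { sessionId: string; provider: string; model: string };
    systemPromptReport?: { tokens: number };
    error?: { kind: string; message: string };
  };
  usage?: { input: number; output: number; cacheRead: number; cacheWrite: number; total: number };
  messagingToolResults?: unknown[];
  autoCompactionCount: number;
}

// Minimal well-formed example:
const example: EmbeddedPiRunResult = {
  payloads: [{ text: "done" }],
  meta: {
    durationMs: 1200,
    agentMeta: { sessionId: "s1", provider: "anthropic", model: "claude" },
  },
  autoCompactionCount: 0,
};
```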

Token-critical mechanisms

1. Usage accumulation strategy (run.ts:83-179)

The UsageAccumulator tracks tokens across retries with a critical subtlety:

Problem: Each tool-call round-trip reports cacheRead ≈ current_context_size.
         Summing N round-trips gives N × context_size (inflated).

Solution: Track "last" cache fields separately from accumulated totals.
          - output: accumulated (total generated text)
          - input/cacheRead/cacheWrite: from LAST API call only
          - total: lastPromptTokens + accumulated output

This is documented in GitHub issue #13698. Without this fix, a 5-tool-call turn would report ~1M tokens instead of ~200k.
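A minimal sketch of the accumulation rule (the real UsageAccumulator in run.ts:83-179 tracks more state; this models only the output-accumulates / cache-takes-last split):

```typescript
// Sketch of the "accumulate output, keep last cache fields" rule.
// Field names follow the outbound usage shape; internals are assumptions.
interface UsageChunk { input: number; output: number; cacheRead: number; cacheWrite: number; }

class UsageAccumulatorSketch {
  private outputTotal = 0;
  private last: UsageChunk = { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 };

  add(chunk: UsageChunk): void {
    this.outputTotal += chunk.output; // generated text genuinely accumulates
    this.last = chunk;                // cache/input reflect the LAST call only
  }

  snapshot() {
    const { input, cacheRead, cacheWrite } = this.last;
    const lastPromptTokens = input + cacheRead + cacheWrite;
    return { input, output: this.outputTotal, cacheRead, cacheWrite,
             total: lastPromptTokens + this.outputTotal };
  }
}

// Five tool-call round-trips, each re-reading ~190k of cached context:
const acc = new UsageAccumulatorSketch();
for (let i = 0; i < 5; i++) {
  acc.add({ input: 2_000, output: 500, cacheRead: 190_000, cacheWrite: 0 });
}
// Naive summing would report 5 × 192,500 ≈ 960k prompt tokens;
// the snapshot reports one context's worth plus all generated output.
```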

2. Context overflow recovery (run.ts:666-846)

Three-tier recovery strategy:

Tier 1: Auto-compaction (max 3 attempts)
  ├── If SDK already compacted this attempt → just retry
  └── If not → call compactEmbeddedPiSessionDirect() → retry

Tier 2: Tool result truncation (once)
  ├── Check sessionLikelyHasOversizedToolResults()
  └── If found → truncateOversizedToolResultsInSession() → retry

Tier 3: Give up
  └── Return error: "Context overflow: prompt too large for the model"

Key constraint: overflowCompactionAttempts is never reset across tiers (prevents unbounded compaction cycles, ref: OC-65).
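The tiers can be sketched as a single loop. The helper names mirror those cited above, but the exact control flow is an assumption reconstructed from this description:

```typescript
// Sketch of the three-tier overflow recovery. Injected callbacks stand in
// for compactEmbeddedPiSessionDirect() and the truncation helpers.
interface RecoveryDeps {
  attempt: () => "ok" | "overflow";
  compact: () => void;                    // compactEmbeddedPiSessionDirect()
  hasOversizedToolResults: () => boolean; // sessionLikelyHasOversizedToolResults()
  truncate: () => void;                   // truncateOversizedToolResultsInSession()
}

function recoverFromOverflow(deps: RecoveryDeps): string {
  let overflowCompactionAttempts = 0;     // never reset across tiers (OC-65)
  let truncated = false;
  for (;;) {
    if (deps.attempt() === "ok") return "ok";
    if (overflowCompactionAttempts < 3) {              // Tier 1: compact (max 3)
      overflowCompactionAttempts++;
      deps.compact();
      continue;
    }
    if (!truncated && deps.hasOversizedToolResults()) { // Tier 2: truncate (once)
      truncated = true;
      deps.truncate();
      continue;
    }
    return "Context overflow: prompt too large for the model"; // Tier 3: give up
  }
}
```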

3. Compaction internals (compaction.ts)

Constants:
  BASE_CHUNK_RATIO = 0.4        (default chunk = 40% of context)
  MIN_CHUNK_RATIO = 0.15        (minimum chunk = 15%)
  SAFETY_MARGIN = 1.2           (20% buffer for estimateTokens() inaccuracy)
  SUMMARIZATION_OVERHEAD_TOKENS = 4096

Compaction algorithm:
  1. Strip toolResult.details (SECURITY: untrusted/verbose payloads)
  2. Split messages by token share (default 2 parts)
  3. For each chunk: generate summary via LLM (retry 3x, reasoning: "high")
  4. If multiple summaries: merge with "Merge these partial summaries" instruction
  5. Fallback: if full summarization fails, exclude oversized messages (>50% context)
  6. Final fallback: "Context contained N messages, summary unavailable"

Adaptive chunk ratio:
  If average message > 10% of context window → reduce chunk ratio
  Reduction = min(avgRatio × 2, BASE - MIN)
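Under these constants the adaptive ratio works out as follows (a sketch; the exact formula in compaction.ts may differ in details):

```typescript
// Sketch of the adaptive chunk ratio using the constants above.
const BASE_CHUNK_RATIO = 0.4;
const MIN_CHUNK_RATIO = 0.15;

function chunkRatio(avgMessageTokens: number, contextWindow: number): number {
  const avgRatio = avgMessageTokens / contextWindow;
  if (avgRatio <= 0.1) return BASE_CHUNK_RATIO; // small messages: default 40%
  const reduction = Math.min(avgRatio * 2, BASE_CHUNK_RATIO - MIN_CHUNK_RATIO);
  return BASE_CHUNK_RATIO - reduction;          // never below MIN_CHUNK_RATIO
}
```

For example, with a 200k window, 5k-average messages keep the 40% default, while 30k-average messages (15% of context) pull the ratio down to the 15% floor.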

4. Compaction settings (pi-settings.ts)

DEFAULT_PI_COMPACTION_RESERVE_TOKENS_FLOOR = 20,000

Resolution order:
  1. agents.defaults.compaction.reserveTokens (from config)
  2. agents.defaults.compaction.reserveTokensFloor (floor guarantee)
  3. DEFAULT_PI_COMPACTION_RESERVE_TOKENS_FLOOR (20,000)

Trigger: history_tokens > contextWindow - reserveTokens
  With 200k context and 20k reserve → compaction triggers at ~180k history
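The resolution and trigger check can be sketched as below; the config shape is assumed from the dotted paths above, and treating the floor as a lower bound on the configured reserve is an interpretation of "floor guarantee":

```typescript
// Sketch of reserve-token resolution and the compaction trigger.
const DEFAULT_PI_COMPACTION_RESERVE_TOKENS_FLOOR = 20_000;

interface CompactionConfig { reserveTokens?: number; reserveTokensFloor?: number; }

function resolveReserveTokens(cfg: CompactionConfig = {}): number {
  const floor = cfg.reserveTokensFloor ?? DEFAULT_PI_COMPACTION_RESERVE_TOKENS_FLOOR;
  return Math.max(cfg.reserveTokens ?? 0, floor); // configured value, floored
}

function shouldCompact(historyTokens: number, contextWindow: number,
                       cfg?: CompactionConfig): boolean {
  return historyTokens > contextWindow - resolveReserveTokens(cfg);
}
```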

5. Tool policy pipeline (tool-policy-pipeline.ts)

7-step cascade that filters available tools before each LLM call:

Step 1: tools.profile (per-profile allowlist)
Step 2: tools.byProvider.profile (provider-specific profile)
Step 3: tools.allow (global allowlist)
Step 4: tools.byProvider.allow (provider-specific global)
Step 5: agents.<id>.tools.allow (agent-specific)
Step 6: agents.<id>.tools.byProvider.allow (agent + provider)
Step 7: group tools.allow (group-level)

Each step:
  - Strips plugin-only allowlists (warns about unknown entries)
  - Expands plugin tool groups
  - Applies filterToolsByPolicy()

Fewer tools = fewer JSON schemas in the API call = lower fixed token cost per call.
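The cascade reduces to repeated allowlist intersection, sketched below; the exact-name matching and the "undefined means step unset" semantics are assumptions, and plugin-group expansion is elided:

```typescript
// Sketch of an allowlist cascade: each configured step narrows the
// surviving tool set; unset steps pass everything through.
function applyCascade(tools: string[], steps: Array<string[] | undefined>): string[] {
  return steps.reduce<string[]>(
    (remaining, allow) => (allow ? remaining.filter((t) => allow.includes(t)) : remaining),
    tools,
  );
}

const surviving = applyCascade(
  ["read", "write", "exec", "message"],
  [
    ["read", "write", "message"],               // Step 1: tools.profile
    undefined,                                  // Step 2: no provider-specific profile
    ["read", "message"],                        // Step 3: tools.allow (global)
    undefined, undefined, undefined, undefined, // Steps 4-7 unset
  ],
);
```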

6. System prompt construction (system-prompt.ts)

Three modes control prompt size:

| Mode | Sections included | Estimated tokens |
| --- | --- | --- |
| "full" | All 20+ sections | ~3,000-5,000 |
| "minimal" | Tooling, Safety, Skills, Workspace, Runtime + subset | ~1,500 |
| "none" | Single identity line | ~15 |

Key token-heavy sections in "full" mode:

  • Skills catalog (if skills loaded): ~1,000-5,000 tokens
  • Context files (SOUL.md, RULES.md): 0-5,000+ tokens, no truncation
  • Messaging section: ~200-500 tokens (detailed routing + message tool instructions)
  • Tool summaries: ~500-800 tokens (24 core tools × ~30 tokens)
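The mode gating can be sketched as a filter over section names; the "minimal" subset follows the table above, and section identifiers beyond those listed are illustrative:

```typescript
// Sketch of PromptMode gating over prompt sections.
type PromptMode = "full" | "minimal" | "none";

const MINIMAL_SECTIONS = ["identity", "tooling", "safety", "skills", "workspace", "runtime"];

function sectionsFor(mode: PromptMode, allSections: string[]): string[] {
  if (mode === "none") return ["identity"];    // single identity line
  if (mode === "minimal")
    return allSections.filter((s) => MINIMAL_SECTIONS.includes(s));
  return allSections;                          // "full": all 20+ sections
}
```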

How it connects to other modules

  • Depends on:

    • auto-reply/ — calls runEmbeddedPiAgent() as the execution engine
    • config/ — reads OpenClawConfig for all settings
    • skills/ — receives skills snapshot for prompt injection
    • memory/ — memory context injected via Pi SDK hooks
    • @mariozechner/pi-agent-core — the actual LLM interaction SDK
    • @mariozechner/pi-coding-agent — estimateTokens(), generateSummary()
  • Depended by:

    • auto-reply/get-reply-run.ts — primary caller
    • cron/service/timer.ts — heartbeats call the same runtime
    • Any sub-agent spawn goes through this runtime

Retry logic summary

| Error type | Recovery action | Max attempts |
| --- | --- | --- |
| Context overflow | Auto-compact → tool truncation → give up | 3 compactions + 1 truncation |
| Auth failure | Rotate auth profile | All profiles exhausted |
| Rate limit | Rotate auth profile | All profiles exhausted |
| Billing error | Format error message, rotate or fail | All profiles exhausted |
| Thinking unsupported | Downgrade thinking level | All lower levels tried |
| Timeout | Rotate auth profile (not cooldown) | All profiles exhausted |
| Role ordering | Return user-friendly error | No retry |
| Image too large | Return user-friendly error | No retry |
| All retries exhausted | Return "request failed" error | MAX_RUN_LOOP_ITERATIONS (24-160) |
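Thinking-level downgrade can be sketched against the level ordering from the inbound parameters; stepping down one level at a time is an assumption inferred from "all lower levels tried":

```typescript
// Sketch of thinking-level fallback: step down one level at a time until
// the model accepts one, or "off" is reached.
const THINK_LEVELS = ["off", "minimal", "low", "medium", "high", "xhigh"] as const;
type ThinkLevel = (typeof THINK_LEVELS)[number];

function downgradeThinkLevel(level: ThinkLevel): ThinkLevel | undefined {
  const idx = THINK_LEVELS.indexOf(level);
  return idx > 0 ? THINK_LEVELS[idx - 1] : undefined; // nothing below "off"
}
```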

Token optimization impact

This module is the central control point for token consumption:

| Mechanism | Token impact | Controllable here? |
| --- | --- | --- |
| System prompt size | 2,000-5,000/call (fixed) | Yes — PromptMode selection |
| Tool definitions | 2,000-4,000/call (fixed) | Yes — tool policy pipeline |
| Session history | 40,000-150,000/call (growing) | Yes — compaction thresholds |
| Context overflow recovery | Prevents wasted full-context calls | Yes — retry logic |
| Usage reporting accuracy | Prevents inflated reporting | Yes — UsageAccumulator |

My blind spots

  • Exact flow inside runEmbeddedAttempt() for Pi SDK session lifecycle — session.run() internals are in the SDK, not OpenClaw source
  • How compactEmbeddedPiSessionDirect() in compact.ts differs from the compaction.ts functions — need to read the bridge code
  • Plugin hooks (before_agent_start, before_model_resolve) — how much they can modify the runtime behavior and token budget
  • The streamParams and how streaming affects token counting
  • Whether estimateTokens() from Pi SDK uses tiktoken or the chars/4 heuristic — accuracy matters for compaction trigger timing
  • tool-result-truncation.ts — exact truncation strategy and size thresholds

Change frequency

  • run.ts: High — error handling and retry logic change frequently as new edge cases are discovered (e.g., OC-65 for compaction cycles, #13698 for usage inflation)
  • system-prompt.ts: Medium — new sections added as features ship (reactions, sandbox, model aliases)
  • compaction.ts: Medium — compaction strategy evolves as models and context windows change
  • tool-policy-pipeline.ts: Low — the 7-step cascade is stable; changes are usually in policy definitions, not the pipeline itself
  • defaults.ts: Low — constants rarely change (200k context is tied to Claude model limits)