Auto-Reply Pipeline
One-line summary
The auto-reply pipeline is the decision engine between an inbound message and an LLM call — it parses commands, resolves directives, gates agent execution, manages streaming, and delivers replies.
Responsibilities
- Dispatch inbound messages through a multi-stage decision pipeline
- Parse slash commands (`/help`, `/new`, `/model`, `/think`, `/status`, etc.)
- Resolve directives (thinking level, verbose, reasoning, elevated) with session persistence
- Decide whether to trigger an LLM call or handle the message locally
- Orchestrate agent execution with model fallback chains
- Manage streaming (partial replies, block replies, tool results, reasoning)
- Handle cross-channel routing (e.g., Telegram message processed by Slack session)
- Apply TTS to outbound payloads when configured
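The responsibilities above amount to a staged decision pipeline: each stage either handles the message locally or passes it on, and only the final stage reaches the LLM. A minimal sketch of that shape (all types and stage bodies here are illustrative assumptions, not the module's real API):

```typescript
// Illustrative sketch of a multi-stage decision pipeline.
// Stage names and logic are hypothetical, not this module's actual code.
interface Inbound {
  text: string;
}

interface Outcome {
  handled: boolean;
  reply?: string;
  needsLlm?: boolean;
}

// A stage returns an Outcome to short-circuit, or null to pass through.
type Stage = (msg: Inbound) => Outcome | null;

const stages: Stage[] = [
  // Slash commands answered locally, no LLM call.
  (msg) =>
    msg.text.startsWith("/status") ? { handled: true, reply: "ok" } : null,
  // Directive-only messages (e.g. "/think:high") persist session state and stop.
  (msg) =>
    /^\/think:\w+$/.test(msg.text)
      ? { handled: true, reply: "thinking level set" }
      : null,
  // Default: route to agent execution (the LLM call).
  () => ({ handled: true, needsLlm: true }),
];

function dispatch(msg: Inbound): Outcome {
  for (const stage of stages) {
    const out = stage(msg);
    if (out) return out;
  }
  return { handled: false };
}
```

The key property of this arrangement is ordering: cheap local handlers run first, so the expensive LLM stage is reached only when nothing earlier claims the message.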
Architecture diagram
Key source files
Data flow
Inbound
Skip conditions (no LLM call)
The pipeline skips the LLM call when:
- Message is empty after normalization (and no media)
- Unauthorized control command detected
- Directive-only message (e.g., just `/think:high`)
- Inline command handled directly (e.g., `/status`)
- Fast abort triggered (stop command)
- Duplicate message detected
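The six conditions above can be read as a single gate predicate. A hedged sketch (field and function names are invented for illustration; the real module may evaluate these checks at different pipeline stages rather than in one function):

```typescript
// Hypothetical aggregation of the documented skip conditions.
interface GateInput {
  text: string;             // message text after normalization
  hasMedia: boolean;
  isControlCommand: boolean;
  isAuthorized: boolean;
  isDirectiveOnly: boolean; // e.g. just "/think:high"
  handledInline: boolean;   // e.g. "/status" answered locally
  abortRequested: boolean;  // stop command
  isDuplicate: boolean;
}

function shouldSkipLlm(m: GateInput): boolean {
  if (m.text.length === 0 && !m.hasMedia) return true;   // empty after normalization
  if (m.isControlCommand && !m.isAuthorized) return true; // unauthorized control command
  if (m.isDirectiveOnly) return true;                     // directive-only message
  if (m.handledInline) return true;                       // inline command handled
  if (m.abortRequested) return true;                      // fast abort
  if (m.isDuplicate) return true;                         // duplicate message
  return false;
}
```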
Outbound
Token optimization impact
Thinking level hierarchy
Error recovery (agent-runner-execution.ts)
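The responsibilities list names "model fallback chains", and this section attributes error recovery to `agent-runner-execution.ts`. One plausible pattern for combining the two is a sequential retry over an ordered model list; this is an assumed sketch, not the file's actual implementation:

```typescript
// Assumed pattern: try each model in a fallback chain until one succeeds.
// Nothing here is taken from agent-runner-execution.ts itself.
type RunFn = (model: string, prompt: string) => Promise<string>;

async function runWithFallback(
  models: string[],
  prompt: string,
  run: RunFn,
): Promise<string> {
  let lastError: unknown;
  for (const model of models) {
    try {
      return await run(model, prompt);
    } catch (err) {
      lastError = err; // record the failure and fall through to the next model
    }
  }
  throw new Error(`all models failed: ${String(lastError)}`);
}
```

In practice such a runner would also distinguish retryable errors (rate limits, timeouts) from fatal ones (bad request), but the chain-walking skeleton is the core idea.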
How it connects to other modules
- Depends on:
  - `agents/pi-embedded-runner/` — `runEmbeddedPiAgent()` for execution
  - `sessions/` — session entry management, model/level overrides
  - `config/` — agent config, model resolution
  - `channels/` — cross-channel routing, typing indicators
  - `skills/` — skill command handling
  - `memory/` — memory flush after replies
- Depended on by:
  - `gateway/` — receives dispatched messages
  - `cron/` — uses isolated agent execution path
  - Channel plugins — trigger `dispatchInboundMessage()`
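Channel plugins are listed as the callers of `dispatchInboundMessage()`. A minimal sketch of that call direction — only the function name comes from this doc; the signature, payload shape, and plugin handler are invented:

```typescript
// Hypothetical payload; the real dispatchInboundMessage() signature is
// not documented here.
interface ChannelMessage {
  channel: string; // e.g. "telegram" or "slack"
  text: string;
}

// Stub standing in for the pipeline entry point (name from the doc,
// everything else assumed).
function dispatchInboundMessage(msg: ChannelMessage): string {
  return `dispatched ${msg.channel} message`;
}

// A minimal channel plugin: normalize the provider event, then hand off.
function onTelegramUpdate(raw: { message?: { text?: string } }): string | null {
  const text = raw.message?.text;
  if (!text) return null; // ignore updates without text
  return dispatchInboundMessage({ channel: "telegram", text });
}
```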
My blind spots
- Full command registry — how many commands exist and which ones short-circuit LLM calls
- Block reply pipeline internals — coalescing strategy and chunk sizing
- Group activation logic — how the agent decides whether to respond in group chats
- `reply/queue/` — message queuing for rate limiting or ordering
- Exact streaming callback order and race condition handling
- `followup-runner.ts` — when and how follow-up runs are triggered
- `elevated-allowlist-matcher.ts` — elevated permission resolution
Related contributions
- None yet
Change frequency
- `dispatch-from-config.ts`: High — routing logic, hooks, and delivery evolve with new features
- `agent-runner-execution.ts`: High — error recovery patterns added as edge cases surface
- `thinking.ts`: Medium — new thinking levels or provider-specific handling added with model releases
- `dispatch.ts`: Low — thin entry point, rarely changes
- `tokens.ts`: Low — token constants are stable