Auto-Reply Pipeline

One-line summary

The auto-reply pipeline is the decision engine between an inbound message and an LLM call — it parses commands, resolves directives, gates agent execution, manages streaming, and delivers replies.

Responsibilities

  • Dispatch inbound messages through a multi-stage decision pipeline
  • Parse slash commands (/help, /new, /model, /think, /status, etc.)
  • Resolve directives (thinking level, verbose, reasoning, elevated) with session persistence
  • Decide whether to trigger an LLM call or handle the message locally
  • Orchestrate agent execution with model fallback chains
  • Manage streaming (partial replies, block replies, tool results, reasoning)
  • Handle cross-channel routing (e.g., Telegram message processed by Slack session)
  • Apply TTS to outbound payloads when configured

Architecture diagram

(diagram omitted; the Data flow section below traces the same path in text)
Key source files

| File | Lines | Role |
| --- | --- | --- |
| src/auto-reply/dispatch.ts | 97 | Entry point: dispatchInboundMessage(), dispatcher lifecycle |
| src/auto-reply/reply/dispatch-from-config.ts | 510 | Core router: duplicate check, hooks, abort, cross-channel routing, TTS, delivery |
| src/auto-reply/reply/agent-runner-execution.ts | 586 | Agent executor: runAgentTurnWithFallback(), streaming, error recovery, model fallback |
| src/auto-reply/reply.ts | 447 | Reply entry point, agent turn orchestration |
| src/auto-reply/thinking.ts | 227 | Thinking/verbose/reasoning/elevated level types and normalization |
| src/auto-reply/tokens.ts | 40 | Special tokens: HEARTBEAT_OK, NO_REPLY |
| src/auto-reply/command-detection.ts | — | Detect commands in messages |
| src/auto-reply/commands-registry.ts | — | Command registry with metadata |
| src/auto-reply/group-activation.ts | — | Group chat activation logic |
| src/auto-reply/heartbeat.ts | — | Heartbeat reply handling |
| src/auto-reply/reply/directive-handling.*.ts | — | Directive parsing, persistence, level resolution |
| src/auto-reply/reply/commands.ts | ~600 | Command routing and execution |
| src/auto-reply/reply/block-reply-pipeline.ts | — | Block streaming pipeline |

Data flow

Inbound

Channel Plugin (normalized message)
        ↓
dispatch.ts — dispatchInboundMessage()
        ↓
dispatch-from-config.ts — dispatchReplyFromConfig()

  ├── 1. shouldSkipDuplicateInbound() → skip if duplicate
  ├── 2. Fire plugin hooks (message_received)
  ├── 3. Fast abort check (stop command)
  ├── 4. Cross-channel routing (if message source ≠ session channel)
  ├── 5. getReplyFromConfig()
  │     ├── Session init (create or load)
  │     ├── Command detection + authorization
  │     ├── Directive parsing (/think:high, /verbose:on)
  │     ├── If directive-only → return ack, no LLM call
  │     ├── Inline actions (/status, /info, /reset)
  │     └── runPreparedReply() → agent execution
  └── 6. Post-processing (TTS, delivery, diagnostics)
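
The numbered steps above can be sketched as a guard chain. Everything in this sketch (type shapes, dependency names, return values) is illustrative; only the step order comes from the pipeline description:

```typescript
// Hypothetical sketch of the dispatch-from-config guard chain.
// Names are illustrative, not the real API.
type InboundMessage = { id: string; text: string; channel: string };
type Session = { channel: string };
type Outcome =
  | { kind: "skipped"; reason: string }
  | { kind: "replied"; payloads: string[] };

interface PipelineDeps {
  isDuplicate(msg: InboundMessage): boolean;            // step 1
  fireHooks(event: string, msg: InboundMessage): void;  // step 2
  isAbort(msg: InboundMessage): boolean;                // step 3
  routeCrossChannel(msg: InboundMessage, session: Session): InboundMessage; // step 4
  getReply(msg: InboundMessage): string[] | null;       // step 5; null → handled locally
  postProcess(payloads: string[]): string[];            // step 6: TTS, delivery prep
}

function dispatchReplyFromConfigSketch(
  msg: InboundMessage,
  session: Session,
  deps: PipelineDeps,
): Outcome {
  if (deps.isDuplicate(msg)) return { kind: "skipped", reason: "duplicate" };
  deps.fireHooks("message_received", msg);
  if (deps.isAbort(msg)) return { kind: "skipped", reason: "fast-abort" };
  // Cross-channel routing only when the source differs from the session channel.
  const routed =
    msg.channel !== session.channel
      ? deps.routeCrossChannel(msg, session)
      : msg;
  const payloads = deps.getReply(routed);
  if (payloads === null) return { kind: "skipped", reason: "handled-locally" };
  return { kind: "replied", payloads: deps.postProcess(payloads) };
}
```

The point of the shape is that every early return avoids the expensive agent execution inside getReply().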

Skip conditions (no LLM call)

The pipeline skips the LLM call when:

  • Message is empty after normalization (and no media)
  • Unauthorized control command detected
  • Directive-only message (e.g., just /think:high)
  • Inline command handled directly (e.g., /status)
  • Fast abort triggered (stop command)
  • Duplicate message detected
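
A minimal sketch of these skip conditions as a single predicate. In reality the checks are spread across command-detection.ts, dispatch-from-config.ts, and the directive handlers; this flat context shape and its field names are assumptions:

```typescript
// Illustrative flattening of the skip conditions; the check order is a
// guess, chosen so the cheapest/safest checks run first.
interface SkipContext {
  normalizedText: string;
  hasMedia: boolean;
  isDirectiveOnly: boolean;      // e.g. message is just "/think:high"
  isInlineCommand: boolean;      // e.g. "/status"
  isUnauthorizedCommand: boolean;
  isDuplicate: boolean;
  isAbort: boolean;              // stop command
}

// Returns the skip reason, or null when the pipeline should call the LLM.
function shouldSkipLlmCall(ctx: SkipContext): string | null {
  if (ctx.isDuplicate) return "duplicate";
  if (ctx.isAbort) return "fast-abort";
  if (ctx.isUnauthorizedCommand) return "unauthorized-command";
  if (ctx.isDirectiveOnly) return "directive-only";
  if (ctx.isInlineCommand) return "inline-command";
  if (ctx.normalizedText.trim() === "" && !ctx.hasMedia) return "empty";
  return null;
}
```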

Outbound

Agent execution result (payloads)
        ↓
Strip stray HEARTBEAT_OK tokens
        ↓
Filter NO_REPLY silent tokens
        ↓
Apply TTS (if configured)
        ↓
Route to target channel
        ↓
ReplyDispatcher.sendFinalReply()

Token optimization impact

| Mechanism | Token impact | Details |
| --- | --- | --- |
| Gate-keeping | Prevents unnecessary LLM calls | Command handling, directive-only messages, duplicates — all skip the expensive LLM call |
| Thinking level control | Variable overhead per call | xhigh thinking uses significantly more tokens than off; auto-downgrade prevents wasted attempts |
| Model fallback | Retry cost on failure | Failed calls consume tokens before fallback triggers; the pipeline minimizes wasted calls |
| Streaming | No direct token impact | Streaming doesn't change token count, but the block pipeline affects perceived latency |
| TTS | Separate API call | TTS is a separate cost, not counted in LLM tokens |

Thinking level hierarchy

Resolution order (first wins):
  1. Inline directive (/think:high)
  2. Session-persisted level
  3. Agent config default
  4. Model-specific default

Levels: off < minimal < low < medium < high < xhigh
  - xhigh: Only GPT-5.x codex models
  - Binary providers (Z.ai): Only on/off
  - Auto-downgrade if model doesn't support requested level
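
A sketch of the first-wins resolution plus auto-downgrade, assuming the levels are totally ordered as shown; the function and option names are hypothetical:

```typescript
// Illustrative thinking-level resolution. The order of the array encodes
// off < minimal < low < medium < high < xhigh.
const LEVELS = ["off", "minimal", "low", "medium", "high", "xhigh"] as const;
type ThinkingLevel = (typeof LEVELS)[number];

function resolveThinkingLevel(opts: {
  inlineDirective?: ThinkingLevel; // 1. /think:high
  sessionLevel?: ThinkingLevel;    // 2. session-persisted
  agentDefault?: ThinkingLevel;    // 3. agent config
  modelDefault: ThinkingLevel;     // 4. model-specific default
  modelMaxLevel: ThinkingLevel;    // e.g. "high" for models without xhigh
}): ThinkingLevel {
  const requested =
    opts.inlineDirective ??
    opts.sessionLevel ??
    opts.agentDefault ??
    opts.modelDefault;
  // Auto-downgrade when the model doesn't support the requested level.
  return LEVELS.indexOf(requested) > LEVELS.indexOf(opts.modelMaxLevel)
    ? opts.modelMaxLevel
    : requested;
}
```

A binary provider would be modeled here as modelMaxLevel capping everything, with a separate mapping of any non-off level to "on".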

Error recovery (agent-runner-execution.ts)

| Error | Recovery | Session impact |
| --- | --- | --- |
| Context overflow | Reset session | Loses history |
| Compaction failure | Reset session | Loses history |
| Role ordering conflict | Reset session | Loses history |
| Gemini function call ordering | Reset session | Loses history |
| Transient HTTP (502/521) | Retry once after delay | No session impact |
| Model unsupported | Fallback to next model | No session impact |
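
The table maps naturally onto a small classifier. The matching heuristics below (regexes, status codes) are illustrative, not the actual detection logic in agent-runner-execution.ts:

```typescript
// Hypothetical error-to-recovery classifier mirroring the table above.
type Recovery = "reset-session" | "retry-once" | "fallback-model";

function classifyAgentError(message: string, status?: number): Recovery | null {
  // Session-corrupting errors: only recovery is a fresh session (loses history).
  if (/context overflow|compaction|role ordering|function call ordering/i.test(message))
    return "reset-session";
  // Transient upstream failures: retry once after a delay.
  if (status === 502 || status === 521) return "retry-once";
  // Model rejected entirely: move down the fallback chain.
  if (/model .*unsupported|unsupported model/i.test(message))
    return "fallback-model";
  return null; // unrecognized → surface the error
}
```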

How it connects to other modules

  • Depends on:

    • agents/pi-embedded-runner/runEmbeddedPiAgent() for execution
    • sessions/ — session entry management, model/level overrides
    • config/ — agent config, model resolution
    • channels/ — cross-channel routing, typing indicators
    • skills/ — skill command handling
    • memory/ — memory flush after replies
  • Depended on by:

    • gateway/ — receives dispatched messages
    • cron/ — uses isolated agent execution path
    • Channel plugins — trigger dispatchInboundMessage()

My blind spots

  • Full command registry — how many commands exist and which ones short-circuit LLM calls
  • Block reply pipeline internals — coalescing strategy and chunk sizing
  • Group activation logic — how the agent decides whether to respond in group chats
  • reply/queue/ — message queuing for rate limiting or ordering
  • Exact streaming callback order and race condition handling
  • followup-runner.ts — when and how follow-up runs are triggered
  • elevated-allowlist-matcher.ts — elevated permission resolution

Change frequency

  • dispatch-from-config.ts: High — routing logic, hooks, and delivery evolve with new features
  • agent-runner-execution.ts: High — error recovery patterns added as edge cases surface
  • thinking.ts: Medium — new thinking levels or provider-specific handling added with model releases
  • dispatch.ts: Low — thin entry point, rarely changes
  • tokens.ts: Low — token constants are stable