Auto-Reply Pipeline

One-line summary

The auto-reply pipeline is the decision engine between an inbound message and an LLM call — it parses commands, resolves directives, gates agent execution, manages streaming, and delivers replies.

Responsibilities

  • Dispatch inbound messages through a multi-stage decision pipeline
  • Parse slash commands (/help, /new, /model, /think, /status, etc.)
  • Resolve directives (thinking level, verbose, reasoning, elevated) with session persistence
  • Decide whether to trigger an LLM call or handle the message locally
  • Orchestrate agent execution with model fallback chains
  • Manage streaming (partial replies, block replies, tool results, reasoning)
  • Handle cross-channel routing (e.g., Telegram message processed by Slack session)
  • Apply TTS to outbound payloads when configured

Architecture diagram

(diagram omitted; the Data flow section below traces the same path in text)
Key source files

| File | Lines | Role |
| --- | --- | --- |
| src/auto-reply/dispatch.ts | 97 | Entry point: dispatchInboundMessage(), dispatcher lifecycle |
| src/auto-reply/reply/dispatch-from-config.ts | 510 | Core router: duplicate check, hooks, abort, cross-channel routing, TTS, delivery |
| src/auto-reply/reply/agent-runner-execution.ts | 586 | Agent executor: runAgentTurnWithFallback(), streaming, error recovery, model fallback |
| src/auto-reply/reply.ts | 447 | Reply entry point, agent turn orchestration |
| src/auto-reply/thinking.ts | 227 | Thinking/verbose/reasoning/elevated level types and normalization |
| src/auto-reply/tokens.ts | 40 | Special tokens: HEARTBEAT_OK, NO_REPLY |
| src/auto-reply/command-detection.ts | — | Detect commands in messages |
| src/auto-reply/commands-registry.ts | — | Command registry with metadata |
| src/auto-reply/group-activation.ts | — | Group chat activation logic |
| src/auto-reply/heartbeat.ts | — | Heartbeat reply handling |
| src/auto-reply/reply/directive-handling.*.ts | — | Directive parsing, persistence, level resolution |
| src/auto-reply/reply/commands.ts | ~600 | Command routing and execution |
| src/auto-reply/reply/block-reply-pipeline.ts | — | Block streaming pipeline |

Data flow

Inbound

Channel Plugin (normalized message)
        ↓
dispatch.ts — dispatchInboundMessage()
        ↓
dispatch-from-config.ts — dispatchReplyFromConfig()

  ├── 1. shouldSkipDuplicateInbound() → skip if duplicate
  ├── 2. Fire plugin hooks (message_received)
  ├── 3. Fast abort check (stop command)
  ├── 4. Cross-channel routing (if message source ≠ session channel)
  ├── 5. getReplyFromConfig()
  │     ├── Session init (create or load)
  │     ├── Command detection + authorization
  │     ├── Directive parsing (/think:high, /verbose:on)
  │     ├── If directive-only → return ack, no LLM call
  │     ├── Inline actions (/status, /info, /reset)
  │     └── runPreparedReply() → agent execution
  └── 6. Post-processing (TTS, delivery, diagnostics)
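
The numbered steps above can be sketched as a guard chain. Everything in this sketch (type shapes, dependency names, return values) is illustrative; only the step order comes from the pipeline description:

```typescript
// Hypothetical sketch of the dispatch-from-config guard chain.
// Names are illustrative, not the real API.
type InboundMessage = { id: string; text: string; channel: string };
type Session = { channel: string };
type Outcome =
  | { kind: "skipped"; reason: string }
  | { kind: "replied"; payloads: string[] };

interface PipelineDeps {
  isDuplicate(msg: InboundMessage): boolean;            // step 1
  fireHooks(event: string, msg: InboundMessage): void;  // step 2
  isAbort(msg: InboundMessage): boolean;                // step 3
  routeCrossChannel(msg: InboundMessage, session: Session): InboundMessage; // step 4
  getReply(msg: InboundMessage): string[] | null;       // step 5; null → handled locally
  postProcess(payloads: string[]): string[];            // step 6: TTS, delivery prep
}

function dispatchReplyFromConfigSketch(
  msg: InboundMessage,
  session: Session,
  deps: PipelineDeps,
): Outcome {
  if (deps.isDuplicate(msg)) return { kind: "skipped", reason: "duplicate" };
  deps.fireHooks("message_received", msg);
  if (deps.isAbort(msg)) return { kind: "skipped", reason: "fast-abort" };
  // Cross-channel routing only when the source differs from the session channel.
  const routed =
    msg.channel !== session.channel
      ? deps.routeCrossChannel(msg, session)
      : msg;
  const payloads = deps.getReply(routed);
  if (payloads === null) return { kind: "skipped", reason: "handled-locally" };
  return { kind: "replied", payloads: deps.postProcess(payloads) };
}
```

The point of the shape is that every early return avoids the expensive agent execution inside getReply().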

Skip conditions (no LLM call)

The pipeline skips the LLM call when:

  • Message is empty after normalization (and no media)
  • Unauthorized control command detected
  • Directive-only message (e.g., just /think:high)
  • Inline command handled directly (e.g., /status)
  • Fast abort triggered (stop command)
  • Duplicate message detected
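
A minimal sketch of these skip conditions as a single predicate. In reality the checks are spread across command-detection.ts, dispatch-from-config.ts, and the directive handlers; this flat context shape and its field names are assumptions:

```typescript
// Illustrative flattening of the skip conditions; the check order is a
// guess, chosen so the cheapest/safest checks run first.
interface SkipContext {
  normalizedText: string;
  hasMedia: boolean;
  isDirectiveOnly: boolean;      // e.g. message is just "/think:high"
  isInlineCommand: boolean;      // e.g. "/status"
  isUnauthorizedCommand: boolean;
  isDuplicate: boolean;
  isAbort: boolean;              // stop command
}

// Returns the skip reason, or null when the pipeline should call the LLM.
function shouldSkipLlmCall(ctx: SkipContext): string | null {
  if (ctx.isDuplicate) return "duplicate";
  if (ctx.isAbort) return "fast-abort";
  if (ctx.isUnauthorizedCommand) return "unauthorized-command";
  if (ctx.isDirectiveOnly) return "directive-only";
  if (ctx.isInlineCommand) return "inline-command";
  if (ctx.normalizedText.trim() === "" && !ctx.hasMedia) return "empty";
  return null;
}
```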

Outbound

Agent execution result (payloads)
        ↓
Strip stray HEARTBEAT_OK tokens
        ↓
Filter NO_REPLY silent tokens
        ↓
Apply TTS (if configured)
        ↓
Route to target channel
        ↓
ReplyDispatcher.sendFinalReply()

Token optimization impact

| Mechanism | Token impact | Details |
| --- | --- | --- |
| Gate-keeping | Prevents unnecessary LLM calls | Command handling, directive-only messages, duplicates — all skip the expensive LLM call |
| Thinking level control | Variable overhead per call | xhigh thinking uses significantly more tokens than off; auto-downgrade prevents wasted attempts |
| Model fallback | Retry cost on failure | Failed calls consume tokens before fallback triggers; the pipeline minimizes wasted calls |
| Streaming | No direct token impact | Streaming doesn't change token count, but the block pipeline affects perceived latency |
| TTS | Separate API call | TTS is a separate cost, not counted in LLM tokens |

Thinking level hierarchy

Resolution order (first wins):
  1. Inline directive (/think:high)
  2. Session-persisted level
  3. Agent config default
  4. Model-specific default

Levels: off < minimal < low < medium < high < xhigh
  - xhigh: Only GPT-5.x codex models
  - Binary providers (Z.ai): Only on/off
  - Auto-downgrade if model doesn't support requested level
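
A sketch of the first-wins resolution plus auto-downgrade, assuming the levels are totally ordered as shown; the function and option names are hypothetical:

```typescript
// Illustrative thinking-level resolution. The order of the array encodes
// off < minimal < low < medium < high < xhigh.
const LEVELS = ["off", "minimal", "low", "medium", "high", "xhigh"] as const;
type ThinkingLevel = (typeof LEVELS)[number];

function resolveThinkingLevel(opts: {
  inlineDirective?: ThinkingLevel; // 1. /think:high
  sessionLevel?: ThinkingLevel;    // 2. session-persisted
  agentDefault?: ThinkingLevel;    // 3. agent config
  modelDefault: ThinkingLevel;     // 4. model-specific default
  modelMaxLevel: ThinkingLevel;    // e.g. "high" for models without xhigh
}): ThinkingLevel {
  const requested =
    opts.inlineDirective ??
    opts.sessionLevel ??
    opts.agentDefault ??
    opts.modelDefault;
  // Auto-downgrade when the model doesn't support the requested level.
  return LEVELS.indexOf(requested) > LEVELS.indexOf(opts.modelMaxLevel)
    ? opts.modelMaxLevel
    : requested;
}
```

A binary provider would be modeled here as modelMaxLevel capping everything, with a separate mapping of any non-off level to "on".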

Error recovery (agent-runner-execution.ts)

| Error | Recovery | Session impact |
| --- | --- | --- |
| Context overflow | Reset session | Loses history |
| Compaction failure | Reset session | Loses history |
| Role ordering conflict | Reset session | Loses history |
| Gemini function call ordering | Reset session | Loses history |
| Transient HTTP (502/521) | Retry once after delay | No session impact |
| Model unsupported | Fallback to next model | No session impact |
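
The table maps naturally onto a small classifier. The matching heuristics below (regexes, status codes) are illustrative, not the actual detection logic in agent-runner-execution.ts:

```typescript
// Hypothetical error-to-recovery classifier mirroring the table above.
type Recovery = "reset-session" | "retry-once" | "fallback-model";

function classifyAgentError(message: string, status?: number): Recovery | null {
  // Session-corrupting errors: only recovery is a fresh session (loses history).
  if (/context overflow|compaction|role ordering|function call ordering/i.test(message))
    return "reset-session";
  // Transient upstream failures: retry once after a delay.
  if (status === 502 || status === 521) return "retry-once";
  // Model rejected entirely: move down the fallback chain.
  if (/model .*unsupported|unsupported model/i.test(message))
    return "fallback-model";
  return null; // unrecognized → surface the error
}
```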

How it connects to other modules

  • Depends on:

    • agents/pi-embedded-runner/runEmbeddedPiAgent() for execution
    • sessions/ — session entry management, model/level overrides
    • config/ — agent config, model resolution
    • channels/ — cross-channel routing, typing indicators
    • skills/ — skill command handling
    • memory/ — memory flush after replies
  • Depended on by:

    • gateway/ — receives dispatched messages
    • cron/ — uses isolated agent execution path
    • Channel plugins — trigger dispatchInboundMessage()

My blind spots

  • Full command registry — how many commands exist and which ones short-circuit LLM calls
  • Block reply pipeline internals — coalescing strategy and chunk sizing
  • Group activation logic — how the agent decides whether to respond in group chats
  • reply/queue/ — message queuing for rate limiting or ordering
  • Exact streaming callback order and race condition handling
  • followup-runner.ts — when and how follow-up runs are triggered
  • elevated-allowlist-matcher.ts — elevated permission resolution

Change frequency

  • dispatch-from-config.ts: High — routing logic, hooks, and delivery evolve with new features
  • agent-runner-execution.ts: High — error recovery patterns added as edge cases surface
  • thinking.ts: Medium — new thinking levels or provider-specific handling added with model releases
  • dispatch.ts: Low — thin entry point, rarely changes
  • tokens.ts: Low — token constants are stable