Memory (RAG)
One-line summary
The memory system provides retrieval-augmented generation via a hybrid search engine (vector + FTS5 keyword + MMR re-ranking) over a local SQLite database, injecting relevant context chunks into each LLM conversation.
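The weighted merge of the two retrieval signals can be sketched as below. This is an illustrative sketch, not the actual `hybrid.ts` implementation: the function names, the linear-combination form, and the 0.7 default weight are all assumptions.

```typescript
// Illustrative hybrid-score merge: a weighted linear combination of the
// vector-similarity score and the normalized keyword (FTS5/BM25) score.
// Names and default weight are hypothetical.

interface ScoredChunk {
  id: string;
  vectorScore: number;  // cosine similarity from the vector index, 0..1
  keywordScore: number; // normalized FTS5 keyword score, 0..1
}

function hybridScore(chunk: ScoredChunk, vectorWeight = 0.7): number {
  const keywordWeight = 1 - vectorWeight;
  return vectorWeight * chunk.vectorScore + keywordWeight * chunk.keywordScore;
}

// Rank candidates by their combined score, highest first.
function rankResults(chunks: ScoredChunk[], vectorWeight = 0.7): ScoredChunk[] {
  return [...chunks].sort(
    (a, b) => hybridScore(b, vectorWeight) - hybridScore(a, vectorWeight),
  );
}
```

With a high vector weight, a chunk that is semantically close to the query outranks one that merely shares keywords; lowering the weight shifts the balance toward exact-term matches.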
Responsibilities
- Index Markdown files from the agent workspace into searchable chunks (vector embeddings + FTS5)
- Execute hybrid search (vector similarity + keyword matching) with configurable weights
- Apply MMR (Maximal Marginal Relevance) re-ranking for result diversity
- Apply temporal decay to prioritize recent content
- Support multiple embedding providers (OpenAI, Gemini, Voyage, Mistral, local node-llama-cpp)
- Gracefully fall back to FTS-only mode when no embedding provider is available
- Inject search results as context into agent conversations
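The MMR re-ranking step can be sketched as follows: iteratively pick the candidate that balances relevance to the query against similarity to chunks already selected. This is a generic MMR sketch under assumed names and defaults, not the actual `mmr.ts` code.

```typescript
// Minimal MMR (Maximal Marginal Relevance) sketch. `lambda` trades off
// relevance (lambda = 1) against diversity (lambda = 0). All names and
// the 0.7 default are illustrative assumptions.

interface Candidate {
  id: string;
  relevance: number;                          // hybrid score vs. the query, 0..1
  similarityTo(other: Candidate): number;     // pairwise similarity, 0..1
}

function mmrSelect(candidates: Candidate[], k: number, lambda = 0.7): Candidate[] {
  const selected: Candidate[] = [];
  const pool = [...candidates];
  while (selected.length < k && pool.length > 0) {
    let bestIdx = 0;
    let bestScore = -Infinity;
    for (let i = 0; i < pool.length; i++) {
      // Penalize candidates that resemble something already picked.
      const maxSim = selected.length
        ? Math.max(...selected.map((s) => pool[i].similarityTo(s)))
        : 0;
      const score = lambda * pool[i].relevance - (1 - lambda) * maxSim;
      if (score > bestScore) {
        bestScore = score;
        bestIdx = i;
      }
    }
    selected.push(pool.splice(bestIdx, 1)[0]);
  }
  return selected;
}
```

The effect is that a near-duplicate of an already-selected chunk loses out to a less relevant but novel one, which is why MMR removes redundancy from the injected context.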
Architecture diagram
Key source files
Data flow
Indexing
Search & injection
Token optimization impact
Configuration knobs
Optimization opportunities
- Lower `maxResults`: Each result adds ~175 tokens; reducing from 10 to 5 saves ~875 tokens/turn
- Higher `minScore`: Filter out low-quality results that add tokens without value
- MMR diversity: Already helps by removing redundant chunks
- FTS-only mode: Zero embedding API cost, still functional for keyword-heavy queries
- Temporal decay: Naturally deprioritizes stale content
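One common way to implement temporal decay is an exponential half-life multiplier on the retrieval score; the sketch below assumes that form. The multiplicative shape and the 30-day half-life are assumptions for illustration, not the module's documented behavior.

```typescript
// Hedged sketch of exponential temporal decay: a chunk's score is halved
// for every `halfLifeDays` of age. The half-life default is hypothetical.

function temporalDecay(
  score: number,
  ageDays: number,
  halfLifeDays = 30,
): number {
  return score * Math.pow(0.5, ageDays / halfLifeDays);
}
```

Under these assumptions, a 60-day-old chunk retains 25% of its raw score, so fresh content naturally outranks stale content of equal relevance.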
Embedding provider cost
How it connects to other modules
- Depends on:
  - `config/` — memory settings, embedding provider configuration
  - `sessions/` — session-aware indexing
  - SQLite + sqlite-vec — storage engine
  - chokidar — file system watching
- Depended by:
  - `agents/pi-embedded-runner/` — memory context injected via Pi SDK hooks
  - `system-prompt.ts` — memory recall instructions
  - `auto-reply/reply/memory-flush.ts` — memory flush after replies
My blind spots
- Exact chunk size and overlap settings — how files are split into chunks
- `maxResults` default value — need to find in config defaults
- How memory context is actually injected — as system message or user message?
- `qmd-manager.ts` (2000+ lines) — advanced query metadata system, largely unexplored
- Batch embedding operations — how they handle rate limits and failures
- Whether search results are cached across turns
- `manager-embedding-ops.ts` — embedding lifecycle and re-indexing strategy
Related contributions
- None yet
Change frequency
- `manager.ts`: Medium — search API evolves with new features
- `hybrid.ts`: Low — merge algorithm is stable
- `embeddings.ts`: Medium — new providers added as they become available
- `mmr.ts`: Low — diversity algorithm is mature
- `qmd-manager.ts`: Medium — advanced query features actively developed