# Architecture Overview
CMM is a middleware layer between the conversation stream and the LLM's context window. The LLM does all reasoning -- the memory system only surfaces relevant information.
## Core Design Principles
- The memory system does NOT reason. It stores and retrieves information. The LLM handles all reasoning over recalled memories.
- Passive/automatic operation. The system monitors, encodes, stores, and recalls without the agent needing to "decide" to remember or retrieve.
- Gist compression, not verbatim storage. Memories are compressed summaries (like a movie plot, not the full movie). The LLM reconstructs detail at recall time.
- Embeddings for retrieval, not HDC. Standard embedding model vectors (sentence-transformers) with FAISS for similarity search. No HDC/VSA random expansion or bind/bundle algebra.
- Sub-linear retrieval at scale. FAISS IVF partitions the vector space into `nlist` cells; a query scans only the `nprobe` closest cells, so query time is O(nprobe x N/nlist) rather than O(N). The current default `nlist=100` is suitable up to ~100K memories.
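
The partition-and-probe cost model above can be illustrated with a toy sketch. This is pure NumPy for portability (the real store uses FAISS, which trains the centroids with k-means); all sizes and the random-centroid shortcut are illustrative assumptions, not the project's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, nlist, nprobe = 10_000, 64, 100, 8

# "Memories": N unit-normalized embedding vectors.
xb = rng.normal(size=(N, D)).astype("float32")
xb /= np.linalg.norm(xb, axis=1, keepdims=True)

# Coarse quantizer: nlist centroids (random picks here; FAISS uses k-means).
centroids = xb[rng.choice(N, nlist, replace=False)]
assign = np.argmax(xb @ centroids.T, axis=1)          # vector -> nearest cell
cells = {c: np.where(assign == c)[0] for c in range(nlist)}

def search(q, k=5):
    """Scan only the nprobe closest cells: ~nprobe * N/nlist comparisons."""
    probe = np.argsort(-(centroids @ q))[:nprobe]
    cand = np.concatenate([cells[c] for c in probe])
    sims = xb[cand] @ q
    return cand[np.argsort(-sims)[:k]], len(cand)

q = xb[42]
ids, scanned = search(q)
print(ids[0], scanned, N)  # top hit is the query itself; scanned << N
```

With these numbers a query compares against roughly nprobe x N/nlist = 800 candidates instead of all 10,000, which is why the cost stays nearly flat as long as `nlist` grows with the store.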
## Pipeline
```
Conversation stream (user <-> agent turns, reasoning steps)
         |
         v
+-------------------+
| Gist Encoder      |  LLM or small model: turn -> compressed summary + tags
+--------+----------+
         |
         v
+-------------------+
| Embedding Model   |  gist text -> 768D dense vector (all-mpnet-base-v2)
+--------+----------+
         |
         v
+---------------------------------+
| FAISS-backed Memory Store       |  sub-linear similarity search
| + Entity Index (spaCy NER)      |  Named entity -> memory linkage
+--------+------------------------+
         |
         v  (on each new turn)
+---------------------------------+
| Cognitive Retrieval Pipeline    |
|  1. FAISS similarity search     |
|  2. Temporal decay + rehearsal  |
|  3. Importance weighting        |
|  4. Priming boost               |
|  5. Spreading activation        |
|     (embedding + entity links)  |
|  6. Working memory merge        |
|  7. Metamemory confidence       |
+--------+------------------------+
         |
         v
Inject recalled memories into LLM context
(clearly marked as "from memory, not user input")
```
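
To make the retrieval stages concrete, here is a minimal sketch of how steps 1-4 might fold into a single ranking score. Every weight, half-life, and function name below is an illustrative assumption, not CMM's actual parameters.

```python
import math

def retrieval_score(similarity: float, age_turns: int, rehearsals: int,
                    importance: float, primed: bool) -> float:
    """Illustrative combination of the pipeline's scoring stages.
    All weights and half-lives are made-up defaults, not CMM's real ones."""
    # Step 2: temporal decay, softened by rehearsal -- each recall of a
    # memory slows its forgetting by extending the half-life.
    half_life = 20.0 * (1 + rehearsals)                 # in turns
    decay = math.exp(-math.log(2) * age_turns / half_life)
    # Step 3: importance weighting -- important memories resist down-ranking.
    importance_factor = 0.5 + 0.5 * importance
    # Step 4: priming boost -- recently activated memories get a temporary bump.
    boost = 1.3 if primed else 1.0
    return similarity * decay * importance_factor * boost

# A recent, rehearsed, important memory outranks a stale unimportant one
# even at identical embedding similarity.
fresh = retrieval_score(0.8, age_turns=2,  rehearsals=3, importance=0.9, primed=True)
stale = retrieval_score(0.8, age_turns=60, rehearsals=0, importance=0.2, primed=False)
print(fresh > stale)  # True
```

Multiplying the factors (rather than summing them) means a memory must be at least somewhat relevant on every axis to surface; a design that sums weighted terms would instead let one very high factor compensate for the others.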
## Package Structure
```
cmm/
+-- core/
|   +-- types.py                   # Memory, Gist, ConversationTurn, RetrievalResult, Role, MemoryType
|   +-- memory_store.py            # FAISS-backed store with flat->IVF auto-training, remove + rebuild
+-- encoding/
|   +-- embedding.py               # Sentence-transformers wrapper (768D, all-mpnet-base-v2)
|   +-- gist_encoder.py            # GistEncoder ABC + PassthroughGistEncoder baseline
|   +-- ollama_gist_encoder.py     # OllamaGistEncoder -- local LLM gist compression via Ollama
|   +-- openai_gist_encoder.py     # OpenAIGistEncoder -- any OpenAI-compatible API
|   +-- anthropic_gist_encoder.py  # AnthropicGistEncoder -- Anthropic Claude API
+-- retrieval/
|   +-- decay.py                   # DecayScorer -- temporal decay with rehearsal effect
|   +-- working_memory.py          # WorkingMemory -- fixed-size turn-based TTL buffer
|   +-- spreading_activation.py    # SpreadingActivation -- dual-path: FAISS neighbors + entity links
|   +-- entity_index.py            # EntityIndex -- spaCy NER + regex entity extraction and indexing
|   +-- priming.py                 # PrimingState -- turn-decaying boost for recently activated memories
|   +-- metamemory.py              # MetamemoryScorer -- confidence levels + partial match hints
|   +-- retriever.py               # Full retrieval pipeline
+-- scoring/
|   +-- importance.py              # ImportanceScorer -- auto-scores turns
|   +-- valence.py                 # ValenceScorer -- emotional valence, arousal, and emotion labels
+-- consolidation/
|   +-- consolidator.py            # Consolidator -- clusters episodic -> semantic memories
|   +-- ollama_summarizer.py       # OllamaConsolidationSummarizer -- LLM-based cluster summarization
|   +-- session.py                 # SessionSummarizer -- end-of-session summary generation
+-- maintenance/
|   +-- maintenance.py             # MemoryMaintainer -- pruning, deduplication, health metrics
+-- multi_agent/
|   +-- shared_store.py            # SharedMemoryManager -- multi-agent shared memory
+-- pipeline/
    +-- conversation.py            # CognitiveMemoryPipeline -- main entry point
```
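
Among these components, the "fixed-size turn-based TTL buffer" behind `retrieval/working_memory.py` is worth unpacking. The sketch below shows the general idea under stated assumptions; the class name, capacity, and TTL values are hypothetical and the real `WorkingMemory` API may differ.

```python
from collections import OrderedDict

class TTLWorkingMemory:
    """Illustrative fixed-size, turn-based TTL buffer (not the real
    WorkingMemory class). Items expire after `ttl_turns` conversation
    turns; the oldest item is evicted when capacity is exceeded."""

    def __init__(self, capacity: int = 5, ttl_turns: int = 3):
        self.capacity, self.ttl_turns = capacity, ttl_turns
        self.turn = 0
        self._items: "OrderedDict[str, int]" = OrderedDict()  # id -> turn added

    def add(self, memory_id: str) -> None:
        self._items[memory_id] = self.turn
        self._items.move_to_end(memory_id)            # refresh insertion order
        while len(self._items) > self.capacity:       # evict oldest insertion
            self._items.popitem(last=False)

    def tick(self) -> None:
        """Advance one conversation turn and drop expired entries."""
        self.turn += 1
        self._items = OrderedDict(
            (k, t) for k, t in self._items.items()
            if self.turn - t < self.ttl_turns
        )

    def active(self) -> list[str]:
        return list(self._items)

wm = TTLWorkingMemory(capacity=2, ttl_turns=2)
wm.add("m1"); wm.add("m2"); wm.add("m3")   # m1 evicted by the capacity limit
wm.tick()                                   # turn 1: m2, m3 survive (age 1 < 2)
wm.tick()                                   # turn 2: m2, m3 expire (age 2)
print(wm.active())                          # []
```

Counting lifetime in turns rather than wall-clock time matches the pipeline's passive design: working memory should track the local conversational context, which advances with turns, not with seconds.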
## Memory Hierarchy
Three levels of compression, mirroring human memory:
- Turn-level gist -- what just happened (a sentence or two)
- Session-level summary -- what happened in this conversation (a paragraph)
- Thread-level theme -- recurring patterns across conversations (keywords + a sentence)
Over time, consolidation promotes memories up the hierarchy: turns -> sessions -> themes.
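
The episodic-to-semantic promotion step can be sketched as clustering turn-level gists by embedding similarity. This is a minimal greedy-threshold illustration with toy 3-D vectors; the real `Consolidator`'s clustering algorithm, threshold, and summarization step (it hands clusters to an LLM summarizer) may differ.

```python
import numpy as np

def consolidate(gists: list[str], vecs: np.ndarray, threshold: float = 0.8):
    """Greedy cosine-threshold clustering of turn-level gists (illustrative,
    not the real Consolidator). Each multi-gist cluster becomes one
    candidate semantic memory."""
    vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    clusters: list[list[int]] = []
    for i in range(len(gists)):
        for c in clusters:
            if float(vecs[i] @ vecs[c[0]]) >= threshold:  # compare to seed
                c.append(i)
                break
        else:
            clusters.append([i])
    # Keep only recurring content: singleton gists stay episodic.
    return [[gists[i] for i in c] for c in clusters if len(c) > 1]

gists = ["User prefers dark mode", "User likes dark themes", "Deploy is on Fridays"]
vecs = np.array([[1.0, 0.1, 0.0],
                 [0.9, 0.2, 0.0],
                 [0.0, 0.1, 1.0]])
print(consolidate(gists, vecs))  # the two dark-mode gists cluster together
```

The key property mirrored here is that only recurring content gets promoted: a one-off episodic gist never becomes a semantic memory on its own.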