Architecture Overview

CMM is a middleware layer between the conversation stream and the LLM's context window. The LLM does all reasoning -- the memory system only surfaces relevant information.

Core Design Principles

  1. The memory system does NOT reason. It stores and retrieves information. The LLM handles all reasoning over recalled memories.
  2. Passive/automatic operation. The system monitors, encodes, stores, and recalls without the agent needing to "decide" to remember or retrieve.
  3. Gist compression, not verbatim storage. Memories are compressed summaries (like a movie plot, not the full movie). The LLM reconstructs detail at recall time.
  4. Embeddings for retrieval, not HDC. Standard embedding model vectors (sentence-transformers) with FAISS for similarity search. No HDC/VSA random expansion or bind/bundle algebra.
  5. Sub-linear retrieval at scale. FAISS IVF partitions the vector space into nlist cells and scans only nprobe of them per query, so query time is O(nprobe x N/nlist) rather than O(N). The current default nlist=100 is suitable up to ~100K memories.
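To make principle 5 concrete, here is a toy NumPy sketch of the IVF scan pattern: partition vectors into cells, then scan only the nprobe cells nearest the query. Random centroids stand in for FAISS's trained k-means quantizer, and all sizes here are illustrative, not the project's defaults.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_memories, nlist, nprobe = 16, 1000, 10, 2

memories = rng.normal(size=(n_memories, d)).astype(np.float32)

# Coarse quantizer: nlist centroids (FAISS trains these with k-means;
# here we just sample rows). Each memory is assigned to its nearest cell.
centroids = memories[rng.choice(n_memories, nlist, replace=False)]
cell_of = np.argmin(
    ((memories[:, None, :] - centroids[None, :, :]) ** 2).sum(-1), axis=1
)

def search(query: np.ndarray, k: int = 5) -> np.ndarray:
    """Scan only the nprobe cells nearest the query, not all N vectors."""
    cell_dists = ((centroids - query) ** 2).sum(-1)
    probe_cells = np.argsort(cell_dists)[:nprobe]
    candidates = np.flatnonzero(np.isin(cell_of, probe_cells))
    dists = ((memories[candidates] - query) ** 2).sum(-1)
    return candidates[np.argsort(dists)[:k]]

hits = search(rng.normal(size=d).astype(np.float32))
```

Per query this scans roughly nprobe/nlist of the stored vectors, which is the O(nprobe x N/nlist) cost above; FAISS additionally accelerates the inner scans with SIMD and optional quantization.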

Pipeline

Conversation stream (user <-> agent turns, reasoning steps)
       |
       v
+-------------------+
|  Gist Encoder     |  LLM or small model: turn -> compressed summary + tags
+--------+----------+
         |
         v
+-------------------+
| Embedding Model   |  gist text -> 768D dense vector (all-mpnet-base-v2)
+--------+----------+
         |
         v
+---------------------------------+
|   FAISS-backed Memory Store     |  sub-linear similarity search
|   + Entity Index (spaCy NER)    |  Named entity -> memory linkage
+--------+------------------------+
         |
         v  (on each new turn)
+---------------------------------+
| Cognitive Retrieval Pipeline    |
|  1. FAISS similarity search     |
|  2. Temporal decay + rehearsal  |
|  3. Importance weighting        |
|  4. Priming boost               |
|  5. Spreading activation        |
|     (embedding + entity links)  |
|  6. Working memory merge        |
|  7. Metamemory confidence       |
+--------+------------------------+
         |
         v
  Inject recalled memories into LLM context
  (clearly marked as "from memory, not user input")
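A minimal sketch of how retrieval stages 1-4 might combine into a single recall score. The multiplicative form, the weights, and the half-life value are assumptions for illustration, not the project's actual formula:

```python
def recall_score(similarity: float, age_turns: int, rehearsals: int,
                 importance: float, priming: float,
                 half_life: float = 50.0) -> float:
    """Toy composite of pipeline stages 1-4: FAISS similarity,
    temporal decay softened by rehearsal, importance weighting,
    and an additive priming boost."""
    # Rehearsal stretches the effective half-life, so frequently
    # recalled memories decay more slowly (the "rehearsal effect").
    effective_half_life = half_life * (1 + rehearsals)
    decay = 0.5 ** (age_turns / effective_half_life)
    return similarity * decay * (0.5 + 0.5 * importance) + priming

fresh = recall_score(similarity=0.8, age_turns=5, rehearsals=0,
                     importance=1.0, priming=0.0)
stale = recall_score(similarity=0.8, age_turns=500, rehearsals=0,
                     importance=1.0, priming=0.0)
rehearsed = recall_score(similarity=0.8, age_turns=500, rehearsals=9,
                         importance=1.0, priming=0.0)
```

The point of the shape: an old, never-rehearsed memory scores far below a fresh one, but repeated rehearsal keeps an old memory competitive.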

Package Structure

cmm/
+-- core/
|   +-- types.py                  # Memory, Gist, ConversationTurn, RetrievalResult, Role, MemoryType
|   +-- memory_store.py           # FAISS-backed store with flat->IVF auto-training, remove + rebuild
+-- encoding/
|   +-- embedding.py              # Sentence-transformers wrapper (768D, all-mpnet-base-v2)
|   +-- gist_encoder.py           # GistEncoder ABC + PassthroughGistEncoder baseline
|   +-- ollama_gist_encoder.py    # OllamaGistEncoder -- local LLM gist compression via Ollama
|   +-- openai_gist_encoder.py    # OpenAIGistEncoder -- any OpenAI-compatible API
|   +-- anthropic_gist_encoder.py # AnthropicGistEncoder -- Anthropic Claude API
+-- retrieval/
|   +-- decay.py                  # DecayScorer -- temporal decay with rehearsal effect
|   +-- working_memory.py         # WorkingMemory -- fixed-size turn-based TTL buffer
|   +-- spreading_activation.py   # SpreadingActivation -- dual-path: FAISS neighbors + entity links
|   +-- entity_index.py           # EntityIndex -- spaCy NER + regex entity extraction and indexing
|   +-- priming.py                # PrimingState -- turn-decaying boost for recently activated memories
|   +-- metamemory.py             # MetamemoryScorer -- confidence levels + partial match hints
|   +-- retriever.py              # Full retrieval pipeline
+-- scoring/
|   +-- importance.py             # ImportanceScorer -- auto-scores turns
|   +-- valence.py                # ValenceScorer -- emotional valence, arousal, and emotion labels
+-- consolidation/
|   +-- consolidator.py           # Consolidator -- clusters episodic -> semantic memories
|   +-- ollama_summarizer.py      # OllamaConsolidationSummarizer -- LLM-based cluster summarization
|   +-- session.py                # SessionSummarizer -- end-of-session summary generation
+-- maintenance/
|   +-- maintenance.py            # MemoryMaintainer -- pruning, deduplication, health metrics
+-- multi_agent/
|   +-- shared_store.py           # SharedMemoryManager -- multi-agent shared memory
+-- pipeline/
    +-- conversation.py           # CognitiveMemoryPipeline -- main entry point
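The entity-link path of spreading activation (entity_index.py plus spreading_activation.py above) can be sketched as a one-hop boost through a shared-entity index. The class and method names here are illustrative, not the package's real API:

```python
from collections import defaultdict

class ToyEntityIndex:
    """Toy entity -> memory-id index (a stand-in for the spaCy-backed one)."""

    def __init__(self) -> None:
        self._by_entity: defaultdict[str, set[int]] = defaultdict(set)

    def add(self, memory_id: int, entities: list[str]) -> None:
        for ent in entities:
            self._by_entity[ent.lower()].add(memory_id)

    def spread(self, seed_ids: set[int], seed_entities: dict[int, list[str]],
               boost: float = 0.3) -> dict[int, float]:
        """One hop of spreading activation: memories sharing a named
        entity with a seed memory receive a fractional activation boost."""
        activation: dict[int, float] = {}
        for mid in seed_ids:
            for ent in seed_entities.get(mid, []):
                for neighbor in self._by_entity[ent.lower()]:
                    if neighbor not in seed_ids:
                        activation[neighbor] = max(
                            activation.get(neighbor, 0.0), boost)
        return activation

idx = ToyEntityIndex()
idx.add(1, ["Alice", "Paris"])
idx.add(2, ["Paris"])
idx.add(3, ["Bob"])
spread = idx.spread({1}, {1: ["Alice", "Paris"]})
```

Memory 2 is pulled in through the shared entity "Paris" even if its embedding is not a FAISS neighbor of the query, which is the point of the dual-path design.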

Memory Hierarchy

Three levels of compression, mirroring human memory:

  1. Turn-level gist -- what just happened (a sentence or two)
  2. Session-level summary -- what happened in this conversation (a paragraph)
  3. Thread-level theme -- recurring patterns across conversations (keywords + a sentence)

Over time, consolidation promotes memories up the hierarchy: turns -> sessions -> themes.
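One way the episodic-to-semantic promotion could work is greedy centroid clustering over gist embeddings, with each cluster later summarized into a theme. This is an assumed sketch, not the Consolidator's actual algorithm:

```python
import numpy as np

def cluster_gists(embeddings: np.ndarray,
                  threshold: float = 0.8) -> list[list[int]]:
    """Greedy single-pass clustering: each gist joins the first cluster
    whose centroid it matches above `threshold` (cosine), else starts a
    new cluster. A stand-in for episodic -> semantic grouping."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    clusters: list[list[int]] = []
    centroids: list[np.ndarray] = []
    for i, vec in enumerate(unit):
        sims = [float(vec @ c) for c in centroids]
        if sims and max(sims) >= threshold:
            best = int(np.argmax(sims))
            clusters[best].append(i)
            # Re-normalize the centroid after absorbing the new member.
            centroid = unit[clusters[best]].mean(axis=0)
            centroids[best] = centroid / np.linalg.norm(centroid)
        else:
            clusters.append([i])
            centroids.append(vec)
    return clusters

# Two near-duplicate "topics" and one outlier (2D toy embeddings).
emb = np.array([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]], dtype=np.float32)
groups = cluster_gists(emb)
```

Each resulting cluster would then be handed to a summarizer (e.g. the Ollama-based one above) to produce the session- or theme-level gist.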