Architecture Overview

CMM is a middleware layer between the conversation stream and the LLM's context window. The LLM does all reasoning -- the memory system only surfaces relevant information.

Core Design Principles

  1. The memory system does NOT reason. It stores and retrieves information. The LLM handles all reasoning over recalled memories.
  2. Passive/automatic operation. The system monitors, encodes, stores, and recalls without the agent needing to "decide" to remember or retrieve.
  3. Gist compression, not verbatim storage. Memories are compressed summaries (like a movie plot, not the full movie). The LLM reconstructs detail at recall time.
  4. Embeddings for retrieval, not HDC. Standard embedding model vectors (sentence-transformers) with FAISS for similarity search. No HDC/VSA random expansion or bind/bundle algebra.
  5. Sub-linear retrieval at scale. FAISS IVF partitions the vector space into nlist cells and scans only nprobe of them per query, so query time is O(nprobe x N/nlist) rather than O(N). The current default nlist=100 is suitable up to ~100K memories.
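To make principle 5 concrete, here is a toy NumPy sketch of the IVF scan pattern: partition vectors into cells, then scan only the nprobe cells nearest the query. Random centroids stand in for FAISS's trained k-means quantizer, and all sizes here are illustrative, not the project's defaults.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_memories, nlist, nprobe = 16, 1000, 10, 2

memories = rng.normal(size=(n_memories, d)).astype(np.float32)

# Coarse quantizer: nlist centroids (FAISS trains these with k-means;
# here we just sample rows). Each memory is assigned to its nearest cell.
centroids = memories[rng.choice(n_memories, nlist, replace=False)]
cell_of = np.argmin(
    ((memories[:, None, :] - centroids[None, :, :]) ** 2).sum(-1), axis=1
)

def search(query: np.ndarray, k: int = 5) -> np.ndarray:
    """Scan only the nprobe cells nearest the query, not all N vectors."""
    cell_dists = ((centroids - query) ** 2).sum(-1)
    probe_cells = np.argsort(cell_dists)[:nprobe]
    candidates = np.flatnonzero(np.isin(cell_of, probe_cells))
    dists = ((memories[candidates] - query) ** 2).sum(-1)
    return candidates[np.argsort(dists)[:k]]

hits = search(rng.normal(size=d).astype(np.float32))
```

Per query this scans roughly nprobe/nlist of the stored vectors, which is the O(nprobe x N/nlist) cost above; FAISS additionally accelerates the inner scans with SIMD and optional quantization.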

Pipeline

Conversation stream (user <-> agent turns, reasoning steps)
       |
       v
+-------------------+
|  Gist Encoder     |  LLM or small model: turn -> compressed summary + tags
+--------+----------+
         |
         v
+-------------------+
| Embedding Model   |  gist text -> 768D dense vector (all-mpnet-base-v2)
+--------+----------+
         |
         v
+---------------------------------+
|   FAISS-backed Memory Store     |  sub-linear similarity search
|   + Entity Index (spaCy NER)    |  Named entity -> memory linkage
+--------+------------------------+
         |
         v  (on each new turn)
+---------------------------------+
| Cognitive Retrieval Pipeline    |
|  1. FAISS similarity search     |
|  2. Temporal decay + rehearsal  |
|  3. Importance weighting        |
|  4. Priming boost               |
|  5. Spreading activation        |
|     (embedding + entity links)  |
|  6. Working memory merge        |
|  7. Metamemory confidence       |
+--------+------------------------+
         |
         v
  Inject recalled memories into LLM context
  (clearly marked as "from memory, not user input")
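A minimal sketch of how retrieval stages 1-4 might combine into a single recall score. The multiplicative form, the weights, and the half-life value are assumptions for illustration, not the project's actual formula:

```python
def recall_score(similarity: float, age_turns: int, rehearsals: int,
                 importance: float, priming: float,
                 half_life: float = 50.0) -> float:
    """Toy composite of pipeline stages 1-4: FAISS similarity,
    temporal decay softened by rehearsal, importance weighting,
    and an additive priming boost."""
    # Rehearsal stretches the effective half-life, so frequently
    # recalled memories decay more slowly (the "rehearsal effect").
    effective_half_life = half_life * (1 + rehearsals)
    decay = 0.5 ** (age_turns / effective_half_life)
    return similarity * decay * (0.5 + 0.5 * importance) + priming

fresh = recall_score(similarity=0.8, age_turns=5, rehearsals=0,
                     importance=1.0, priming=0.0)
stale = recall_score(similarity=0.8, age_turns=500, rehearsals=0,
                     importance=1.0, priming=0.0)
rehearsed = recall_score(similarity=0.8, age_turns=500, rehearsals=9,
                         importance=1.0, priming=0.0)
```

The point of the shape: an old, never-rehearsed memory scores far below a fresh one, but repeated rehearsal keeps an old memory competitive.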

Package Structure

cmm/
+-- core/
|   +-- types.py                  # Memory, Gist, ConversationTurn, RetrievalResult, Role, MemoryType
|   +-- memory_store.py           # FAISS-backed store with flat->IVF auto-training, remove + rebuild
+-- encoding/
|   +-- embedding.py              # Sentence-transformers wrapper (768D, all-mpnet-base-v2)
|   +-- gist_encoder.py           # GistEncoder ABC + PassthroughGistEncoder baseline
|   +-- ollama_gist_encoder.py    # OllamaGistEncoder -- local LLM gist compression via Ollama
|   +-- openai_gist_encoder.py    # OpenAIGistEncoder -- any OpenAI-compatible API
|   +-- anthropic_gist_encoder.py # AnthropicGistEncoder -- Anthropic Claude API
+-- retrieval/
|   +-- decay.py                  # DecayScorer -- temporal decay with rehearsal effect
|   +-- working_memory.py         # WorkingMemory -- fixed-size turn-based TTL buffer
|   +-- spreading_activation.py   # SpreadingActivation -- dual-path: FAISS neighbors + entity links
|   +-- entity_index.py           # EntityIndex -- spaCy NER + regex entity extraction and indexing
|   +-- priming.py                # PrimingState -- turn-decaying boost for recently activated memories
|   +-- metamemory.py             # MetamemoryScorer -- confidence levels + partial match hints
|   +-- retriever.py              # Full retrieval pipeline
+-- scoring/
|   +-- importance.py             # ImportanceScorer -- auto-scores turns
|   +-- valence.py                # ValenceScorer -- emotional valence, arousal, and emotion labels
+-- consolidation/
|   +-- consolidator.py           # Consolidator -- clusters episodic -> semantic memories
|   +-- ollama_summarizer.py      # OllamaConsolidationSummarizer -- LLM-based cluster summarization
|   +-- session.py                # SessionSummarizer -- end-of-session summary generation
+-- maintenance/
|   +-- maintenance.py            # MemoryMaintainer -- pruning, deduplication, health metrics
+-- multi_agent/
|   +-- shared_store.py           # SharedMemoryManager -- multi-agent shared memory
+-- pipeline/
    +-- conversation.py           # CognitiveMemoryPipeline -- main entry point
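The entity-link path of spreading activation (entity_index.py plus spreading_activation.py above) can be sketched as a one-hop boost through a shared-entity index. The class and method names here are illustrative, not the package's real API:

```python
from collections import defaultdict

class ToyEntityIndex:
    """Toy entity -> memory-id index (a stand-in for the spaCy-backed one)."""

    def __init__(self) -> None:
        self._by_entity: defaultdict[str, set[int]] = defaultdict(set)

    def add(self, memory_id: int, entities: list[str]) -> None:
        for ent in entities:
            self._by_entity[ent.lower()].add(memory_id)

    def spread(self, seed_ids: set[int], seed_entities: dict[int, list[str]],
               boost: float = 0.3) -> dict[int, float]:
        """One hop of spreading activation: memories sharing a named
        entity with a seed memory receive a fractional activation boost."""
        activation: dict[int, float] = {}
        for mid in seed_ids:
            for ent in seed_entities.get(mid, []):
                for neighbor in self._by_entity[ent.lower()]:
                    if neighbor not in seed_ids:
                        activation[neighbor] = max(
                            activation.get(neighbor, 0.0), boost)
        return activation

idx = ToyEntityIndex()
idx.add(1, ["Alice", "Paris"])
idx.add(2, ["Paris"])
idx.add(3, ["Bob"])
spread = idx.spread({1}, {1: ["Alice", "Paris"]})
```

Memory 2 is pulled in through the shared entity "Paris" even if its embedding is not a FAISS neighbor of the query, which is the point of the dual-path design.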

Memory Hierarchy

Three levels of compression, mirroring human memory:

  1. Turn-level gist -- what just happened (a sentence or two)
  2. Session-level summary -- what happened in this conversation (a paragraph)
  3. Thread-level theme -- recurring patterns across conversations (keywords + a sentence)

Over time, consolidation promotes memories up the hierarchy: turns -> sessions -> themes.
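One way the episodic-to-semantic promotion could work is greedy centroid clustering over gist embeddings, with each cluster later summarized into a theme. This is an assumed sketch, not the Consolidator's actual algorithm:

```python
import numpy as np

def cluster_gists(embeddings: np.ndarray,
                  threshold: float = 0.8) -> list[list[int]]:
    """Greedy single-pass clustering: each gist joins the first cluster
    whose centroid it matches above `threshold` (cosine), else starts a
    new cluster. A stand-in for episodic -> semantic grouping."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    clusters: list[list[int]] = []
    centroids: list[np.ndarray] = []
    for i, vec in enumerate(unit):
        sims = [float(vec @ c) for c in centroids]
        if sims and max(sims) >= threshold:
            best = int(np.argmax(sims))
            clusters[best].append(i)
            # Re-normalize the centroid after absorbing the new member.
            centroid = unit[clusters[best]].mean(axis=0)
            centroids[best] = centroid / np.linalg.norm(centroid)
        else:
            clusters.append([i])
            centroids.append(vec)
    return clusters

# Two near-duplicate "topics" and one outlier (2D toy embeddings).
emb = np.array([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]], dtype=np.float32)
groups = cluster_gists(emb)
```

Each resulting cluster would then be handed to a summarizer (e.g. the Ollama-based one above) to produce the session- or theme-level gist.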