Python Middleware
For Python developers building applications with LLM APIs. Wraps any OpenAI-compatible or Anthropic API call with automatic memory.
Quick Start
from integrations.middleware import MemoryMiddleware
# With Anthropic
mw = MemoryMiddleware(api_type="anthropic")
response = mw.chat("My project deadline is April 15th.")
# With OpenAI or any OpenAI-compatible API
mw = MemoryMiddleware(api_type="openai", api_key="sk-...")
response = mw.chat("I'm allergic to peanuts.")
response = mw.chat("Order lunch for the team.")
# ^ Automatically recalls the peanut allergy
# With any OpenAI-compatible endpoint (Together, Groq, vLLM, LM Studio)
mw = MemoryMiddleware(
api_type="openai",
base_url="https://api.together.xyz/v1",
model="meta-llama/Llama-3-8b-chat-hf",
api_key="...",
)
How It Works
Every mw.chat() call automatically:
- Ingests the user message into memory
- Recalls relevant memories and injects them into the prompt
- Sends the augmented prompt to the LLM
- Ingests the LLM response into memory
- Returns the response
No explicit memory operations needed. The middleware handles everything.
Think-Out-Loud Mode
Capture the LLM's reasoning as THOUGHT-type memories:
mw = MemoryMiddleware(api_type="anthropic", think_out_loud=True)
response = mw.chat("Debug this CSV parser issue.")
# LLM's <thinking>...</thinking> block is stored as a THOUGHT memory
# and stripped from the returned response.
This means the LLM's internal reasoning about a problem is stored in memory and can be recalled later when a similar problem arises.