Python Middleware

For Python developers building applications with LLM APIs. Wraps any OpenAI-compatible or Anthropic API call with automatic memory.

Quick Start

from integrations.middleware import MemoryMiddleware

# With Anthropic
mw = MemoryMiddleware(api_type="anthropic")
response = mw.chat("My project deadline is April 15th.")

# With OpenAI or any OpenAI-compatible API
mw = MemoryMiddleware(api_type="openai", api_key="sk-...")
response = mw.chat("I'm allergic to peanuts.")
response = mw.chat("Order lunch for the team.")
# ^ Automatically recalls the peanut allergy

# With any OpenAI-compatible endpoint (Together, Groq, vLLM, LM Studio)
mw = MemoryMiddleware(
    api_type="openai",
    base_url="https://api.together.xyz/v1",
    model="meta-llama/Llama-3-8b-chat-hf",
    api_key="...",
)

How It Works

Every mw.chat() call automatically:

Ingests the user message into memory
Recalls relevant memories and injects them into the prompt
Sends the augmented prompt to the LLM
Ingests the LLM response into memory
Returns the response

No explicit memory operations needed. The middleware handles everything.

Think-Out-Loud Mode

Capture the LLM's reasoning as THOUGHT-type memories:

mw = MemoryMiddleware(api_type="anthropic", think_out_loud=True)
response = mw.chat("Debug this CSV parser issue.")
# LLM's <thinking>...</thinking> block is stored as a THOUGHT memory
# and stripped from the returned response.

This means the LLM's internal reasoning about a problem is stored in memory and can be recalled later when a similar problem arises.