Skip to content

Python Middleware

For Python developers building applications with LLM APIs. Wraps any OpenAI-compatible or Anthropic API call with automatic memory.

Quick Start

from integrations.middleware import MemoryMiddleware

# With Anthropic
mw = MemoryMiddleware(api_type="anthropic")
response = mw.chat("My project deadline is April 15th.")

# With OpenAI or any OpenAI-compatible API
mw = MemoryMiddleware(api_type="openai", api_key="sk-...")
response = mw.chat("I'm allergic to peanuts.")
response = mw.chat("Order lunch for the team.")
# ^ Automatically recalls the peanut allergy

# With any OpenAI-compatible endpoint (Together, Groq, vLLM, LM Studio)
mw = MemoryMiddleware(
    api_type="openai",
    base_url="https://api.together.xyz/v1",
    model="meta-llama/Llama-3-8b-chat-hf",
    api_key="...",
)

How It Works

Every mw.chat() call automatically:

  1. Ingests the user message into memory
  2. Recalls relevant memories and injects them into the prompt
  3. Sends the augmented prompt to the LLM
  4. Ingests the LLM response into memory
  5. Returns the response

No explicit memory operations needed. The middleware handles everything.

Think-Out-Loud Mode

Capture the LLM's reasoning as THOUGHT-type memories:

mw = MemoryMiddleware(api_type="anthropic", think_out_loud=True)
response = mw.chat("Debug this CSV parser issue.")
# LLM's <thinking>...</thinking> block is stored as a THOUGHT memory
# and stripped from the returned response.

This means the LLM's internal reasoning about a problem is stored in memory and can be recalled later when a similar problem arises.