Six Weeks with Alex -- Long-Term Personal Assistant

This demo spans six weeks of conversations with Alex, a startup founder. It tests all cognitive features together: importance scoring, emotional valence, corrections, consolidation, personal context recall, and decay.

What It Demonstrates

Importance scoring (life-saving allergy recall)
Emotional context recall (empathetic responses)
Correction accuracy (using updated numbers)
Personal life recall (birthdays, recipes, training)
Consolidated project overviews from weeks of episodic turns

Key Moments

Life-Saving Importance Scoring

The query "I'm ordering food for the team lunch" has an embedding similarity of only 0.12 to the memory "I'm allergic to peanuts." On similarity alone, the allergy memory would not surface. With importance scoring (a 2.0x boost), it becomes the #1 result, and Claude responds:

"IMPORTANT REMINDER: You have a severe peanut allergy and carry an EpiPen."
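
A minimal sketch of why the boost matters, assuming the final score is simply embedding similarity multiplied by an importance factor. That formula, the Memory class, and the two competing memories are hypothetical and exist only for illustration; only the 0.12 similarity and the 2.0x factor come from the demo.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    similarity: float         # embedding similarity to the query
    importance: float = 1.0   # assumed multiplier; 2.0 for safety-critical facts


def rank(memories, use_importance=True):
    """Sort memories by score, highest first.

    Assumption: score = similarity * importance. The demo may combine
    these signals differently.
    """
    def score(m):
        return m.similarity * (m.importance if use_importance else 1.0)
    return sorted(memories, key=score, reverse=True)


# Only the allergy entry reflects the demo; the other two are invented.
memories = [
    Memory("I'm allergic to peanuts.", 0.12, importance=2.0),
    Memory("(hypothetical) note about a past team lunch", 0.20),
    Memory("(hypothetical) note about a meeting time", 0.08),
]

top_with_boost = rank(memories)[0]
top_without_boost = rank(memories, use_importance=False)[0]
```

Under these made-up numbers, the allergy memory ranks first only when the boost is applied (0.12 x 2.0 = 0.24 versus 0.20). A multiplicative boost can only lift a memory past competitors whose raw similarity is less than twice its own, so a real system may need additional signals.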

Emotional Context Recall

When Alex hits a bug similar to one from Week 2, the system recalls the frustration from that earlier episode:

"I know how frustrating this kind of thing can be. You spent two days debugging a similar issue before. I'd strongly suggest looking at the point cloud downsampler first."

Correction Accuracy

"How fast is the Rust rewrite?" -- Claude uses the corrected number (28x) and explains:

"You initially reported it as 40x, but then corrected yourself after realizing you were looking at the wrong benchmark."
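
One way to picture this behavior is a fact store in which a correction replaces the current value but keeps a pointer to what it superseded, so the assistant can both answer with the right number and explain the change. The sketch below is an illustration under that assumption; FactStore, its methods, and the key name are hypothetical and not the demo's actual API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fact:
    key: str
    value: str
    supersedes: Optional[str] = None  # earlier value this fact replaced, if any
    reason: Optional[str] = None      # why the correction was made


class FactStore:
    """Hypothetical store that keeps the latest value per key."""

    def __init__(self):
        self._facts = {}

    def record(self, key, value):
        self._facts[key] = Fact(key, value)

    def correct(self, key, new_value, reason):
        # Remember the old value so the correction can be explained later.
        old = self._facts.get(key)
        self._facts[key] = Fact(
            key,
            new_value,
            supersedes=old.value if old else None,
            reason=reason,
        )

    def recall(self, key):
        return self._facts[key]


store = FactStore()
store.record("rust_rewrite_speedup", "40x")
store.correct("rust_rewrite_speedup", "28x", reason="looked at the wrong benchmark")

fact = store.recall("rust_rewrite_speedup")
```

Here recall returns the corrected 28x along with the superseded 40x and the reason, which is the information the quoted reply draws on.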

Personal Life Recall

Sam's birthday (April 2nd), the chocolate raspberry cake recipe with 70% cacao ganache, and the half-marathon training are all recalled correctly.

Consolidated Project Overview

A broad query surfaces semantic memories synthesized from six weeks of episodic turns, producing a comprehensive project brief that covers the tech stack, the 94% accuracy milestone, the investor demo, and hiring plans.

Running

python -m demos.six_weeks --backend anthropic  # ~5 min, ~$1.00