One common issue with personalization in all LLMs is how distracting memory seems to be for the models. A single question from 2 months ago about some topic can keep coming up as some kind of a deep interest of mine with undue mentions in perpetuity. Some kind of trying too hard.
making human memory.
We achieved state-of-the-art (SoTA) results on persona-centric memory benchmarks.
State of the Art
April 2026 certified
Human-like memory, top of persona-centric benchmarks
TL;DR, here's what we solved:
The Problem
Memory today doesn't feel human. LLMs over-index on sparse signals: mention crypto once and months later it still treats you like someone deep in the space. Ask about a breakup and suddenly every conversation carries a quiet assumption that you are struggling.
That is not memory. It is stereotyping from single data points. We gave AI perfect recall without giving it the ability to forget, despite forgetting being one of the most human qualities there is.
Soul's saliency-weighted decay mimics natural forgetting. Weak signals fade unless reinforced. Trivial details disappear. Important patterns persist.
That matters because Soul is a companion, not an assistant. A companion should remember like a person. It should have the friction and intricacies that human relationships carry.
An assistant has the opposite job. It needs to be frictionless, predictable, and remember everything relevant with high precision.
Assistants can afford uncanny, sycophantic recall. Companions can't. They need memory that feels real.
It’s sort of like when people keep saying your name in a conversation in an effort to influence you, but instead it just comes off as smarmy and out of place.
yes exactly! a bit like i'm being manipulated in some creepy way. "please like me, look how much i know about you, we are good friends".
Benchmark Results
Tested against leading memory systems. Soul optimizes for human-like memory, not raw information retrieval.
PersonaMem
Persona-consistent memory recall (n=2,727)
Cognitive Eval
Cognitive plausibility assessment (n=90)
LongMemEval-S
Long-form memory retrieval (n=500)
LongMemEval-S measures raw retrieval. PersonaMem and Cognitive Eval measure whether memory feels human.
Our Approach
The technical architecture behind human-like memory. Full implementation details in the paper.
Memory Object Structure
Soul doesn't store transcripts. It stores opinionated memories about who someone is, not what they said, with saliency, certainty, reasoning, and curiosity. Each memory is a short, single-thought summary.
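A memory object along these lines could be sketched as below. This is only an illustration: the class name, field names, the `Certainty` enum, and the validation checks are assumptions based on the attributes listed above and on the extraction prompt's rule that hypotheses end with "?". The `reasoning` and `curious_about` values are invented for the example, and the paper may structure the object differently.

```python
from dataclasses import dataclass
from enum import Enum


class Certainty(Enum):
    KNOWN = "known"            # clear or repeated signal
    HYPOTHESIS = "hypothesis"  # weak signal


@dataclass
class Memory:
    content: str          # short, single-thought summary of who the person is
    saliency: int         # 1 (fleeting) to 5 (core identity)
    certainty: Certainty
    reasoning: str        # why the companion believes this
    curious_about: str    # what it would like to learn next

    def __post_init__(self) -> None:
        if not 1 <= self.saliency <= 5:
            raise ValueError("saliency must be between 1 and 5")
        # Assumed check mirroring the prompt rule that hypotheses end with "?"
        if self.certainty is Certainty.HYPOTHESIS and not self.content.endswith("?"):
            raise ValueError('hypothesis memories must end with "?"')


# Illustrative values only; most new memories default to saliency 2.
memory = Memory(
    content="quiet mornings are how she does her best thinking",
    saliency=2,
    certainty=Certainty.KNOWN,
    reasoning="she wakes at 6am for quiet time before work",
    curious_about="what she works through in those hours",
)
```

Keeping `content` to a single thought makes later edit and merge operations simpler, since each object carries exactly one claim about the person.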
Saliency Scoring
Not all memories deserve equal persistence. In the paper, most new memories default to saliency 2, while higher scores are reserved for patterns confirmed across interactions or core identity signals.
FUNCTION score_saliency(observation):
    // 5: core identity, life-defining, rare
    IF is_core_identity(observation): RETURN 5
    // 4: significant pattern or turning point, usually confirmed over time
    IF is_significant_pattern(observation): RETURN 4
    // 3: a real insight about who they are
    IF is_real_insight(observation): RETURN 3
    // 1: fleeting observation
    IF is_fleeting(observation): RETURN 1
    // Default most new memories to 2
    RETURN 2
Memory Decay
Based on Ebbinghaus's forgetting curve, each memory gets a token budget tied to saliency. In the paper, lifespan is approximately Saliency × 4,000 tokens: low-saliency memories are deleted on expiry, while high-saliency memories are distilled and archived.
LIFESPAN ≈ saliency * 4000 tokens
IF budget_expires AND saliency <= 2: REMOVE(memory)
IF budget_expires AND saliency >= 4: ARCHIVE(DISTILL(memory))
IF memory_is_retrieved: RESET(decay_clock)
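The decay rules above can be sketched in Python as follows. This is a rough illustration, not Soul's implementation: `DecayingMemory`, `advance`, and the placeholder `distill` are hypothetical names, and the example memories are invented. The rules above do not say what happens to a saliency-3 memory on expiry, so the sketch simply keeps it active and flags that as an assumption.

```python
from dataclasses import dataclass

TOKENS_PER_SALIENCY = 4000  # from the approximate lifespan rule above


@dataclass
class DecayingMemory:
    content: str
    saliency: int
    tokens_since_retrieval: int = 0  # the decay clock

    @property
    def lifespan(self) -> int:
        return self.saliency * TOKENS_PER_SALIENCY

    def expired(self) -> bool:
        return self.tokens_since_retrieval >= self.lifespan


def distill(memory: DecayingMemory) -> str:
    # Placeholder: a real system would presumably summarize and embed here.
    return f"[archived] {memory.content}"


def advance(active, archive, tokens_elapsed, retrieved=()):
    """Advance decay clocks and return the memories that stay active."""
    survivors = []
    for m in active:
        if m.content in retrieved:
            m.tokens_since_retrieval = 0  # retrieval resets the decay clock
            survivors.append(m)
            continue
        m.tokens_since_retrieval += tokens_elapsed
        if not m.expired():
            survivors.append(m)
        elif m.saliency >= 4:
            archive.append(distill(m))  # high saliency: distill and archive
        elif m.saliency <= 2:
            pass  # low saliency: deleted on expiry
        else:
            survivors.append(m)  # saliency 3: unspecified above, kept (assumption)
    return survivors


memories = [
    DecayingMemory("asked about crypto once", saliency=1),
    DecayingMemory("training for a marathon", saliency=2),
    DecayingMemory("her sister is her closest confidant", saliency=4),
]
archive = []
# 5,000 tokens pass and the marathon comes up again, so its clock resets.
after_first = advance(memories, archive, 5000, retrieved={"training for a marathon"})
# 12,000 more tokens pass with nothing retrieved.
after_second = advance(after_first, archive, 12000)
```

In this run the one-off crypto mention is deleted after the first step (5,000 tokens exceeds its 4,000-token lifespan), the marathon memory survives only because it was retrieved, and by the second step the saliency-4 memory has passed its 16,000-token budget and moves to the archive.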
Opinionated Extraction
Rather than extracting atomic facts, Soul's prompt is written like a friend's inner monologue. It captures who someone is, not what they said, and emits add, edit, remove, and merge operations.
What the user said: "I usually wake up around 6am because I like quiet time before work. It's when I do my best thinking."
What Soul remembers: "quiet mornings are how she does her best thinking"
SYSTEM: Update memory like a close friend's inner monologue
1. WHO THEY ARE, not what they said
   Good: "she's seriously thinking about going back to school"
   Bad: "she said she is applying to grad school"
2. OUTPUT OPS: add, edit, remove, merge
3. CONTENT: short, single-thought summary
4. CERTAINTY:
   "known" = clear or repeated signal
   "hypothesis" = weak signal, must end with "?"
5. INCLUDE: reasoning + curious_about
6. NEW MEMORIES: typically 0-2 per 30-message window
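One way the emitted operations might be applied to a memory store is sketched below. The op schema (the keys `op`, `id`, `memory`, `changes`, `source_ids`) and the function names are assumptions made for illustration; the prompt above names the four operations but does not define their format.

```python
from copy import deepcopy

VALID_OPS = {"add", "edit", "remove", "merge"}


def check_certainty(memory: dict) -> None:
    """Enforce the prompt rule that a hypothesis must end with '?'."""
    if memory["certainty"] == "hypothesis" and not memory["content"].endswith("?"):
        raise ValueError(f"hypothesis must end with '?': {memory['content']!r}")


def apply_ops(store: dict, ops: list) -> dict:
    """Return a new store with the ops applied in order; the input is not mutated."""
    result = deepcopy(store)
    for op in ops:
        kind = op["op"]
        if kind not in VALID_OPS:
            raise ValueError(f"unknown op: {kind}")
        if kind == "add":
            check_certainty(op["memory"])
            result[op["id"]] = dict(op["memory"])
        elif kind == "edit":
            updated = {**result[op["id"]], **op["changes"]}
            check_certainty(updated)
            result[op["id"]] = updated
        elif kind == "remove":
            del result[op["id"]]
        else:  # merge: replace several memories with one combined memory
            check_certainty(op["memory"])
            for source_id in op["source_ids"]:
                del result[source_id]
            result[op["id"]] = dict(op["memory"])
    return result


store = {
    "m1": {"content": "she might be thinking about grad school?", "certainty": "hypothesis"},
    "m2": {"content": "asked about crypto once", "certainty": "known"},
}
ops = [
    # A repeated signal promotes the hypothesis to a known memory.
    {"op": "edit", "id": "m1", "changes": {
        "content": "she's seriously thinking about going back to school",
        "certainty": "known",
    }},
    {"op": "remove", "id": "m2"},
    {"op": "add", "id": "m3", "memory": {
        "content": "quiet mornings are how she does her best thinking",
        "certainty": "known",
    }},
]
updated = apply_ops(store, ops)
```

Returning a new store rather than mutating in place makes it easy to reject a whole batch if any single op fails validation.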
Memory Stores
Soul retrieves from two stores. Active memories are the current user model and go directly into prompt context. Archived memories are distilled, embedded past knowledge that can be retrieved later when relevant.
Memory Retrieval
At query time, Soul retrieves from both stores. Active memories go straight into context, while archived memories are ranked by semantic relevance, saliency, and recency before they are surfaced.
FUNCTION retrieve(query, user_id):
    active = TOP(active_memories(user_id), 3)  // direct prompt context
    query_embedding = embed(query)
    archived = cosine_similarity_search(
        query_embedding,
        archived_memories(user_id),
        limit = 5
    )
    ranked = rank(
        archived,
        by = semantic_relevance + saliency + recency
    )
    RETURN active + TOP(ranked, 5)
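The two-stage retrieval above could look roughly like this in Python. Several details are assumptions, since the pseudocode does not specify them: the three ranking terms are summed with equal weight, saliency is scaled by dividing by 5, recency is modeled as `1 / (1 + age_days)`, active memories are assumed to arrive already ordered, and the two-dimensional embeddings and example memories are toy values standing in for a real embedding model.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score(query_embedding, memory):
    """Combine semantic relevance, saliency, and recency (equal weights assumed)."""
    relevance = cosine_similarity(query_embedding, memory["embedding"])
    saliency = memory["saliency"] / 5       # scale the 1-5 score to 0.2-1.0
    recency = 1 / (1 + memory["age_days"])  # assumed recency curve
    return relevance + saliency + recency


def retrieve(query_embedding, active, archived, active_limit=3, archived_limit=5):
    # Stage 1: shortlist archived memories by cosine similarity alone.
    candidates = sorted(
        archived,
        key=lambda m: cosine_similarity(query_embedding, m["embedding"]),
        reverse=True,
    )[:archived_limit]
    # Stage 2: re-rank the shortlist by the combined score.
    ranked = sorted(candidates, key=lambda m: score(query_embedding, m), reverse=True)
    return active[:active_limit] + ranked[:archived_limit]


active = [{"content": "quiet mornings are how she does her best thinking"}]
archived = [
    {"content": "her sister is her closest confidant",
     "embedding": [0.0, 1.0], "saliency": 5, "age_days": 1},
    {"content": "running is how she resets",
     "embedding": [1.0, 0.0], "saliency": 4, "age_days": 9},
]
query_embedding = [1.0, 0.0]  # toy embedding for a query about running
results = retrieve(query_embedding, active, archived)
```

Here the running memory outranks the more salient, more recent memory about her sister because it matches the query, which is the behavior the combined ranking is meant to produce. In practice the relative weights would need tuning.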
Reinforcement (Testing Effect)
Roediger & Karpicke (2006) showed that retrieval strengthens memory. In Soul, when a topic resurfaces and a related memory is retrieved, its decay clock resets. That is how sustained interests persist while one-off mentions fade.
Reinforcement comes from retrieval, not from keeping every weak signal forever. One-off mentions still fade if they are never brought back up.
Conclusion
As the full paper outlines, we have built a human-like memory system for AI companions. It lets the trivial fade, keeps what matters, and evolves with the relationship.
AI companions are different to assistants. An assistant's job is to be frictionless and precise. It should remember everything relevant, surface it on demand, and never miss a detail. Perfect, uncanny recall is a feature.
A companion works in the opposite way. It needs to have friction to feel real, as you are building a relationship with it over time. It needs to have agency of its own: the ability to surprise you or introduce you to others, just as your human friends do.
Our memory is selective by design. We forget what does not matter, and what we hold on to shapes how we see someone. That "imperfect" selectivity is what makes memory feel real.
Citations
- Bartlett, F.C. (1932). Remembering: A Study in Experimental and Social Psychology.
- Brainerd, C.J. & Reyna, V.F. (2002). Fuzzy-trace theory and false memory.
- Ebbinghaus, H. (1885). Über das Gedächtnis: Untersuchungen zur experimentellen Psychologie.
- Roediger, H.L. & Karpicke, J.D. (2006). The power of testing memory: Basic research and implications for educational practice.
- Schacter, D.L. (2001). The Seven Sins of Memory: How the Mind Forgets and Remembers.
- Tulving, E. (1972). Episodic and semantic memory.