making human memory.

We achieved state-of-the-art (SoTA) results on persona-centric memory benchmarks.

State of the Art

April 2026 certified

Human-like memory, top of persona-centric benchmarks

TL;DR, here's what we solved:

The Problem

Memory today doesn't feel human. LLMs over-index on sparse signals: mention crypto once, and months later the model still treats you like someone deep in the space. Ask about a breakup, and suddenly every conversation carries a quiet assumption that you are struggling.

That is not memory. It is stereotyping from single data points. We gave AI perfect recall without giving it the ability to forget, despite forgetting being one of the most human qualities there is.

Soul's saliency-weighted decay mimics natural forgetting. Weak signals fade unless reinforced. Trivial details disappear. Important patterns persist.

That matters because Soul is a companion, not an assistant. A companion should remember like a person. It should have the friction and intricacies that human relationships carry.

An assistant has the opposite job. It needs to be frictionless, predictable, and remember everything relevant with high precision.

Assistants can afford uncanny, sycophantic recall. Companions can't. They need memory that feels real.

Benchmark Results

Tested against leading memory systems. Soul optimizes for human-like memory, not raw information retrieval.

PersonaMem

Persona-consistent memory recall (n=2,727)

System        Score
Soul          67.2% (SOTA)
Mem0          51%
Mastra OM     33%
SuperMemory   32%

Cognitive Eval

Cognitive plausibility assessment (n=90)

System        Score
Soul          74.5% (SOTA)
Mastra OM     64.6%
Mem0          61.1%
SuperMemory   33.6%

LongMemEval-S

Long-form memory retrieval (n=500)

System        Score
Soul          75.0%
Mastra OM     ~95%
Mem0          23.6%
SuperMemory   ~99%

LongMemEval-S measures raw retrieval. PersonaMem and Cognitive Eval measure whether memory feels human.

Our Approach

The technical architecture behind human-like memory. Full implementation details in the paper.

Memory Object Structure

Soul doesn't store transcripts. It stores opinionated memories about who someone is, not what they said, with saliency, certainty, reasoning, and curiosity. Each memory is a short, single-thought summary.

Figure: an annotated memory object with five fields: content (short summary), saliency (importance 1-5), certainty ("known" or "hypothesis"), reasoning (inner monologue), and curious_about (follow-up questions).

Memory Structure
MEMORY {
  content:       "she's way too hard on herself about the art stuff" // short, single-thought summary
  saliency:      4 // 1=fleeting, 5=core identity
  certainty:     "known" // or "hypothesis"
  reasoning:     "she brushed off the work again despite obvious care" // inner monologue
  curious_about: ["has she always been this self-critical?"] // follow-up curiosity
}
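
The memory object above could be modeled as a small typed record. The following is a minimal Python sketch, not Soul's actual implementation: the class name `Memory`, the default values, and the validation in `__post_init__` are assumptions added for illustration, while the field names and allowed values come from the structure shown above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Memory:
    """One opinionated memory; field names mirror the structure above."""

    content: str                  # short, single-thought summary
    saliency: int = 2             # 1 = fleeting, 5 = core identity (2 is an assumed default)
    certainty: str = "known"      # "known" or "hypothesis"
    reasoning: str = ""           # inner monologue behind the memory
    curious_about: List[str] = field(default_factory=list)  # follow-up questions

    def __post_init__(self) -> None:
        # Validation is an assumption of this sketch, not a documented behavior.
        if not 1 <= self.saliency <= 5:
            raise ValueError("saliency must be between 1 and 5")
        if self.certainty not in ("known", "hypothesis"):
            raise ValueError('certainty must be "known" or "hypothesis"')


memory = Memory(
    content="she's way too hard on herself about the art stuff",
    saliency=4,
    certainty="known",
    reasoning="she brushed off the work again despite obvious care",
    curious_about=["has she always been this self-critical?"],
)
```

Rejecting out-of-range values at construction time keeps malformed memories out of the store, at the cost of raising on bad extractor output rather than silently repairing it.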

Saliency Scoring

Not all memories deserve equal persistence. In the paper, most new memories default to saliency 2, while higher scores are reserved for patterns confirmed across interactions or core identity signals.

Level  Description                           Lifespan         Example
1      Fleeting observation                  ~4,000 tokens    "had a veggie wrap for lunch"
2      Lighter context or minor habit        ~8,000 tokens    "green tea has replaced coffee lately"
3      A real insight about who they are     ~12,000 tokens   "running helps her stay sane"
4      Significant pattern or turning point  ~16,000 tokens   "going back to school feels serious now"
5      Core identity, life-defining, rare    ~20,000 tokens   "15-year vegetarian for ethical reasons"

Saliency Assignment
FUNCTION score_saliency(observation):
  // 5: core identity, life-defining, rare
  IF is_core_identity(observation):
    RETURN 5

  // 4: significant pattern or turning point, usually confirmed over time
  IF is_significant_pattern(observation):
    RETURN 4

  // 3: a real insight about who they are
  IF is_real_insight(observation):
    RETURN 3

  // 1: fleeting observation
  IF is_fleeting(observation):
    RETURN 1

  // Default most new memories to 2
  RETURN 2

Memory Decay

Based on Ebbinghaus's forgetting curve, each memory gets a token budget tied to saliency. In the paper, lifespan is approximately Saliency × 4,000 tokens: low-saliency memories are deleted on expiry, while high-saliency memories are distilled and archived.

Figure: retention versus tokens elapsed (axis ticks from 4,000 to 20,000 tokens), with one curve per saliency level from sal=1 to sal=5, after Ebbinghaus (1885).

Decay Rules
LIFESPAN ≈ saliency * 4000 tokens

IF budget_expires AND saliency <= 2:
  REMOVE(memory)

IF budget_expires AND saliency >= 4:
  ARCHIVE(DISTILL(memory))

IF memory_is_retrieved:
  RESET(decay_clock)
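
The decay rules above can be sketched in Python as a periodic sweep over the active store. This is an illustrative sketch under stated assumptions: `TrackedMemory`, `sweep`, and `distill` are hypothetical names, the clock is a running token count, and `distill` is a placeholder because the source does not describe how distillation works. The rules above do not say what happens to saliency 3 memories on expiry, so this sketch assumes they are removed.

```python
from dataclasses import dataclass
from typing import List

TOKENS_PER_SALIENCY = 4_000  # from the text: lifespan ≈ saliency × 4,000 tokens
ARCHIVE_FROM = 4             # from the rules: saliency >= 4 is distilled and archived


@dataclass
class TrackedMemory:
    content: str
    saliency: int        # 1-5
    last_reset: int = 0  # token count at which the decay clock last started

    def lifespan(self) -> int:
        return self.saliency * TOKENS_PER_SALIENCY

    def expired(self, now: int) -> bool:
        return now - self.last_reset >= self.lifespan()


def distill(memory: TrackedMemory) -> str:
    # Hypothetical stand-in: a real system would summarize before archiving.
    return memory.content


def sweep(active: List[TrackedMemory], archive: List[str], now: int) -> List[TrackedMemory]:
    """Apply the decay rules once and return the memories that stay active."""
    kept = []
    for memory in active:
        if not memory.expired(now):
            kept.append(memory)
        elif memory.saliency >= ARCHIVE_FROM:
            archive.append(distill(memory))
        # Otherwise the memory is dropped. For saliency <= 2 this follows the
        # rules above; for saliency 3 it is an assumption of this sketch.
    return kept


archive: List[str] = []
active = [
    TrackedMemory("had a veggie wrap for lunch", saliency=1),
    TrackedMemory("15-year vegetarian for ethical reasons", saliency=5),
]
active = sweep(active, archive, now=4_000)   # the saliency-1 memory expires and is removed
active = sweep(active, archive, now=20_000)  # the saliency-5 memory expires and is archived
```

Measuring age in tokens rather than wall-clock time means a memory only decays while the conversation is actually moving, which matches the token budgets in the saliency table.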

Opinionated Extraction

Rather than extracting atomic facts, Soul's prompt is written like a friend's inner monologue. It captures who someone is, not what they said, and emits add, edit, remove, and merge operations.

Input

"I usually wake up around 6am because I like quiet time before work. It's when I do my best thinking."

↓ opinionated extraction

Memory

"quiet mornings are how she does her best thinking"

saliency: 3, certainty: known

Extraction Prompt (summary)
SYSTEM: Update memory like a close friend's inner monologue

1. WHO THEY ARE, not what they said
   Good: "she's seriously thinking about going back to school"
   Bad:  "she said she is applying to grad school"

2. OUTPUT OPS: add, edit, remove, merge

3. CONTENT: short, single-thought summary

4. CERTAINTY:
   "known"      = clear or repeated signal
   "hypothesis" = weak signal, must end with "?"

5. INCLUDE: reasoning + curious_about

6. NEW MEMORIES: typically 0-2 per 30-message window
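
To make the four output operations concrete, here is a minimal Python sketch of applying them to a memory store. The operation format is an assumption: the source names the operations (add, edit, remove, merge) but not their schema, so the keys `op`, `id`, `ids`, `memory`, and `changes`, the functions `validate` and `apply_ops`, and the hypothesis wording in the example are all hypothetical. The check that a hypothesis ends with "?" comes from the prompt summary above.

```python
from typing import Dict, List


def validate(memory: dict) -> None:
    """Enforce the certainty rule from the prompt summary."""
    if memory["certainty"] not in ("known", "hypothesis"):
        raise ValueError("certainty must be 'known' or 'hypothesis'")
    if memory["certainty"] == "hypothesis" and not memory["content"].endswith("?"):
        raise ValueError("a hypothesis must end with '?'")


def apply_ops(store: Dict[str, dict], ops: List[dict]) -> Dict[str, dict]:
    """Apply add / edit / remove / merge operations to the store in order."""
    for op in ops:
        kind = op["op"]
        if kind == "add":
            validate(op["memory"])
            store[op["id"]] = dict(op["memory"])
        elif kind == "edit":
            updated = {**store[op["id"]], **op["changes"]}
            validate(updated)
            store[op["id"]] = updated
        elif kind == "remove":
            del store[op["id"]]
        elif kind == "merge":
            # Replace several existing memories with one combined memory.
            validate(op["memory"])
            for old_id in op["ids"]:
                del store[old_id]
            store[op["id"]] = dict(op["memory"])
        else:
            raise ValueError(f"unknown op: {kind}")
    return store


store: Dict[str, dict] = {}
apply_ops(store, [
    {"op": "add", "id": "m1", "memory": {
        "content": "quiet mornings are how she does her best thinking",
        "saliency": 3, "certainty": "known"}},
    {"op": "add", "id": "m2", "memory": {
        "content": "is she thinking about going back to school?",
        "saliency": 2, "certainty": "hypothesis"}},
])
# A later window confirms the hypothesis, so the extractor edits it in place.
apply_ops(store, [
    {"op": "edit", "id": "m2", "changes": {
        "content": "she's seriously thinking about going back to school",
        "certainty": "known"}},
])
```

Editing a hypothesis into a known memory, rather than adding a second entry, keeps the store at one thought per topic.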

Memory Stores

Soul retrieves from two stores. Active memories are the current user model and go directly into prompt context. Archived memories are distilled, embedded past knowledge that can be retrieved later when relevant.

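Figure: Soul's two memory stores. Active memories (the current user model, with saliency 1-5 and certainty) are included directly in the prompt. Archived memories (distilled past knowledge stored as vector embeddings) are retrieved when relevant. Memories move from the active store to the archive through distill + archive.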

Memory Retrieval

At query time, Soul retrieves from both stores. Active memories go straight into context, while archived memories are ranked by semantic relevance, saliency, and recency before they are surfaced.

Memory Retrieval
FUNCTION retrieve(query, user_id):
  active = TOP(active_memories(user_id), 3) // direct prompt context

  query_embedding = embed(query)
  archived = cosine_similarity_search(
    query_embedding,
    archived_memories(user_id),
    limit = 5
  )

  ranked = rank(
    archived,
    by = semantic_relevance + saliency + recency
  )

  RETURN active + TOP(ranked, 5)
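
The retrieval pseudocode above can be sketched in runnable Python. Several details are assumptions rather than Soul's documented behavior: the source says archived memories are ranked by semantic relevance, saliency, and recency but gives no weights or scales, so this sketch normalizes each term to roughly 0-1 and sums them unweighted. The `Archived` class, the `horizon` parameter, and the toy two-dimensional embeddings are hypothetical, and active memories are assumed to arrive already in priority order.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Archived:
    content: str
    embedding: Sequence[float]
    saliency: int    # 1-5
    age_tokens: int  # tokens elapsed since the memory was archived


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score(query: Sequence[float], memory: Archived, horizon: int = 20_000) -> float:
    relevance = cosine(query, memory.embedding)
    saliency = memory.saliency / 5                        # assumed normalization
    recency = max(0.0, 1 - memory.age_tokens / horizon)   # assumed linear falloff
    return relevance + saliency + recency


def retrieve(query: Sequence[float], active: List[str], archived: List[Archived],
             active_limit: int = 3, archived_limit: int = 5) -> List[str]:
    # Step 1: similarity search narrows the archive to the closest candidates.
    candidates = sorted(archived, key=lambda m: cosine(query, m.embedding),
                        reverse=True)[:archived_limit]
    # Step 2: re-rank the candidates by the combined score.
    ranked = sorted(candidates, key=lambda m: score(query, m), reverse=True)
    return active[:active_limit] + [m.content for m in ranked[:archived_limit]]


active = ["she's way too hard on herself about the art stuff"]
archived = [
    Archived("green tea has replaced coffee lately", [0.0, 1.0], saliency=2, age_tokens=500),
    Archived("running helps her stay sane", [1.0, 0.0], saliency=3, age_tokens=1_000),
]
results = retrieve([1.0, 0.0], active, archived)
```

In a real deployment the embeddings would come from an embedding model and the similarity search from a vector index; both are replaced here by plain lists so the ranking logic is visible.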

Reinforcement (Testing Effect)

Roediger & Karpicke (2006) showed that retrieval strengthens memory. In Soul, when a topic resurfaces and a related memory is retrieved, its decay clock resets. That is how sustained interests persist while one-off mentions fade.

When a memory is retrieved:
decay_clock = RESET: the decay clock resets
memory stays active: retrieved memories resist expiry
signal repeats over time: sustained interests persist

Reinforcement comes from retrieval, not from keeping every weak signal forever. One-off mentions still fade if they are never brought back up.
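
A small simulation can illustrate the difference between a one-off mention and a sustained interest. This is a hedged Python sketch, not Soul's implementation: the `Clocked` class and its methods are hypothetical, the 4,000-tokens-per-saliency lifespan is taken from the Memory Decay section, and the 6,000-token retrieval interval is an arbitrary choice for the example.

```python
from dataclasses import dataclass

TOKENS_PER_SALIENCY = 4_000  # lifespan ≈ saliency × 4,000 tokens


@dataclass
class Clocked:
    content: str
    saliency: int
    last_reset: int = 0  # token count at which the decay clock last started

    def alive(self, now: int) -> bool:
        return now - self.last_reset < self.saliency * TOKENS_PER_SALIENCY

    def retrieve(self, now: int) -> None:
        # Retrieval restarts the decay clock (the testing effect).
        self.last_reset = now


# Two memories with the same saliency and therefore the same 8,000-token budget.
one_off = Clocked("mentioned crypto once", saliency=2)
sustained = Clocked("running helps her stay sane", saliency=2)

# The running topic resurfaces every 6,000 tokens; crypto never comes up again.
for now in range(0, 24_001, 6_000):
    if sustained.alive(now):
        sustained.retrieve(now)

one_off_survives = one_off.alive(24_000)
sustained_survives = sustained.alive(24_000)
```

Because each retrieval lands inside the 8,000-token budget, the sustained memory never expires, while the one-off mention runs out its budget without a reset.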

Conclusion

As the full paper outlines, we have built a human-like memory system for AI companions. It lets the trivial fade, keeps what matters, and evolves with the relationship.

AI companions are different to assistants. An assistant's job is to be frictionless and precise. It should remember everything relevant, surface it on demand, and never miss a detail. Perfect, uncanny recall is a feature.

A companion works in the opposite way. It needs to have friction to feel real, as you are building a relationship with it over time. It needs to have agency of its own: the ability to surprise you or introduce you to others, just as your human friends do.

Our memory is selective by design. We forget what does not matter, and what we hold on to shapes how we see someone. That "imperfect" selectivity is what makes memory feel real.

Citations

  1. Bartlett, F.C. (1932). Remembering: A Study in Experimental and Social Psychology.
  2. Brainerd, C.J. & Reyna, V.F. (2002). Fuzzy-trace theory and false memory.
  3. Ebbinghaus, H. (1885). Über das Gedächtnis: Untersuchungen zur experimentellen Psychologie.
  4. Roediger, H.L. & Karpicke, J.D. (2006). The power of testing memory: Basic research and implications for educational practice.
  5. Schacter, D.L. (2001). The Seven Sins of Memory: How the Mind Forgets and Remembers.
  6. Tulving, E. (1972). Episodic and semantic memory.