Theory · 6 min read · June 16, 2026

What Is Adaptive Memory Decay? How Feather DB Forgets

Stale memories cause hallucinations. Adaptive memory decay — exponential scoring weighted by half-life, importance, and recall count — lets agents forget the right things at the right speed. Here's how Feather DB implements it, and why it beats static storage on LongMemEval.

Feather DB
Engineering

Why agents need to forget

Every AI agent that accumulates context over time faces the same quiet failure mode: it stops forgetting. A memory stored six months ago about a user's job title sits at the same retrieval priority as one stored yesterday about their current project. The agent surfaces the stale fact, confidently, in the middle of a response that needed the fresh one.

This isn't a retrieval bug. It's a design assumption: that all memories are equally valid regardless of age. Static vector databases treat retrieval as a similarity problem only. They ask what matches the query. The question that actually matters is what matches the query and is still true.

Adaptive memory decay is the mechanism that answers the second question. It doesn't delete old memories. It progressively reduces their retrieval score based on age, so fresher context competes better — and memories that keep proving useful stay sharp regardless of when they were created.

The exponential decay formula

The mathematical core is straightforward. For any memory in Feather DB, recency is scored as a half-life exponential:

recency = e^(-λ × age_days)
# where λ = ln(2) / half_life

Which is equivalent to the half-life form:

recency = 0.5 ** (age_days / half_life)

Both expressions are identical — 0.5^(t/h) is just e^(-ln(2)/h × t). The half-life form is easier to reason about: when age_days == half_life, recency drops to exactly 0.5. When age_days == 2 × half_life, recency drops to 0.25. When the memory was just written, recency is 1.0.
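A quick numerical check makes the equivalence concrete. The sketch below is plain Python with no Feather DB dependency; the function names are illustrative and not part of the library.

```python
import math

def recency_exp(age_days: float, half_life: float) -> float:
    """Exponential form: e^(-lambda * t) with lambda = ln(2) / half_life."""
    lam = math.log(2) / half_life
    return math.exp(-lam * age_days)

def recency_half_life(age_days: float, half_life: float) -> float:
    """Half-life form: 0.5^(t / half_life)."""
    return 0.5 ** (age_days / half_life)

# Compare both forms at 0, 1, 2 and 3 half-lives (half_life = 30 days).
for age in (0, 30, 60, 90):
    a = recency_exp(age, half_life=30)
    b = recency_half_life(age, half_life=30)
    print(f"age={age:>2}d  exp={a:.4f}  half_life={b:.4f}")
```

Both columns agree to floating-point precision: 1.0000 at day 0, then 0.5000, 0.2500, and 0.1250 at one, two, and three half-lives.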

The full scoring formula combines recency with vector similarity and an explicit importance weight:

stickiness    = 1 + log(1 + recall_count)
effective_age = age_days / stickiness
recency       = 0.5 ** (effective_age / half_life)
score         = ((1 - tw) * similarity + tw * recency) * importance

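The tw parameter (time weight, default 0.3) controls how much recency competes with vector similarity. At tw=0 the recency term drops out and the score is similarity × importance; with uniform importance this is pure ANN ranking, identical to any static vector database. At the default 0.3, recency carries 30% of the weight in the pre-importance score. Similarity still dominates, but age has a cost: with half_life=30 and no recalls, a 90-day-old memory has recency 0.125, so its recency term contributes about 0.04 instead of the 0.3 a brand-new memory would get.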
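The formula translates directly into a few lines of Python. This is a sketch of the scoring arithmetic as written above, not Feather DB's internal code: `adaptive_score` is a hypothetical name, and `log` is assumed to be the natural logarithm.

```python
import math

def adaptive_score(similarity, age_days, recall_count=0, importance=1.0,
                   half_life=30.0, tw=0.3):
    """Score one memory with the formula above (log assumed natural)."""
    stickiness = 1 + math.log(1 + recall_count)
    effective_age = age_days / stickiness
    recency = 0.5 ** (effective_age / half_life)
    return ((1 - tw) * similarity + tw * recency) * importance

# Same similarity, different ages: the older memory loses part of its score.
fresh = adaptive_score(similarity=0.8, age_days=1)
old = adaptive_score(similarity=0.8, age_days=90)
# With tw=0 the age no longer matters and the score equals the similarity.
static = adaptive_score(similarity=0.8, age_days=90, tw=0.0)
print(f"fresh={fresh:.3f}  old={old:.3f}  static(tw=0)={static:.3f}")
```

Raising `tw` widens the gap between `fresh` and `old`; lowering it moves both toward the raw similarity.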

Half-life by memory type

The most important design decision is choosing the right half-life for each memory type. There is no universal answer — the correct half-life depends entirely on how quickly the information goes stale in your domain.

| Memory type | half_life | Recency at 30 days | Recency at 90 days |
| --- | --- | --- | --- |
| Session context (what we discussed today) | 1 day | ~0.000 | ~0.000 |
| Short-term facts (current project, job title) | 7 days | 0.051 | ~0.000 |
| Medium-term preferences (communication style) | 30 days | 0.500 | 0.125 |
| Long-term knowledge (domain expertise) | 90 days | 0.794 | 0.500 |
| Permanent facts (name, birthday) | ∞ (no decay) | 1.000 | 1.000 |

A session-level fact with half_life=1 has a recency of 0.125 by day 3 and below 0.01 by day 7. A permanent fact should not decay at all: query it with time_weight=0 so recency is ignored (storing it with very high importance raises its score but does not stop the recency term from shrinking). The spectrum between those poles is where most real agent memory lives.

The implication for architecture: don't use a single half_life for your entire memory store. Use namespace isolation or per-query half_life tuning to give different memory categories their own decay rates. Feather DB's namespace system lets you partition memories by type and search each partition with the appropriate half-life.
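One way to organize this is a small lookup from memory category to half-life. The category names and the mapping below are hypothetical, chosen to mirror the table above; the snippet only recomputes recency values and makes no Feather DB calls.

```python
import math

# Hypothetical category -> half-life in days; math.inf means "no decay".
HALF_LIFE_DAYS = {
    "session": 1,
    "short_term": 7,
    "medium_term": 30,
    "long_term": 90,
    "permanent": math.inf,
}

def recency(age_days: float, half_life: float) -> float:
    """0.5^(age / half_life); an infinite half-life always yields 1.0."""
    return 0.5 ** (age_days / half_life)

for category, h in HALF_LIFE_DAYS.items():
    print(f"{category:<12} 30d={recency(30, h):.3f}  90d={recency(90, h):.3f}")
```

The value looked up for a category would then be supplied as the half_life argument when searching that category's partition.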

How Feather DB implements it: half_life and time_weight in db.search()

In practice, decay is applied at query time, not write time. You control it entirely through db.search() parameters:

import feather_db as fdb

db = fdb.DB.open("agent.feather", dim=768)

# Add a memory with explicit importance.
# embed() is your own embedding function; it must return a 768-dim vector to match dim=768.
db.add(
    id=1,
    vec=embed("User prefers Python over JavaScript for scripting tasks"),
    meta=fdb.Metadata(importance=0.9)
)

# Search with decay: 30-day half-life, 30% weight on recency
query_vec = embed("Which language does the user prefer for scripting?")
results = db.search(query_vec, k=10, half_life=30, time_weight=0.3)

The decay calculation happens on the candidate set returned by HNSW approximate nearest neighbor search. Feather DB retrieves a wider candidate pool from the HNSW index, then reranks by the full adaptive scoring formula before returning the top-k results. This means decay doesn't change what the ANN index searches — it changes how results are ranked after retrieval.
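The over-fetch-then-rerank pattern can be sketched in plain Python. This is not Feather DB's implementation: the `Candidate` fields, the `rerank` function, and the sample values are all assumptions for illustration, and the candidate list stands in for whatever the HNSW index returns.

```python
from dataclasses import dataclass
import math

@dataclass
class Candidate:
    id: int
    similarity: float      # similarity reported by the ANN index
    age_days: float
    recall_count: int = 0
    importance: float = 1.0

def rerank(candidates, k, half_life=30.0, tw=0.3):
    """Re-score an over-fetched candidate pool and keep the top k."""
    def score(c):
        stickiness = 1 + math.log(1 + c.recall_count)  # natural log assumed
        recency = 0.5 ** ((c.age_days / stickiness) / half_life)
        return ((1 - tw) * c.similarity + tw * recency) * c.importance
    return sorted(candidates, key=score, reverse=True)[:k]

# Pretend the index returned 4 candidates for a k=2 query.
pool = [
    Candidate(id=1, similarity=0.82, age_days=120),
    Candidate(id=2, similarity=0.80, age_days=2),
    Candidate(id=3, similarity=0.78, age_days=10),
    Candidate(id=4, similarity=0.60, age_days=1),
]
top = rerank(pool, k=2)
print([c.id for c in top])
```

The pool has to be wider than k for this to work: candidate 3 would not be in a top-2 list ranked by similarity alone, so a reranker that only saw two candidates could never promote it.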

To effectively disable decay for a query (for lookups where freshness doesn't matter, such as permanent user attributes), set time_weight=0:

# Pure similarity search — decay disabled
results = db.search(query_vec, k=10, time_weight=0)

Recall strengthening: the adaptive loop

Decay alone would be too aggressive. A memory about a user's core working style might be 60 days old, but it gets retrieved and used in nearly every session. Without some counter-mechanism, it would decay into irrelevance even though it's clearly valuable.

Feather DB counters this with recall strengthening. Every time a memory is returned in a search result and marked as used, its recall_count increments. The stickiness term in the scoring formula uses this count to compress the effective age:

stickiness    = 1 + log(1 + recall_count)
effective_age = age_days / stickiness

Taking log as the natural logarithm, a memory recalled 10 times has a stickiness of 1 + ln(11) ≈ 3.4. At day 90, its effective age is about 26.5 days instead of 90. Its recency score (with half_life=30) is about 0.54 instead of 0.125, more than 4× higher just from being useful.

This creates a self-organizing memory system:

  • Retrieve → use → strengthen. Memories that keep getting surfaced and used accumulate recall counts, reducing their effective age. They stay competitive in search results even as they get old.
  • Ignore → weaken → fade. Memories that are never retrieved accumulate no stickiness. Their recency score decays on the natural half-life curve until they stop appearing in results.

The logarithmic form of stickiness — 1 + log(1 + recall_count) — is deliberate. It prevents runaway stickiness for memories that happen to be retrieved constantly. The gain flattens after recall_count ≈ 50, so no memory becomes permanently immune to decay regardless of how often it's been recalled.
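To see how the gain tapers off, the stickiness term can be tabulated over a range of recall counts. The sketch assumes `log` means the natural logarithm, which the formula does not state explicitly; the helper names are illustrative.

```python
import math

def stickiness(recall_count: int) -> float:
    """1 + log(1 + recall_count), with log assumed to be natural."""
    return 1 + math.log(1 + recall_count)

def effective_age(age_days: float, recall_count: int) -> float:
    """Age after compression by the stickiness factor."""
    return age_days / stickiness(recall_count)

for n in (0, 1, 5, 50, 500):
    print(f"recalls={n:>3}  stickiness={stickiness(n):.2f}  "
          f"effective age at day 90={effective_age(90, n):.1f}")
```

Under that assumption, the first five recalls nearly triple stickiness (1.00 to about 2.79), while going from 50 to 500 recalls adds only about 2.3 more.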

The recall loop closes with update_recall():

results = db.search(query_vec, k=5, half_life=30, time_weight=0.3)

for r in results:
    # Use r.text in your prompt
    pass

# Strengthen the memories you actually used
db.update_recall([r.id for r in results])

Importance weighting: initial priority

Decay and recall stickiness handle the time dimension. Importance handles the structural signal that time alone can't capture: some memories matter more than others from the moment they're written.

metadata.importance accepts a float in the range 0.0–1.0 (with values above 1.0 supported for amplification). It multiplies the entire combined score, so an importance=0.9 memory scores 90% as strongly as an importance=1.0 memory at identical similarity and recency:

import feather_db as fdb

db = fdb.DB.open("agent.feather", dim=768)

# High importance: explicitly confirmed user preference
db.add(id=1, vec=embed("User prefers Python"), meta=fdb.Metadata(importance=0.9))

# Medium importance: inferred from behavior
db.add(id=2, vec=embed("User tends to use bullet points"), meta=fdb.Metadata(importance=0.5))

# Low importance: speculative, uncertain
db.add(id=3, vec=embed("User might be in APAC timezone"), meta=fdb.Metadata(importance=0.2))

Use importance to encode your confidence in a memory's correctness, not just its relevance. A fact the user stated directly deserves high importance. A hypothesis inferred from behavioral signals deserves lower importance. An explicitly corrected fact — something the user said was wrong — should be added with very low importance or not stored at all.

LongMemEval results: why decay beats static storage

The abstract argument for adaptive decay is intuitive. The empirical argument is what Feather DB reports on LongMemEval_S (April 2026 draft): a score of 0.693 with GPT-4o as the answerer, compared to 0.640 for full-context GPT-4o on the same benchmark.

That 0.640 number is the static baseline — every conversation turn fed into the context window, no decay, no prioritization. Feather DB's decay-weighted retrieval scores 0.693 despite never seeing the full context. The delta comes from what gets surfaced: temporally-weighted retrieval consistently promotes more recent, more frequently recalled information. Full-context approaches weight all history equally, which means stale facts compete with fresh ones for the answerer's attention at inference time.

LongMemEval tests exactly the scenarios where this matters: questions that require the agent to use recent information and ignore older contradictory information. Static storage fails these by surfacing both. Decay-weighted scoring fails them less often because the older contradictory memory has a lower effective score by the time the question arrives.
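A toy example illustrates the mechanism. The two memories, their similarity values, and their ages are invented for illustration and are not drawn from LongMemEval; the scoring function restates the formula from earlier in this article with recall counts of zero.

```python
def score(similarity, age_days, half_life=30.0, tw=0.3, importance=1.0):
    """Decay-weighted score with recall_count = 0 (so stickiness = 1)."""
    recency = 0.5 ** (age_days / half_life)
    return ((1 - tw) * similarity + tw * recency) * importance

# Two contradictory memories that match the query about equally well.
# Values are (similarity, age_days).
memories = {
    "User works at Acme (stored 180 days ago)": (0.84, 180),
    "User works at Globex (stored 3 days ago)": (0.81, 3),
}

for tw in (0.0, 0.3):
    ranked = sorted(memories, key=lambda m: score(*memories[m], tw=tw),
                    reverse=True)
    print(f"tw={tw}: top result -> {ranked[0]}")
```

With tw=0.0 the stale memory ranks first on its slightly higher similarity; with tw=0.3 the recent one wins. Both memories remain retrievable, only their order changes.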

Feather DB's temporal reasoning score on LongMemEval sub-categories is 0.417–0.477 — the honest weak spot. The benchmark's hardest temporal tasks (explicit date arithmetic, multi-hop temporal chains) still require reasoning capabilities beyond retrieval scoring. Decay improves the information available to the answerer; it doesn't replace the answerer's temporal reasoning. That gap is where active development is focused.

Putting it together: a minimal agent memory loop

The full adaptive memory loop in practice is about 15 lines:

import feather_db as fdb

db = fdb.DB.open("agent.feather", dim=768)

def remember(text: str, importance: float = 0.7) -> int:
    vec = embed(text)
    node = db.add(vec=vec, meta=fdb.Metadata(importance=importance))
    return node.id

def recall(query: str, half_life: int = 30) -> list:
    vec = embed(query)
    results = db.search(vec, k=10, half_life=half_life, time_weight=0.3)
    db.update_recall([r.id for r in results])
    return results

# Write memories with different importance weights (importance scales the score; the decay rate is set by half_life at query time)
remember("User's name is Alex", importance=1.0)
remember("User is currently migrating from Flask to FastAPI", importance=0.8)
remember("User mentioned they like dark mode", importance=0.4)

# Retrieve with 30-day half-life: recent context dominates
memories = recall("What stack is the user working with?", half_life=30)

The loop is: write memories at the importance level that reflects your confidence, recall with the half-life that matches how fast your domain changes, and let stickiness handle the rest. Memories that keep proving useful stay sharp. Memories that don't, fade.

Install

pip install feather-db

GitHub: github.com/feather-store/feather