Benchmarked · Reproducible · Open

Beats full-context GPT-4o.

Feather DB v0.8.0 with GPT-4o as the answerer scores 0.693 on LongMemEval_S, beating the paper's full-context GPT-4o ceiling of 0.640. The cheap tier with Gemini-Flash hits 0.657 at ~$2.40 per full benchmark run.

0.693
LongMemEval_S
GPT-4o answerer
0.657
LongMemEval_S
Gemini-Flash · ~$2.40/run
0.19ms
p50 latency
500K × 128-dim · ef=50
0.972
recall@10
SIFT1M · ef=50
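For context on the recall@10 number above: it is the fraction of each query's true 10 nearest neighbors (from exact search) that the approximate index returns, averaged over all queries. A minimal sketch of that metric in plain Python (illustrative only, not the Feather DB API):

```python
def recall_at_k(approx_ids, exact_ids, k=10):
    """Mean over queries of |top-k approx ∩ top-k exact| / k.

    approx_ids / exact_ids: per-query lists of neighbor ids,
    already sorted by distance (nearest first).
    """
    hits = [len(set(a[:k]) & set(e[:k])) for a, e in zip(approx_ids, exact_ids)]
    return sum(hits) / (k * len(hits))

# Toy example: 5 of the 10 true neighbors were retrieved -> recall 0.5
approx = [[0, 1, 2, 3, 4, 10, 11, 12, 13, 14]]
exact = [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]]
print(recall_at_k(approx, exact))  # 0.5
```

On SIFT1M the exact top-10 lists come with the dataset's ground-truth file, so this reduces to a set intersection per query.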

LongMemEval_S leaderboard

500 questions · ~115K-token haystack each · 5-axis scoring
Naive vector RAG (paper baseline) · 0.310
Full-context GPT-4o-mini (paper) · 0.554
Full-context GPT-4o (paper · the bar to beat) · 0.640
Feather DB + Gemini-Flash (cheap tier · ~$2.40 per full run) · 0.657
Feather DB + GPT-4o (~$8 per full run) · 0.693
Per-axis breakdown (GPT-4o)
single-session-user 1.000
single-session-assistant 0.964
preference 0.767
knowledge-update 0.714
multi-session 0.606
temporal 0.477
reproduce locally · ~8 min
pip install feather-db
git clone https://github.com/feather-store/feather && cd feather
python -m bench run longmemeval --dataset s --limit 0 \
    --embedder openai \
    --answerer-provider gemini --answerer-model gemini-2.5-flash \
    --decay-half-life 14 --decay-time-weight 0.4 --k 10
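The `--decay-half-life 14` and `--decay-time-weight 0.4` flags suggest recency-weighted ranking: the recency signal halves every 14 days and is blended with vector similarity at weight 0.4. One plausible reading of that scoring, sketched in Python (an assumption for illustration, not Feather DB's actual implementation; `decayed_score` is a hypothetical name):

```python
import math

def decayed_score(similarity, age_days, half_life=14.0, time_weight=0.4):
    """Blend similarity with an exponentially decaying recency term.

    recency halves every `half_life` days; `time_weight` controls how much
    recency contributes relative to raw vector similarity.
    """
    recency = math.exp(-math.log(2) * age_days / half_life)
    return (1.0 - time_weight) * similarity + time_weight * recency

# A fresh memory (age 0) gets the full recency bonus;
# a 14-day-old one gets exactly half of it.
print(decayed_score(0.8, 0.0))   # 0.88
print(decayed_score(0.8, 14.0))  # 0.68
```

Under this reading, a slightly less similar but recent memory can outrank an older near-duplicate, which matters for the knowledge-update and temporal axes above.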