AI Memory Systems in 2026: Feather DB, Mem0, Zep, Letta Compared

Why memory architecture decisions are now load-bearing

In 2023 and early 2024, most AI memory decisions were exploratory. Teams bolted on a vector store, stored conversation turns, and called it done. The field moved fast enough that architectural debt was rarely visible before the pivot.

That era is over. By mid-2026, long-running agents are the dominant product form factor. Customer support agents accumulate twelve months of customer history. Personal assistants span years. Coding assistants carry months of project context across hundreds of sessions. In these use cases, memory architecture is as foundational as your database choice for an API: get it wrong and performance degrades, costs compound, and migration is painful.

Four systems have emerged as the primary options: Feather DB, Mem0, Zep, and Letta (formerly MemGPT's archival memory layer). They make fundamentally different architectural bets, price differently, and impose different operational burdens. This post maps them precisely so you can make an informed choice rather than default to whichever one had the blog post you read last.

The four systems, plainly described

Feather DB

Feather DB is an embedded vector database written in C++17 with Python and Rust bindings. It ships as a single binary .feather file — no server process, no infrastructure. It runs in-process alongside your code.

Memory retrieval in Feather DB is not pure cosine similarity. It uses an adaptive scoring formula:

stickiness    = 1 + log(1 + recall_count)
effective_age = age_in_days / stickiness
recency       = 0.5 ^ (effective_age / half_life_days)
final_score   = ((1 - time_weight) * similarity + time_weight * recency) * importance

Frequently recalled memories resist aging — they become "sticky." More important memories (set at ingest time or derived from usage) rank higher regardless of recency. The result is a retrieval system that behaves like a cognitively plausible memory layer rather than a similarity lookup.

On top of that, Feather DB maintains a typed knowledge graph. Every memory node can have outgoing edges (supports, contradicts, refines, leads_to, same_session, supersedes). The context_chain() API starts from ANN results and then traverses n hops through edges to surface related context that pure vector similarity would miss.

The search stack is HNSW for approximate nearest neighbor, BM25 for exact-term lexical matching, fused via Reciprocal Rank Fusion. p50 retrieval: 0.19 ms on a 500K-node index. MIT-licensed. pip install feather-db.

Mem0

Mem0 is a managed memory API. You post messages to it; it uses an LLM under the hood to automatically extract structured facts ("User prefers dark mode", "User's project is called Atlas") and stores them in a cloud index. You query the API with a natural language string and receive ranked memories back.

The core value proposition is zero setup. There is no embedding model to choose, no index to configure, no decay parameters to tune. You authenticate, call the API, and memories appear. The trade-off is that every memory operation is a cloud round trip and the extraction step adds an LLM call cost on top of the API fee.

Pricing runs approximately $20 per month for a typical single-user agent workload, scaling with memory operations volume. There is no self-hosted option — the extraction pipeline and storage are fully cloud-managed.

Zep

Zep is a conversation memory and vector search layer with both hosted and self-hosted options. It maintains conversation session history, runs NLP extraction over that history to surface entities and facts, and exposes a vector search API over the extracted data.

The architecture sits between Mem0's managed simplicity and Feather DB's embedded control. You can deploy Zep yourself (Python or Go) or use their hosted service. The NLP extraction layer is configurable; you can adjust what entity types it extracts and how it chunks conversation history.

Latency is variable depending on deployment. Self-hosted on the same machine as your application is faster than hosted-tier API calls, but you take on the operational overhead. Hybrid BM25 + dense search is available on newer versions. Decay and adaptive scoring are less sophisticated than Feather DB's stickiness model.

Letta (ArchivalMemory)

Letta is the production packaging of the MemGPT research project. MemGPT introduced a two-tier memory model for LLMs: a core memory (always in the context window, bounded in size) and an archival memory (external, retrieved on demand). Letta implements this architecture for production agents.

The archival memory layer is backed by vector search. The agent uses tool calls to explicitly read from and write to archival memory during a conversation — the memory access is part of the agent's reasoning process rather than a retrieval step that happens before the agent runs. This makes Letta's memory more transparent and auditable than systems where retrieval is a preprocessing step, but it also means the agent must learn to use memory well, and complex retrieval logic ends up inside the agent's reasoning loop.

Letta is self-hosted (Python), MIT-licensed, and powerful but operationally complex. It is the right choice for agent-native memory where you want the agent itself to reason explicitly about what it knows and doesn't know.

Comparison table

Dimension	Feather DB	Mem0	Zep	Letta (ArchivalMemory)
Architecture	Embedded C++ library, in-process	Managed cloud API	Hosted or self-hosted service	Agent-native, MemGPT-inspired
Cost model	Free (MIT OSS)	~$20/month typical; per-call API	Self-host: free + ops; hosted: subscription	Free (MIT OSS); compute cost for self-host
Retrieval latency	0.19 ms p50	50–200 ms (API round trip)	Variable; 20–150 ms typical	Variable; depends on agent loop depth
Adaptive decay	Yes — stickiness + half-life + importance	No	No (recency ordering only)	No
Offline capable	Yes — fully in-process, no network required	No — cloud-only	Yes (self-hosted only)	Yes (self-hosted)
MCP support	Yes — native MCP server	No	No	No
Graph edges	Yes — typed weighted edges, n-hop BFS traversal	No	No (entity extraction, no edge graph)	No
Hybrid BM25 + dense	Yes — RRF fusion	No — dense only	Partial (newer versions)	No — dense only
Auto memory extraction	No (manual or bring-your-own extractor; v0.9.0 will add `feather_db.extractors`)	Yes — LLM-based extraction on every write	Yes — NLP extraction layer	Yes — agent reasons about what to store
Multi-tenant	Yes — namespace + entity system	Yes — user_id scoping	Yes — session scoping	Partial — per-agent isolation
License	MIT	Proprietary (managed)	Apache 2.0	MIT
Language	C++ core, Python + Rust bindings	REST API (language-agnostic)	Python, Go	Python

When to use each system

Use Feather DB when:

Latency is a product constraint. 0.19 ms retrieval means memory lookups are invisible in a user-facing agent. If you're building a real-time assistant, coding agent, or anything where perceived responsiveness matters, the difference between 0.19 ms and 100 ms is the difference between "seamless" and "noticeably paused."
Cost is a margin constraint. At $0/month for the database itself, Feather DB makes agent memory economically viable at high volume. The only cost is compute (negligible for in-process) and whatever embedding API you use. At 100,000 queries per month, the gap between free and $20/month compounds across your user base.
You need offline or edge deployment. Feather DB runs in-process with no network dependency. An agent running on a user's laptop, a mobile device, or an air-gapped server can have full memory capability without a cloud dependency.
You're integrating with Claude or any MCP-compatible tool host. Feather DB ships a native MCP server. Your memory layer is immediately available as a tool to any agent that speaks MCP without any adapter layer.
Memory relationships carry information. If facts in your system are related ("the bug in the payment handler" should surface "the fix in PR #88"), the typed graph edges and context_chain() traversal are not available in any other system on this list.
Recall quality matters and you want to control how it degrades. The adaptive stickiness model means a fact retrieved 50 times over three months will outrank a fact retrieved once yesterday — which is the right behavior for most agent memory scenarios. No other system on this list implements this.

Use Mem0 when:

Zero-config is the priority. If you're prototyping, or if your team does not have backend infrastructure bandwidth, Mem0 gets you working memory in an afternoon. No embedding model selection, no index tuning, no server to maintain.
Auto memory extraction is valuable. Mem0's LLM-based extraction automatically structures unstructured conversation into atomic facts. This is genuinely useful if your data is freeform and you don't want to write extraction logic. The trade-off is cost (an LLM call per write) and latency.
You will not run at high volume. $20/month is a fair price for a low-volume prototype or early-stage product. It becomes expensive at scale — and there is no self-hosted escape valve.

Use Zep when:

You want session-structured conversation history. Zep's data model is built around sessions and conversations. If your agent is session-aware and you want conversation-level memory retrieval, Zep's model fits well.
You want managed ops with a self-hosted option. Zep lets you start on their hosted tier and migrate to self-hosted later — a path that Mem0 does not offer. This is a useful hedge if you're uncertain about your volume.
Python or Go is your stack. Zep's client libraries are solid for both. If you're not embedding in a process, a sidecar Zep service is less overhead than running a full Letta agent framework.

Use Letta when:

Memory reasoning should be part of the agent's cognitive loop. Letta's ArchivalMemory design makes memory access explicit: the agent decides when to recall and what to store. This is the right model when memory management is a first-class capability you want to expose or audit — research agents, autonomous systems, agents that need to reason about the limits of their own knowledge.
You are building a MemGPT-style multi-persona or multi-agent system. Letta's framework is designed for this. The core memory / archival memory two-tier split maps cleanly onto agent context management for complex agentic systems.
Complexity is acceptable in exchange for power. Letta is more complex to operate than any other system on this list. If the architectural benefits are real for your use case, the complexity is worth it. If you're reaching for Letta because it sounds powerful, start with something simpler.

A note on auto memory extraction

Mem0 and Zep both offer automatic extraction — you feed them raw conversation and they pull out facts. Letta's agent reasons explicitly about what to memorize. Feather DB today requires you to provide the text to store (manual extraction, or use your own LLM to extract facts before calling db.add()).

This gap is closing: Feather DB v0.9.0 will ship feather_db.extractors — a pluggable LLM-based extraction layer that pulls atomic facts from conversation turns, resolves contradictions, and maintains ontology-aware edges automatically. The LongMemEval benchmark results show that the knowledge-update score (0.714) is unchanged across model classes, which is the fingerprint of a retrieval-side gap — and extraction quality is the lever that fixes it.

Until extractors ship, teams who need zero-config extraction and are not at volume might reasonably use Mem0 for the extraction layer while evaluating a migration to Feather DB at scale. The two are not architecturally incompatible if you treat Mem0 as an extraction pipeline and replicate the extracted facts into Feather DB.

The latency gap is not academic

50–200 ms of API latency for a memory retrieval does not sound catastrophic until you reason through the agent loop. A typical ReAct agent makes 3–8 tool calls per user turn. If each tool call includes a memory retrieval — to check what the user said before, to pull relevant context, to verify facts — the latency compounds:

3 memory calls × 100 ms = 300 ms added to every agent turn
8 memory calls × 100 ms = 800 ms added to every agent turn

At 0.19 ms per retrieval, those same 8 calls take 1.5 ms total. The difference is a user experience class boundary: one feels instant, the other feels like a loading state. For consumer-facing agents, this matters enormously. For batch pipelines that run overnight, it does not.

The cost gap compounds

The LongMemEval benchmark demonstrated that Feather DB + Gemini Flash delivers 0.657 accuracy at $2.40 per 500-question run. Full-context GPT-4o — the previous ceiling — delivers 0.640 at many times the cost. Feather DB + GPT-4o delivers 0.693 at roughly $8 per run.

That cost structure carries through to production. At 100,000 memory queries per month:

System	Retrieval cost/month (100K queries)	Accuracy ceiling (LongMemEval)
Feather DB + Gemini Flash	~$0 retrieval + embedding cost (~$0.50)	0.657
Feather DB + GPT-4o	~$0 retrieval + embedding cost (~$2)	0.693
Full-context GPT-4o	~$31,000 (115K tokens × 100K queries)	0.640
Mem0 (estimated)	$20–$200+ depending on tier and volume	Not benchmarked publicly

The better retrieval also beats the more expensive full-context approach on accuracy. This is not a cost-quality trade-off — it is a cost-quality improvement. Adaptive scoring, stickiness, and graph traversal produce better signal than dumping everything into a context window and hoping the model attends to the right parts.

Choosing the right layer for your stack

Most production agent stacks will end up with a layered memory architecture rather than a single system:

Short-term (conversation buffer): last N turns in the context window directly — no retrieval needed
Medium-term (session memory): facts from the current session, retrieved at turn boundaries
Long-term (persistent memory): facts accumulated across sessions, retrieved at session start and on demand

For long-term memory, the four systems on this list are the primary options. The decision criteria reduce to:

Do you need retrieval sub-millisecond? → Feather DB
Do you need offline or no cloud dependency? → Feather DB or Letta
Do you need zero-config extraction with no infra? → Mem0
Do you need the agent to reason about memory explicitly? → Letta
Do you need session-structured history with a self-host option? → Zep
Do you need MCP-native tool integration? → Feather DB

If none of the constraints push you toward a specific system, default to Feather DB for its latency, cost, and MIT license. The cost of being wrong is low — migration from Feather DB to any other system is straightforward because the database is a portable file and the API surface is small.

What to watch in the next two quarters

The AI memory space is moving fast in mid-2026. Three developments are worth watching:

Feather DB v0.9.0 extractors. Pluggable LLM-based fact extraction, contradiction resolution, and ontology-aware edges will close the main gap between Feather DB and systems with auto-extraction. Expected to lift LongMemEval knowledge-update scores specifically.

Feather DB Cloud (Q3 2026). The same engine, same file format, hosted API surface. For teams that want Feather DB's performance and accuracy without the embedded deployment model. The .feather file stays portable — the topology changes, the data doesn't.

Mem0's graph layer. Mem0 has announced graph relationship support. If it ships with edge traversal and not just entity co-occurrence, it will be a more competitive option for use cases that currently require Feather DB's context_chain().

The right memory system today may not be the right memory system in six months. The architectures that maximize optionality are embedded systems with portable file formats (Feather DB) and systems with self-hosted options (Zep, Letta) — they let you change the deployment topology without migrating your data.

Install Feather DB: pip install feather-db · GitHub: github.com/feather-store/feather · Cloud waitlist: getfeather.store/cloud