Persistent Memory for AI Agents: From Stateless to Stateful in 50 Lines
Most AI agents forget everything the moment the conversation ends. Here's the exact code pattern to add persistent, adaptive memory to any agent in under 50 lines — using Feather DB.
The stateless agent problem
An LLM API call is stateless by design. Every time you call chat.completions.create(), the model starts fresh. That's efficient, but it means any agent you build on top of it is also stateless unless you explicitly manage memory.
The naive fix is to keep the full conversation history in the context window. This works for short sessions but breaks for long-running agents: context windows fill up, token costs multiply, and you're paying to re-read months of history on every query. The better fix is a persistent memory layer that retrieves only what's relevant, right now.
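The cost argument is easy to check with rough arithmetic. The sketch below assumes, purely for illustration, that every turn is the same size and that retrieval injects a fixed top-k of memories. Real token counts vary, so only the growth pattern matters here.

```python
# Rough token arithmetic: full-history replay vs. top-k retrieval.
# Assumption (illustrative only): every turn is exactly TOKENS_PER_TURN tokens,
# and retrieval injects at most K memories of the same size into each prompt.
TOKENS_PER_TURN = 200
K = 5

def full_history_tokens(turns: int) -> int:
    # Turn i re-sends all i turns so far: 1 + 2 + ... + n = n(n+1)/2 turns in total
    return TOKENS_PER_TURN * turns * (turns + 1) // 2

def retrieval_tokens(turns: int) -> int:
    # Turn i sends the new message plus at most K of the i - 1 earlier turns
    return sum(TOKENS_PER_TURN * (1 + min(K, i - 1)) for i in range(1, turns + 1))

for n in (10, 100, 1000):
    print(n, full_history_tokens(n), retrieval_tokens(n))
```

Under these assumptions the full-history total grows quadratically with the number of turns while the retrieval total grows linearly: at 1,000 turns it is about 100 million tokens versus about 1.2 million.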
This tutorial builds exactly that: a simple chat agent, then the same agent with Feather DB persistent memory. Each code block builds on the ones before it and runs after pip install feather-db openai, with OPENAI_API_KEY set in your environment.
Baseline: the stateless agent
Here's the simplest possible agent loop. It keeps a rolling window of the last 10 messages, but forgets everything when the process exits.
import os
from openai import OpenAI
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
history = [
    {"role": "system", "content": "You are a helpful assistant."}
]
def chat(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        # Rolling window: pin the system prompt, then send the last 10 messages
        messages=[history[0]] + history[1:][-10:],
    )
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply
# Works fine within a session. Forgets everything on restart.
print(chat("My name is Ashwath and I prefer concise answers."))
print(chat("What's my name?")) # Works
# Restart the process...
print(chat("What's my name?")) # Fails — no memory
This agent is fine for one-shot interactions. For anything that runs across sessions — a personal assistant, a support agent, a coding assistant — it's broken by design.
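One subtlety with any rolling window: a plain slice such as history[-10:] keeps the last ten entries of the list, so once the conversation passes ten messages the system prompt at index 0 silently drops out of every request. A minimal check with placeholder messages:

```python
# Simulate a conversation: one system prompt followed by 12 user/assistant messages.
history = [{"role": "system", "content": "You are a helpful assistant."}]
for i in range(12):
    role = "user" if i % 2 == 0 else "assistant"
    history.append({"role": role, "content": f"message {i}"})

naive = history[-10:]                      # last 10 entries only
pinned = [history[0]] + history[1:][-10:]  # system prompt + last 10 conversation messages

print(naive[0]["role"])       # → user (the system prompt was sliced away)
print(pinned[0]["role"])      # → system
print(len(naive), len(pinned))  # → 10 11
```

Pinning history[0] keeps the instructions in every request while still bounding prompt size.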
Step 1: initialize the memory store
Add Feather DB as the persistent memory layer. One file on disk, opened on startup, persists across restarts.
import feather_db as fdb
import numpy as np
# Open (or create) the persistent store
db = fdb.DB.open("agent_memory.feather", dim=1536) # 1536 = OpenAI text-embedding-3-small
def embed(text: str) -> np.ndarray:
    """Get an embedding vector for a piece of text."""
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return np.array(response.data[0].embedding, dtype=np.float32)
The .feather file persists the HNSW index, all metadata, and the context graph across restarts. DB.open() reconstructs the in-memory index from the file in milliseconds.
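The .feather format itself isn't described here, so as a hypothetical stand-in, here is the persistence contract sketched with only the standard library: records are appended to a JSON Lines file on every write and re-read when the store is opened. JsonlMemoryStore and its file layout are illustrative assumptions, not Feather DB's API, and there is no index or search in this sketch.

```python
import json
import os
import tempfile

class JsonlMemoryStore:
    """Hypothetical file-backed memory store: one JSON object per line."""

    def __init__(self, path: str):
        self.path = path
        self.records: dict[int, dict] = {}
        if os.path.exists(path):
            # Rebuild the in-memory view from the file, as a reopen after a restart would
            with open(path, encoding="utf-8") as f:
                for line in f:
                    rec = json.loads(line)
                    self.records[rec["id"]] = rec

    def add(self, node_id: int, vec: list[float], meta: dict) -> None:
        rec = {"id": node_id, "vec": vec, "meta": meta}
        self.records[node_id] = rec
        # Append the record to the file so it outlives the current process
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(rec) + "\n")

path = os.path.join(tempfile.mkdtemp(), "agent_memory.jsonl")
store = JsonlMemoryStore(path)
store.add(1, [0.1, 0.2], {"text": "User's name is Ashwath", "importance": 0.9})
del store  # simulate the process exiting

reopened = JsonlMemoryStore(path)  # simulate a fresh process opening the same file
print(reopened.records[1]["meta"]["text"])  # → User's name is Ashwath
```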
Step 2: write memories with add()
Every meaningful exchange gets stored as a memory node. Use importance to weight high-signal facts.
import time
def store_memory(text: str, importance: float = 0.5, source: str = "chat") -> int:
    """Store a piece of information in persistent memory."""
    vec = embed(text)
    meta = fdb.Metadata(importance=importance)
    meta.set_attribute("text", text)
    meta.set_attribute("source", source)
    meta.set_attribute("timestamp", str(time.time()))
    # Millisecond-timestamp ID in a 31-bit range; not guaranteed unique
    # (same-millisecond writes collide, and the value wraps every ~24.9 days)
    node_id = int(time.time() * 1000) % (2**31)
    db.add(id=node_id, vec=vec, meta=meta)
    return node_id
# Store a high-importance preference
store_memory("User's name is Ashwath", importance=0.9, source="identity")
store_memory("User prefers concise answers, no filler phrases", importance=0.8, source="preference")
store_memory("User is building an AI agent for customer support", importance=0.7, source="context")
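The store_memory() helper derives its ID from the current millisecond modulo 2**31. That value wraps about every 24.9 days, and two writes in the same millisecond would share an ID. One alternative, sketched here under the assumption that the store accepts any non-negative int below 2**31, is a counter that continues from the largest ID already in use. The id_allocator() helper and its existing_ids argument are hypothetical; reading existing IDs back out of Feather DB is not covered here.

```python
import itertools
from typing import Iterable, Iterator

ID_SPACE = 2**31  # keep IDs in the same 31-bit range the tutorial uses

def id_allocator(existing_ids: Iterable[int]) -> Iterator[int]:
    """Yield unique IDs, continuing after the largest ID already stored."""
    start = max(existing_ids, default=-1) + 1
    for node_id in itertools.count(start):
        if node_id >= ID_SPACE:
            raise OverflowError("31-bit ID space exhausted")
        yield node_id

next_id = id_allocator([7, 3, 12])
print(next(next_id), next(next_id))  # → 13 14
```

Unlike the timestamp, this never repeats within one process, though two processes writing to the same store at once would still need coordination.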
Step 3: query with context_chain()
context_chain() is the key API: it combines HNSW nearest-neighbor search with n-hop BFS traversal of the context graph. You get not just the closest memory, but its connected neighbors — the full context cluster around a retrieval hit.
def recall_memory(query: str, k: int = 5, hops: int = 2) -> list[dict]:
    """Retrieve relevant memories using adaptive scoring + graph traversal."""
    query_vec = embed(query)
    chain = db.context_chain(
        query_vec,
        k=k,
        hops=hops,
        half_life=30,     # memories decay over 30 days if not recalled
        time_weight=0.3   # 30% recency, 70% semantic similarity
    )
    memories = []
    for node in chain:
        text = node.meta.get_attribute("text") if node.meta else ""
        if text:
            memories.append({
                # Keep the node ID so callers can link new memories to this one
                # (assumes result nodes expose an .id attribute)
                "id": getattr(node, "id", None),
                "text": text,
                "score": node.score,
                "importance": node.meta.importance if node.meta else 0.5
            })
    return memories
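Feather DB's implementation of context_chain() isn't shown here, but the idea described above (nearest-neighbor seeds, then an n-hop breadth-first expansion over the link graph) can be sketched in plain Python. In this hypothetical version a brute-force cosine scan stands in for HNSW, the graph is an adjacency dict, and expanded nodes inherit a discounted copy of their parent's score; the 0.5-per-hop discount is an illustrative assumption.

```python
import math
from collections import deque

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def context_chain_sketch(query: list[float], vectors: dict[int, list[float]],
                         edges: dict[int, list[int]], k: int = 2, hops: int = 1,
                         hop_discount: float = 0.5) -> dict[int, float]:
    """Return {node_id: score}: top-k seeds by cosine, then BFS up to `hops` links away."""
    # 1. Seed step: exact nearest neighbors (HNSW would approximate this ranking)
    ranked = sorted(vectors, key=lambda nid: cosine(query, vectors[nid]), reverse=True)
    scores = {nid: cosine(query, vectors[nid]) for nid in ranked[:k]}
    # 2. Expansion step: breadth-first search outward from every seed
    queue = deque((nid, 0) for nid in scores)
    while queue:
        nid, depth = queue.popleft()
        if depth == hops:
            continue
        for neighbor in edges.get(nid, ()):
            if neighbor not in scores:
                # A linked neighbor inherits a discounted copy of its parent's score
                scores[neighbor] = scores[nid] * hop_discount
                queue.append((neighbor, depth + 1))
    return scores

vectors = {1: [1.0, 0.0], 2: [0.9, 0.1], 3: [0.0, 1.0], 4: [-1.0, 0.0]}
edges = {1: [3], 3: [1, 4], 4: [3]}  # undirected links, listed in both directions
print(context_chain_sketch([1.0, 0.0], vectors, edges, k=1, hops=1))  # → {1: 1.0, 3: 0.5}
print(context_chain_sketch([1.0, 0.0], vectors, edges, k=1, hops=2))  # → {1: 1.0, 3: 0.5, 4: 0.25}
```

Node 3 is orthogonal to the query and would never win on similarity alone; it surfaces only because it is linked to the top hit, which is the point of the graph hop.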
Each call to context_chain() also increments the recall count for retrieved nodes, making frequently accessed memories stickier over time: they resist decay and keep surfacing.
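The exact formula that combines half_life, time_weight and recall counts isn't given here. One plausible shape, offered purely as an assumption, blends similarity with an exponential recency term whose half-life stretches with each recall, so often-recalled memories fade more slowly. The blended_score() helper and its (1 + recall_count) stretch factor are hypothetical.

```python
def blended_score(similarity: float, age_days: float, recall_count: int,
                  half_life: float = 30.0, time_weight: float = 0.3) -> float:
    """Hypothetical blend: (1 - time_weight) * similarity + time_weight * recency."""
    # Each recall stretches the effective half-life, so popular memories decay slower
    effective_half_life = half_life * (1 + recall_count)
    # Recency halves every effective_half_life days: 1.0 when new, 0.5 after one half-life
    recency = 0.5 ** (age_days / effective_half_life)
    return (1 - time_weight) * similarity + time_weight * recency

print(round(blended_score(0.8, age_days=30, recall_count=0), 3))  # → 0.71
print(round(blended_score(0.8, age_days=30, recall_count=1), 3))  # → 0.772
```

With the same similarity and age, the memory that has been recalled once outscores the one that never was, which is the "stickiness" described above.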
Step 4: the write-back pattern
A stateful agent does two things on every turn: it reads relevant memory before generating a response, and it writes a summary of new information after. This write-back pattern is what makes the agent accumulate knowledge.
def stateful_chat(user_message: str, session_id: str = "default") -> str:
    # 1. Retrieve relevant memories
    memories = recall_memory(user_message, k=5)
    memory_context = "\n".join(
        f"- {m['text']} (relevance: {m['score']:.2f})" for m in memories
    )
    # 2. Build prompt with memory context injected
    messages = [
        {"role": "system", "content": (
            "You are a helpful assistant with persistent memory. "
            "Relevant memories from past sessions:\n" + memory_context
        )},
        {"role": "user", "content": user_message}
    ]
    # 3. Generate response
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages
    )
    reply = response.choices[0].message.content
    # 4. Write back: store the new exchange as a memory
    exchange_text = f"User asked: {user_message[:100]}. Agent answered: {reply[:200]}"
    node_id = store_memory(exchange_text, importance=0.4, source=f"session:{session_id}")
    # 5. Link the new memory to the most relevant retrieved one
    #    (only happens if recall_memory() returns each node's "id")
    if memories:
        most_relevant_id = memories[0].get("id")
        if most_relevant_id is not None:  # an ID of 0 is valid, so don't test truthiness
            db.link(node_id, most_relevant_id, edge_type="same_session")
    return reply
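To see the read-then-write cycle without any API calls, here is a self-contained sketch. A keyword-overlap retriever stands in for embeddings plus context_chain(), and fake_llm() is a hypothetical stub that only reports how many memories it received; just the control flow mirrors stateful_chat().

```python
import re

memory: list[str] = []  # stand-in for the persistent store

def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z']+", text.lower()))

def recall(query: str, k: int = 3) -> list[str]:
    # Stand-in retriever: rank stored texts by word overlap with the query
    q = tokens(query)
    ranked = sorted(memory, key=lambda m: len(q & tokens(m)), reverse=True)
    return [m for m in ranked[:k] if q & tokens(m)]

def fake_llm(system: str, user: str) -> str:
    # Hypothetical stub in place of a chat-completions call
    n = system.count("\n- ")
    return f"[stub reply; memories in context: {n}]"

def stateful_turn(user_message: str) -> str:
    memories = recall(user_message)                                  # 1. read
    system = "Memories:" + "".join(f"\n- {m}" for m in memories)
    reply = fake_llm(system, user_message)                           # 2. generate
    memory.append(f"User asked: {user_message}. Agent answered: {reply}")  # 3. write back
    return reply

print(stateful_turn("My stack is FastAPI and Redis"))  # → [stub reply; memories in context: 0]
print(stateful_turn("Which cache fits my stack?"))     # → [stub reply; memories in context: 1]
```

The second turn retrieves the first turn's write-back, which is the accumulation the pattern depends on.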
Step 5: multi-session continuity
The power of this pattern is that the agent remembers across completely separate processes. Here's a two-script test that demonstrates it.
# session_a.py: run this first, then exit
# (assumes client, embed and store_memory from the steps above are defined here too)
db = fdb.DB.open("agent_memory.feather", dim=1536)
store_memory("User is building a support agent for an e-commerce startup", importance=0.85)
store_memory("User's tech stack: FastAPI, PostgreSQL, Redis", importance=0.75)
print("Session A complete. Memories written to disk.")
# session_b.py: run this separately, in a new process
# (assumes the same helpers, plus recall_memory and stateful_chat)
db = fdb.DB.open("agent_memory.feather", dim=1536)  # reconstructs from file
reply = stateful_chat("What database should I use for the support agent's memory?")
print(reply)
# The agent knows the tech stack from session A and answers accordingly.
The .feather file is the agent's long-term memory. It persists to disk atomically on every write, survives process restarts, and can be backed up like any file.
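How Feather DB makes its writes atomic isn't documented here. For anyone keeping their own sidecar files or backups next to the store, the usual general-purpose technique is write-to-temp-then-rename with os.replace(), which is atomic on POSIX filesystems when both paths are on the same filesystem. The sketch below shows that generic pattern; atomic_write_json() is a hypothetical helper, not a description of Feather DB's internals.

```python
import json
import os
import tempfile

def atomic_write_json(path: str, data) -> None:
    """Write JSON so readers see either the old file or the new one, never a partial write."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file in the same directory, so the rename stays on one filesystem
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_path, path)  # atomic swap on POSIX filesystems
    except BaseException:
        os.unlink(tmp_path)  # don't leave a half-written temp file behind
        raise

target = os.path.join(tempfile.mkdtemp(), "memory_backup.json")
atomic_write_json(target, {"memories": ["User's name is Ashwath"]})
with open(target, encoding="utf-8") as f:
    print(json.load(f)["memories"][0])  # → User's name is Ashwath
```

For full durability of the rename itself, POSIX systems also need an fsync on the containing directory afterwards.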
The full 50-line version
import os, time
import numpy as np
import feather_db as fdb
from openai import OpenAI
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
db = fdb.DB.open("memory.feather", dim=1536)
def embed(text):
    r = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(r.data[0].embedding, dtype=np.float32)
def remember(text, importance=0.5):
    nid = int(time.time() * 1000) % (2**31)
    meta = fdb.Metadata(importance=importance)
    meta.set_attribute("text", text)
    db.add(id=nid, vec=embed(text), meta=meta)
    return nid
def recall(query, k=5):
    chain = db.context_chain(embed(query), k=k, hops=2, half_life=30, time_weight=0.3)
    return [n.meta.get_attribute("text") for n in chain if n.meta]
def chat(msg):
    mems = recall(msg)
    ctx = "\n".join(f"- {m}" for m in mems)
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"Memory:\n{ctx}"},
            {"role": "user", "content": msg}
        ]
    ).choices[0].message.content
    remember(f"Q: {msg[:80]} A: {resp[:150]}", importance=0.4)
    return resp
while True:
    print(chat(input("You: ")))
That's it: a fully persistent, adaptive-memory agent in under 50 lines. Memories accumulate across restarts, frequently recalled facts grow stickier, and the agent's effective knowledge compounds over time.
Install: pip install feather-db openai · GitHub: github.com/feather-store/feather