Persistent Memory for AI Agents: From Stateless to Stateful in 50 Lines

The stateless agent problem

An LLM API call is stateless by design. Every time you call chat.completions.create(), the model starts fresh. That's efficient, but it means any agent you build on top of it is also stateless unless you explicitly manage memory.

The naive fix is to keep the full conversation history in the context window. This works for short sessions but breaks for long-running agents: context windows fill up, token costs multiply, and you're paying to re-read months of history on every query. The better fix is a persistent memory layer that retrieves only what's relevant, right now.
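
To put rough numbers on that cost, here is a back-of-the-envelope sketch. The figures (100 tokens per turn, 5 retrieved memories of about 50 tokens each) are illustrative assumptions, not measurements, and the function names are made up for this example.

```python
def full_history_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total prompt tokens when every call re-sends all earlier turns."""
    return sum(n * tokens_per_turn for n in range(1, turns + 1))


def retrieval_tokens(turns: int, tokens_per_turn: int, k: int, tokens_per_memory: int) -> int:
    """Total prompt tokens when each call sends the new turn plus k retrieved memories."""
    return turns * (tokens_per_turn + k * tokens_per_memory)


# Assumed workload: 1,000 turns of roughly 100 tokens each.
print(full_history_tokens(1000, 100))       # 50050000
print(retrieval_tokens(1000, 100, 5, 50))   # 350000
```

The full-history total grows quadratically with the number of turns, while the retrieval total grows linearly, which is why the gap widens the longer the agent runs.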

This tutorial builds exactly that: first a simple chat agent, then the same agent with Feather DB persistent memory. The code blocks build on one another and run after pip install feather-db openai, with OPENAI_API_KEY set in the environment.

Baseline: the stateless agent

Here's the simplest possible agent loop. It keeps a rolling window of the last 10 messages, but forgets everything when the process exits.

import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

history = [
    {"role": "system", "content": "You are a helpful assistant."}
]

def chat(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        # Pin the system prompt, then send only the last 10 other messages
        messages=history[:1] + history[1:][-10:]
    )
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

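# Works fine within a session. Forgets everything on restart.
print(chat("My name is Ashwath and I prefer concise answers."))
print(chat("What's my name?"))  # Works: the name is still in history
# After a restart, history is rebuilt from scratch, so the same call
# made from a fresh process cannot answer:
print(chat("What's my name?"))  # Fails in a new process: no memory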

This agent is fine for one-shot interactions. For anything that runs across sessions — a personal assistant, a support agent, a coding assistant — it's broken by design.
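
One detail of a rolling window is worth checking: a plain slice of the last N messages eventually drops the system prompt as well. The sketch below shows the effect and one way to pin the prompt. The helper name windowed is hypothetical and not part of any library.

```python
def windowed(history: list[dict], max_messages: int = 10) -> list[dict]:
    """Return all system messages plus the last `max_messages` other messages."""
    system = [m for m in history if m["role"] == "system"]
    rest = [m for m in history if m["role"] != "system"]
    return system + rest[-max_messages:]


history = [{"role": "system", "content": "You are a helpful assistant."}]
for i in range(12):
    history.append({"role": "user", "content": f"message {i}"})

print(history[-10:][0]["role"])      # user: the system prompt has been sliced off
print(windowed(history)[0]["role"])  # system: the prompt is still first
print(len(windowed(history)))        # 11: one system message plus ten recent ones
```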

Step 1: initialize the memory store

Add Feather DB as the persistent memory layer: a single file on disk that is opened on startup and persists across restarts.

import feather_db as fdb
import numpy as np

# Open (or create) the persistent store
db = fdb.DB.open("agent_memory.feather", dim=1536)  # 1536 = OpenAI text-embedding-3-small

def embed(text: str) -> np.ndarray:
    """Get an embedding vector for a piece of text."""
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return np.array(response.data[0].embedding, dtype=np.float32)

The .feather file persists the HNSW index, all metadata, and the context graph across restarts. DB.open() reconstructs the in-memory index from the file in milliseconds.
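
For intuition about what the index is doing, here is a brute-force nearest-neighbor search in plain Python. HNSW is an approximate index that avoids scanning every vector, so this exact scan is only a conceptual stand-in. The toy 3-dimensional vectors replace real 1536-dimensional embeddings, and cosine similarity is an assumption: the metric Feather DB uses is not stated above.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest(query: list[float], vectors: dict[int, list[float]], k: int = 2) -> list[tuple[int, float]]:
    """Score every stored vector against the query and return the top k, best first."""
    scored = [(vec_id, cosine(query, vec)) for vec_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


store = {1: [1.0, 0.0, 0.0], 2: [0.9, 0.1, 0.0], 3: [0.0, 1.0, 0.0]}
print([vec_id for vec_id, _ in nearest([1.0, 0.0, 0.0], store)])  # [1, 2]
```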

Step 2: write memories with add()

Every meaningful exchange gets stored as a memory node. Use importance to weight high-signal facts.

import time

def store_memory(text: str, importance: float = 0.5, source: str = "chat") -> int:
    """Store a piece of information in persistent memory."""
    vec = embed(text)
    meta = fdb.Metadata(importance=importance)
    meta.set_attribute("text", text)
    meta.set_attribute("source", source)
    meta.set_attribute("timestamp", str(time.time()))

    node_id = int(time.time() * 1000) % (2**31)  # millisecond-timestamp ID; two writes in the same ms would collide
    db.add(id=node_id, vec=vec, meta=meta)
    return node_id

# Store a few facts, weighting the most important ones highest
store_memory("User's name is Ashwath", importance=0.9, source="identity")
store_memory("User prefers concise answers, no filler phrases", importance=0.8, source="preference")
store_memory("User is building an AI agent for customer support", importance=0.7, source="context")
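
A millisecond timestamp is a convenient ID, but two writes that land in the same millisecond would produce the same value. One possible workaround, sketched here under the assumption that IDs only need to be unique within a single process, is a small generator that bumps past the last ID it handed out. IdGenerator is a hypothetical helper, not part of Feather DB, and it does not protect against IDs written by earlier runs.

```python
import threading
import time


class IdGenerator:
    """Hypothetical helper: hands out strictly increasing int IDs within one process."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._last = 0

    def next_id(self) -> int:
        with self._lock:
            candidate = int(time.time() * 1000) % (2**31)
            # If the clock has not advanced past the last ID, step one beyond it.
            if candidate <= self._last:
                candidate = self._last + 1
            self._last = candidate
            return candidate


ids = IdGenerator()
batch = [ids.next_id() for _ in range(10_000)]
print(len(set(batch)) == len(batch))  # True: no duplicates within this process
```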

Step 3: query with context_chain()

context_chain() is the key API: it combines HNSW nearest-neighbor search with n-hop BFS traversal of the context graph. You get not just the closest memory, but its connected neighbors — the full context cluster around a retrieval hit.

def recall_memory(query: str, k: int = 5, hops: int = 2) -> list[dict]:
    """Retrieve relevant memories using adaptive scoring + graph traversal."""
    query_vec = embed(query)
    chain = db.context_chain(
        query_vec,
        k=k,
        hops=hops,
        half_life=30,    # recency weight halves every 30 days without a recall
        time_weight=0.3  # 30% recency, 70% semantic similarity
    )
    memories = []
    for node in chain:
        text = node.meta.get_attribute("text") if node.meta else ""
        if text:
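            memories.append({
                # Assumption: chain nodes expose an `id` attribute; falls back to None if they do not
                "id": getattr(node, "id", None),
                "text": text,
                "score": node.score,
                "importance": node.meta.importance if node.meta else 0.5
            })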
    return memories
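
The graph-traversal half of that call can be pictured as a breadth-first search that starts from the nearest-neighbor hits and stops after a fixed number of hops. The sketch below runs on a plain adjacency dict. It illustrates n-hop BFS in general and is not Feather DB's implementation; the function name expand is hypothetical.

```python
from collections import deque


def expand(seeds: list[int], edges: dict[int, list[int]], hops: int = 2) -> dict[int, int]:
    """Return {node_id: hop_distance} for every node within `hops` edges of a seed."""
    distance = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if distance[node] == hops:
            continue  # hop budget reached; do not expand further from here
        for neighbor in edges.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


# A chain 1 - 2 - 3 - 4: with two hops from node 1, node 4 stays out of reach.
edges = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(expand([1], edges, hops=2))  # {1: 0, 2: 1, 3: 2}
```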

Each call to context_chain() also increments the recall count for retrieved nodes, so frequently accessed memories get stickier over time: they resist decay and keep surfacing.
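
To make half_life, time_weight, and stickiness concrete, here is one way such a score could be blended. This is a sketch under stated assumptions, not Feather DB's actual formula: the exponential decay, the linear blend, and especially the way recall_count stretches the half-life are all guesses made for illustration, and importance is left out entirely.

```python
import math


def recency(age_days: float, half_life: float) -> float:
    """Exponential decay: 1.0 for a brand-new memory, 0.5 after one half-life."""
    return 0.5 ** (age_days / half_life)


def blended_score(
    similarity: float,
    age_days: float,
    half_life: float = 30.0,
    time_weight: float = 0.3,
    recall_count: int = 0,
) -> float:
    # Assumption: each recall stretches the effective half-life, so recalled memories decay more slowly.
    effective_half_life = half_life * (1 + math.log1p(recall_count))
    return (1 - time_weight) * similarity + time_weight * recency(age_days, effective_half_life)


print(round(blended_score(0.8, age_days=0), 3))   # 0.86
print(round(blended_score(0.8, age_days=30), 3))  # 0.71
print(blended_score(0.8, 30, recall_count=5) > blended_score(0.8, 30))  # True
```

With time_weight=0.3, a memory loses at most 0.3 of its score to age, so a strong semantic match still surfaces even when it is old.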

Step 4: the write-back pattern

A stateful agent does two things on every turn: it reads relevant memory before generating a response, and it writes a summary of new information after. This write-back pattern is what makes the agent accumulate knowledge.

def stateful_chat(user_message: str, session_id: str = "default") -> str:
    # 1. Retrieve relevant memories
    memories = recall_memory(user_message, k=5)
    memory_context = "\n".join(
        f"- {m['text']} (relevance: {m['score']:.2f})" for m in memories
    )

    # 2. Build prompt with memory context injected
    messages = [
        {"role": "system", "content": (
            "You are a helpful assistant with persistent memory. "
            "Relevant memories from past sessions:\n" + memory_context
        )},
        {"role": "user", "content": user_message}
    ]

    # 3. Generate response
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages
    )
    reply = response.choices[0].message.content

    # 4. Write back: store the new exchange as a memory
    exchange_text = f"User asked: {user_message[:100]}. Agent answered: {reply[:200]}"
    node_id = store_memory(exchange_text, importance=0.4, source=f"session:{session_id}")

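    # 5. Link the new memory to the most relevant recalled one.
    #    Requires each dict from recall_memory() to carry an "id" key; without it this step is skipped.
    if memories:
        most_relevant_id = memories[0].get("id")
        if most_relevant_id is not None:
            db.link(node_id, most_relevant_id, edge_type="same_session")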

    return reply
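
The read-then-write loop can also be exercised without any network calls, which is handy for testing. The sketch below swaps in an in-memory store and a fake generator. ToyMemory, turn, and fake_generate are hypothetical names, and word overlap is only a crude stand-in for embedding similarity.

```python
from typing import Callable


class ToyMemory:
    """In-memory stand-in for the persistent store (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.items: list[str] = []

    def remember(self, text: str) -> None:
        self.items.append(text)

    def recall(self, query: str, k: int = 3) -> list[str]:
        # Rank stored texts by how many lowercase words they share with the query.
        words = set(query.lower().split())
        scored = [(len(words & set(text.lower().split())), text) for text in self.items]
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]


def turn(memory: ToyMemory, generate: Callable[[str, list[str]], str], user_message: str) -> str:
    recalled = memory.recall(user_message)    # 1. read relevant memories
    reply = generate(user_message, recalled)  # 2. generate with them in context
    memory.remember(f"User asked: {user_message} Agent answered: {reply}")  # 3. write back
    return reply


def fake_generate(message: str, recalled: list[str]) -> str:
    """Stands in for the LLM call; reports how many memories it was given."""
    return f"(saw {len(recalled)} memories)"


memory = ToyMemory()
memory.remember("User's tech stack: FastAPI, PostgreSQL, Redis")
print(turn(memory, fake_generate, "Which tech stack am I using?"))  # (saw 1 memories)
print(len(memory.items))  # 2: the seeded fact plus the written-back exchange
```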

Step 5: multi-session continuity

The power of this pattern is that the agent remembers across completely separate processes. Here's a test that proves it.

# session_a.py: run this first, then exit (assumes the helpers from Steps 1-4 are defined or imported in this file)
db = fdb.DB.open("agent_memory.feather", dim=1536)
store_memory("User is building a support agent for an e-commerce startup", importance=0.85)
store_memory("User's tech stack: FastAPI, PostgreSQL, Redis", importance=0.75)
print("Session A complete. Memories written to disk.")

# session_b.py: run this separately, in a new process (same helpers required)
db = fdb.DB.open("agent_memory.feather", dim=1536)  # reconstructs from file
reply = stateful_chat("What database should I use for the support agent's memory?")
print(reply)
# The agent knows the tech stack from session A and answers accordingly.

The .feather file is the agent's long-term memory. It persists to disk atomically on every write, survives process restarts, and can be backed up like any file.
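
For readers unfamiliar with what an atomic write involves, a common general-purpose technique is to write to a temporary file in the same directory and then rename it over the target. The sketch below shows that pattern with the standard library. It is not a description of Feather DB's internals, which are not covered here, and atomic_write is a hypothetical helper.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Write `data` so readers see either the old file or the new one, never a partial write."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_path, path)  # swap the finished file into place in one step
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "demo.bin")
    atomic_write(target, b"v1")
    atomic_write(target, b"v2")
    with open(target, "rb") as handle:
        print(handle.read())  # b'v2'
```

The same property is what makes a single-file store straightforward to back up: a copy taken at any moment sees a complete file.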

The full 50-line version

import os, time
import numpy as np
import feather_db as fdb
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
db = fdb.DB.open("memory.feather", dim=1536)

def embed(text):
    r = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(r.data[0].embedding, dtype=np.float32)

def remember(text, importance=0.5):
    nid = int(time.time() * 1000) % (2**31)  # ms-timestamp ID; can collide within the same ms
    meta = fdb.Metadata(importance=importance)
    meta.set_attribute("text", text)
    db.add(id=nid, vec=embed(text), meta=meta)
    return nid

def recall(query, k=5):
    chain = db.context_chain(embed(query), k=k, hops=2, half_life=30, time_weight=0.3)
    return [n.meta.get_attribute("text") for n in chain if n.meta]

def chat(msg):
    mems = recall(msg)
    ctx = "\n".join(f"- {m}" for m in mems)
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"Memory:\n{ctx}"},
            {"role": "user", "content": msg}
        ]
    ).choices[0].message.content
    remember(f"Q: {msg[:80]} A: {resp[:150]}", importance=0.4)
    return resp

while True:
    print(chat(input("You: ")))

That's it: a persistent, adaptive-memory agent in under 50 lines. Memories accumulate across restarts, frequently recalled facts grow stickier, and the agent's effective knowledge compounds over time.

Install: pip install feather-db openai · GitHub: github.com/feather-store/feather