Feather DB + Gemini: Give Google's AI Agents Persistent Memory
Gemini's API is stateless — every call starts cold. Feather DB fixes that. Here's how to build a Gemini chatbot that actually remembers across sessions, with typed memory graphs, adaptive decay, and fast cold load on Cloud Run.
Deploy · Feather DB v0.16.0 · June 2026
The problem in one sentence
Every Gemini API call starts with a blank slate. You send a prompt, you get a response, the context is gone. There is no memory between sessions unless you build it yourself.
This tutorial shows you exactly how to build it — a Gemini chatbot backed by Feather DB that remembers user preferences, past answers, and session history across every conversation, forever.
The full working example is at the bottom. Read the explanation first — the design decisions matter.
Why stateless is the default
Gemini's generateContent endpoint is HTTP: you send a payload, you get a payload back. Google doesn't store your conversation. The contents array you pass in is the only context the model sees.
import google.generativeai as genai
model = genai.GenerativeModel("gemini-2.0-flash")
# Every call is independent. The model has no idea what happened before.
response = model.generate_content("What was my question last week?")
# → "I don't have access to previous conversations."
The standard fix is to pass the full conversation history in each request. That works for a single session. It breaks when the session ends, when the context window fills up (1M tokens sounds big until you have a real user), or when you need to surface a preference the user mentioned three weeks ago.
Feather DB replaces full-context stuffing with semantic retrieval. Instead of sending everything, you retrieve the five most relevant past exchanges and send those. 40× cheaper per query. Actually scales.
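To make the retrieval step concrete, here is a minimal sketch of top-k selection by cosine similarity. It is a brute-force illustration over toy vectors, not Feather DB's search path (which this article describes as an HNSW index). The names `cosine` and `top_k` and the three-dimensional vectors are assumptions made for the example.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query: list[float], memories: dict[str, list[float]], k: int = 5) -> list[str]:
    """Return the keys of the k stored vectors most similar to the query."""
    ranked = sorted(memories, key=lambda key: cosine(query, memories[key]), reverse=True)
    return ranked[:k]

# Toy 3-dimensional "embeddings" (hypothetical values, for illustration only)
memories = {
    "prefers short answers": [0.9, 0.1, 0.0],
    "asked about pricing": [0.1, 0.9, 0.1],
    "asked about auth": [0.0, 0.2, 0.9],
}
closest = top_k([0.8, 0.2, 0.0], memories, k=2)
print(closest)
```

Only the selected memories are added to the prompt, so prompt size depends on k rather than on how long the history has become.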
The embedding model: gemini-embedding-exp-03-07
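For this to work, the embedding model's output dimension and Feather DB's index dimension must match. Feather DB defaults to 768 dimensions, and gemini-embedding-exp-03-07 (Gemini Embedding 2) can produce 768-dimensional vectors. The model's default output size may be larger (the experimental release is documented as 3072 by default, with 1536 and 768 as supported reductions), so request 768 explicitly with output_dimensionality instead of relying on the default. Nothing needs to change on the Feather DB side.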
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
result = genai.embed_content(
    model="models/gemini-embedding-exp-03-07",
    content="The user prefers concise answers and hates bullet lists.",
    task_type="RETRIEVAL_DOCUMENT",
    output_dimensionality=768,  # request 768 explicitly so vectors match the index
)
# result["embedding"] → list of 768 floats. Matches Feather DB dim=768 exactly.
One practical note: use task_type="RETRIEVAL_DOCUMENT" when storing, and task_type="RETRIEVAL_QUERY" when searching. Gemini Embedding 2 is asymmetric — it optimizes each direction separately, which improves recall.
And since Gemini Embedding 2 is multimodal, the same 768-dim space works for images. If you later want to store screenshots, UI flows, or images the user shares, you embed those into the same index with no dimension change. Everything is comparable.
Two half-life values, two memory types
Not all memories age at the same rate. A user's preference for short answers should persist for months. The exact wording of a question they asked yesterday probably doesn't matter next week.
Feather DB's adaptive decay formula handles this:
stickiness = 1 + log(1 + recall_count)
effective_age = age_in_days / stickiness
recency = 0.5 ^ (effective_age / half_life_days)
final_score = ((1 - time_weight) × similarity + time_weight × recency) × importance
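As a worked example, the four lines above can be written as one small Python function. This is a sketch of the formula as stated here, not Feather DB's internal code. It assumes `log` is the natural logarithm, and the function name and argument values are chosen for illustration.

```python
import math

def final_score(
    similarity: float,
    age_in_days: float,
    recall_count: int,
    half_life_days: float,
    time_weight: float,
    importance: float,
) -> float:
    """Blend similarity and recency, then scale by importance."""
    stickiness = 1 + math.log(1 + recall_count)   # assumption: natural log
    effective_age = age_in_days / stickiness
    recency = 0.5 ** (effective_age / half_life_days)
    return ((1 - time_weight) * similarity + time_weight * recency) * importance

# A never-recalled memory that is exactly one half-life old:
# recency = 0.5, so the score is 0.6 * 0.8 + 0.4 * 0.5 = 0.68
score = final_score(
    similarity=0.8,
    age_in_days=7,
    recall_count=0,
    half_life_days=7,
    time_weight=0.4,
    importance=1.0,
)
print(round(score, 2))
```

With time_weight=0 the score reduces to similarity × importance, so time_weight is the knob that decides how much age matters at all.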
The half_life parameter controls how fast a memory fades. In this chatbot, we use two values:
- half_life=90 for long-term user preferences: things the user stated explicitly ("I prefer Python over JavaScript", "never use markdown tables"). These should survive for months.
- half_life=7 for recent facts: specific answers, session context, things that were relevant this week but probably not next month.
Both live in the same .feather file. You set half_life per search call, not per node — so the same memory can age slowly when queried in a preference context and faster when queried for recency.
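To see what the per-search half-life does, the sketch below scores the same 30-day-old memory under both values. It uses only the recency term from the formula and ignores stickiness (a recall count of zero). The helper name `recency` is an assumption for this example.

```python
def recency(age_in_days: float, half_life_days: float) -> float:
    """Recency term of the decay formula: halves once per half-life."""
    return 0.5 ** (age_in_days / half_life_days)

age = 30  # days since the memory was stored

slow = recency(age, half_life_days=90)  # preference-style query
fast = recency(age, half_life_days=7)   # recent-fact query

print(f"half_life=90 -> {slow:.3f}")
print(f"half_life=7  -> {fast:.3f}")
```

Under the 90-day half-life the memory keeps most of its recency after a month, while under the 7-day half-life it has been through more than four halvings.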
Typed graph edges: linking related memories
Feather DB supports typed, weighted, directional edges between nodes. In a chatbot context, this means you can link memories that belong together:
- same_session: memories from the same conversation turn
- same_topic: memories about the same subject (e.g., two exchanges about API authentication)
When you retrieve a memory, you can traverse its edges to pull adjacent context. If a user asks about Feather DB's pricing, you retrieve the pricing memory and walk same_topic edges to pull in any related memories about their evaluation criteria, all in one graph traversal.
# Link the user message and assistant response from the same turn
db.link(from_id=user_msg_id, to_id=asst_msg_id, rel_type="same_session", weight=1.0)
# Link this exchange to a previous exchange on the same topic
if related_id:
    db.link(from_id=user_msg_id, to_id=related_id, rel_type="same_topic", weight=0.7)
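The traversal idea can be sketched with plain Python data structures. The `EdgeStore` class below is a hypothetical in-memory stand-in written for this example; it is not Feather DB's API and only mirrors the from/to/rel_type/weight shape of `db.link()`.

```python
from collections import defaultdict, deque

class EdgeStore:
    """Toy in-memory store for typed, weighted, directional edges."""

    def __init__(self) -> None:
        # from_id -> list of (to_id, rel_type, weight)
        self._out: dict[int, list[tuple[int, str, float]]] = defaultdict(list)

    def link(self, from_id: int, to_id: int, rel_type: str, weight: float = 1.0) -> None:
        self._out[from_id].append((to_id, rel_type, weight))

    def walk(self, start: int, rel_type: str, max_hops: int = 2) -> list[int]:
        """Breadth-first walk along edges of one type, up to max_hops away."""
        seen = {start}
        order: list[int] = []
        queue = deque([(start, 0)])
        while queue:
            node, hops = queue.popleft()
            if hops == max_hops:
                continue
            for to_id, edge_type, _weight in self._out.get(node, []):
                if edge_type == rel_type and to_id not in seen:
                    seen.add(to_id)
                    order.append(to_id)
                    queue.append((to_id, hops + 1))
        return order

edges = EdgeStore()
edges.link(1, 2, "same_session")      # user message -> assistant reply
edges.link(1, 7, "same_topic", 0.7)   # pricing question -> evaluation criteria
edges.link(7, 9, "same_topic", 0.7)   # evaluation criteria -> earlier budget note

related = edges.walk(1, "same_topic")  # follows only same_topic edges
print(related)
```

Filtering by edge type during the walk is what keeps a topic lookup from dragging in every message that merely shared a session.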
v0.16.0: fast cold load on Cloud Run
If you deploy on Cloud Run, your container starts cold on every new instance. Loading a large .feather file into memory on cold start was the main latency source in earlier versions.
v0.16.0 ships a lazy-load path: the HNSW index is memory-mapped on open, and the full graph is only deserialized on first query. For most Cloud Run deployments, cold start is now under 200ms even with 100K+ nodes in the index. You mount the .feather file from a Cloud Storage bucket or a persistent volume — no changes to your application code.
import feather_db
# v0.16.0: opens immediately, deserializes lazily on first search()
db = feather_db.DB.open("memory.feather", dim=768)
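The general pattern behind a lazy-load path looks roughly like the sketch below: map the file when it is opened, and parse it only when something first needs the contents. This illustrates the pattern using Python's standard library and a JSON payload. It is not Feather DB's file format or implementation, and `LazyStore` is a hypothetical name.

```python
import json
import mmap
import os
import tempfile

class LazyStore:
    """Maps the file on open; parses the payload only on first access."""

    def __init__(self, path: str) -> None:
        self._file = open(path, "rb")
        # Mapping is cheap: no bytes are parsed at this point.
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self._graph = None

    @property
    def loaded(self) -> bool:
        return self._graph is not None

    def graph(self) -> dict:
        if self._graph is None:  # the first caller pays the parsing cost
            self._graph = json.loads(self._map[:].decode("utf-8"))
        return self._graph

    def close(self) -> None:
        self._map.close()
        self._file.close()

# Demo with a throwaway file
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump({"nodes": [1, 2, 3]}, tmp)
    path = tmp.name

store = LazyStore(path)
loaded_before = store.loaded               # nothing parsed yet
node_count = len(store.graph()["nodes"])   # triggers the parse
loaded_after = store.loaded
store.close()
os.remove(path)
```

The trade-off is that the first query absorbs the deserialization cost that open() no longer pays.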
Complete example: Gemini chatbot with Feather DB memory
This is a full, working chatbot. It stores every turn, retrieves relevant past context, links related memories, and persists across sessions in a single memory.feather file.
pip install feather-db google-generativeai
"""
gemini_memory_chat.py
A Gemini chatbot with persistent cross-session memory via Feather DB.
Run it multiple times — it remembers every conversation.
Requirements:
pip install feather-db google-generativeai
export GOOGLE_API_KEY=your_key_here
"""
import os
import time
import uuid
import feather_db
import google.generativeai as genai
# ── Config ────────────────────────────────────────────────────────────────────
MEMORY_FILE = "memory.feather"
GEMINI_MODEL = "gemini-2.0-flash"
EMBED_MODEL = "models/gemini-embedding-exp-03-07"
DIM = 768 # embedding dimension; must match the Feather DB index dimension
RETRIEVAL_K = 5 # how many past memories to surface per query
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
# ── Helpers ───────────────────────────────────────────────────────────────────
def embed(text: str, task: str = "RETRIEVAL_DOCUMENT") -> list[float]:
    """Embed text using Gemini Embedding 2, requesting DIM-sized (768) vectors."""
    result = genai.embed_content(
        model=EMBED_MODEL,
        content=text,
        task_type=task,
        output_dimensionality=DIM,  # explicit, so vectors always match the index
    )
    return result["embedding"]

def numeric_id() -> int:
    """Generate a unique positive integer ID from a UUID."""
    return uuid.uuid4().int >> 65  # top 63 bits → always fits in a signed int64
def store_turn(
    db: feather_db.DB,
    user_text: str,
    assistant_text: str,
    session_id: str,
    is_preference: bool = False,
) -> tuple[int, int]:
    """
    Store one conversation turn as two linked nodes.
    - user message → importance and memory_type determined by is_preference
    - assistant text → linked via same_session edge
    Returns (user_id, assistant_id).
    """
    timestamp = int(time.time())
    # Embed both sides
    user_vec = embed(user_text, task="RETRIEVAL_DOCUMENT")
    asst_vec = embed(assistant_text, task="RETRIEVAL_DOCUMENT")
    # Node IDs
    user_id = numeric_id()
    asst_id = numeric_id()
    # Importance: preference memories are more important
    importance = 0.9 if is_preference else 0.6
    # User message node
    user_meta = feather_db.Metadata()
    user_meta.importance = importance
    user_meta.set_attribute("role", "user")
    user_meta.set_attribute("text", user_text[:512])  # store preview
    user_meta.set_attribute("session_id", session_id)
    user_meta.set_attribute("timestamp", str(timestamp))
    user_meta.set_attribute("memory_type", "preference" if is_preference else "fact")
    db.add(id=user_id, vec=user_vec, meta=user_meta)
    # Assistant response node
    asst_meta = feather_db.Metadata()
    asst_meta.importance = importance * 0.85  # slightly lower: it's the answer, not the fact
    asst_meta.set_attribute("role", "assistant")
    asst_meta.set_attribute("text", assistant_text[:512])
    asst_meta.set_attribute("session_id", session_id)
    asst_meta.set_attribute("timestamp", str(timestamp))
    asst_meta.set_attribute("memory_type", "preference" if is_preference else "fact")
    db.add(id=asst_id, vec=asst_vec, meta=asst_meta)
    # Link user ↔ assistant from same turn (one edge in each direction)
    db.link(from_id=user_id, to_id=asst_id, rel_type="same_session", weight=1.0)
    db.link(from_id=asst_id, to_id=user_id, rel_type="same_session", weight=1.0)
    return user_id, asst_id
def retrieve_context(
    db: feather_db.DB,
    query: str,
    current_session_id: str,
    k: int = RETRIEVAL_K,
) -> str:
    """
    Retrieve the most relevant past memories for a query.
    Uses two searches with different half_life values:
    - half_life=90 to surface long-term preferences
    - half_life=7 to surface recent facts
    Then deduplicates and formats as a context block.
    """
    if db.size() == 0:
        return ""
    query_vec = embed(query, task="RETRIEVAL_QUERY")
    # Search 1: long-term preferences (slow decay)
    pref_results = db.search(
        vec=query_vec,
        k=k,
        half_life=90,
        time_weight=0.2,
    )
    # Search 2: recent facts (fast decay)
    fact_results = db.search(
        vec=query_vec,
        k=k,
        half_life=7,
        time_weight=0.4,
    )
    # Deduplicate by node ID and collect text
    seen_ids = set()
    memories = []
    for result in pref_results + fact_results:
        node_id = result.id
        if node_id in seen_ids:
            continue
        seen_ids.add(node_id)
        role = result.meta.get_attribute("role") or "unknown"
        text = result.meta.get_attribute("text") or ""
        mem_type = result.meta.get_attribute("memory_type") or "fact"
        sess = result.meta.get_attribute("session_id") or ""
        if not text:
            continue
        # Skip assistant nodes from current session: they're already in context
        if role == "assistant" and sess == current_session_id:
            continue
        label = f"[past {mem_type} — {role}]"
        memories.append(f"{label} {text}")
    if not memories:
        return ""
    block = "\n".join(memories[:k])
    return f"\n{block}\n "
def is_preference_statement(text: str) -> bool:
    """
    Heuristic: mark a turn as a preference if the user expresses a persistent
    preference or constraint. In production, ask Gemini to classify this.
    """
    keywords = [
        "prefer", "always", "never", "don't like", "hate", "love",
        "please don't", "instead of", "i want you to", "from now on",
        "remember that", "my name is", "i am a", "i work",
    ]
    lower = text.lower()
    return any(kw in lower for kw in keywords)
def link_to_topic(
    db: feather_db.DB,
    new_user_id: int,
    query_vec: list[float],
    current_session_id: str,
) -> None:
    """
    Find the most semantically similar past user message and link it
    via a same_topic edge.
    """
    if db.size() < 3:
        return
    results = db.search(vec=query_vec, k=3, half_life=30, time_weight=0.1)
    for result in results:
        if result.id == new_user_id:
            continue
        role = result.meta.get_attribute("role") or ""
        sess = result.meta.get_attribute("session_id") or ""
        if role == "user" and sess != current_session_id:
            db.link(from_id=new_user_id, to_id=result.id, rel_type="same_topic", weight=0.7)
            break
# ── Main chat loop ─────────────────────────────────────────────────────────────
def chat():
    # Open (or create) the persistent memory file
    db = feather_db.DB.open(MEMORY_FILE, dim=DIM)
    model = genai.GenerativeModel(GEMINI_MODEL)
    session_id = str(uuid.uuid4())
    history: list[dict] = []  # Gemini in-session history (current session only)
    print("Gemini + Feather DB Memory Chat")
    print(f"Session: {session_id[:8]}")
    print(f"Memory nodes loaded: {db.size()}")
    print("Type 'quit' to exit.\n")
    while True:
        user_input = input("You: ").strip()
        if not user_input or user_input.lower() in ("quit", "exit"):
            break
        # 1. Retrieve relevant past context from Feather DB
        memory_context = retrieve_context(db, user_input, session_id)
        # 2. Build the prompt for Gemini
        # Inject memory context as a system-style prefix in the user turn.
        if memory_context:
            augmented_input = (
                f"{memory_context}\n\n"
                f"Use the above memory context if relevant. "
                f"Do not mention the memory block explicitly unless asked.\n\n"
                f"User: {user_input}"
            )
        else:
            augmented_input = user_input
        # Add to Gemini's in-session history
        history.append({"role": "user", "parts": [augmented_input]})
        # 3. Call Gemini
        response = model.generate_content(history)
        assistant_text = response.text.strip()
        history.append({"role": "model", "parts": [assistant_text]})
        print(f"\nGemini: {assistant_text}\n")
        # 4. Store this turn in Feather DB
        is_pref = is_preference_statement(user_input)
        user_id, asst_id = store_turn(
            db,
            user_text=user_input,
            assistant_text=assistant_text,
            session_id=session_id,
            is_preference=is_pref,
        )
        # 5. Link to related past topics via same_topic edge
        query_vec = embed(user_input, task="RETRIEVAL_QUERY")
        link_to_topic(db, user_id, query_vec, session_id)
        # 6. Persist to disk after every turn
        db.save()
    print(f"\nSession ended. Memory nodes saved: {db.size()}")

if __name__ == "__main__":
    chat()
What happens across sessions
Run the script twice. On the second run, db.size() is non-zero — Feather DB loaded your memory.feather file from disk. When the user asks something related to a past exchange, retrieve_context() surfaces it and injects it into the prompt. Gemini sees it and responds accordingly. The user never has to repeat themselves.
The memory graph grows over time. Preferences accumulate recall counts (because they surface frequently), which compresses their effective age via the stickiness formula — keeping them near the top of scored results even months later. Recent-fact memories fade naturally after a week or two. You don't need to prune anything manually.
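A quick calculation shows how much recall counts compress age. The sketch below applies the stickiness term from the decay formula to a 180-day-old memory at three recall counts. It assumes a natural logarithm, and the function name is chosen for this example.

```python
import math

def effective_age(age_in_days: float, recall_count: int) -> float:
    """Age after the stickiness adjustment from the decay formula."""
    stickiness = 1 + math.log(1 + recall_count)  # assumption: natural log
    return age_in_days / stickiness

ages = {count: effective_age(180, count) for count in (0, 5, 50)}
for count, age in ages.items():
    print(f"recall_count={count:>2} -> effective age {age:.1f} days")
```

A memory recalled five times is treated as roughly 64 days old instead of 180, and fifty recalls bring it down to roughly 36. The logarithm means each further recall helps less than the one before it.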
Deploying on Cloud Run
Three things to configure:
- Mount the .feather file from a persistent volume (or Cloud Storage via FUSE). Don't bundle it in the container image, because it grows with every session.
- v0.16.0 lazy load handles cold starts. DB.open() returns immediately. The first search() call deserializes the graph.
- Single-writer constraint: if you scale to multiple instances, use a singleton writer pattern (one Cloud Run instance handles writes, others read from a shared mount) or migrate to Feather Cloud (Q3 2026), which handles multi-writer natively.
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY gemini_memory_chat.py .
# Memory file is mounted at runtime, not baked in
CMD ["python", "gemini_memory_chat.py"]
# cloud-run-service.yaml (abbreviated)
spec:
  containers:
  - image: gcr.io/your-project/gemini-memory-chat
    env:
    - name: GOOGLE_API_KEY
      valueFrom:
        secretKeyRef:
          name: google-api-key
          key: latest
    volumeMounts:
    - name: memory-vol
      # A bucket mounts as a directory; point MEMORY_FILE at /mnt/memory/memory.feather
      mountPath: /mnt/memory
  volumes:
  - name: memory-vol
    csi:
      driver: gcsfuse.run.googleapis.com  # Cloud Run's Cloud Storage FUSE driver
      volumeAttributes:
        bucketName: your-memory-bucket
Extending to multimodal
Because gemini-embedding-exp-03-07 produces 768-dim vectors for both text and images, you can store image memories in the same Feather DB index with no extra configuration. If a user shares an image, embed it and store it alongside your text memories — all in the same db.search() call.
import base64

def embed_image(image_bytes: bytes, caption: str = "") -> list[float]:
    """Embed an image (+ optional caption) into the shared 768-dim space."""
    content = [{"mime_type": "image/jpeg", "data": base64.b64encode(image_bytes).decode()}]
    if caption:
        content.append(caption)
    result = genai.embed_content(
        model=EMBED_MODEL,
        content=content,
        task_type="RETRIEVAL_DOCUMENT",
        output_dimensionality=DIM,  # keep image vectors the same size as text vectors
    )
    return result["embedding"]
# Store image memory alongside text memories — same index, same dim
img_vec = embed_image(image_bytes, caption="Screenshot of user's dashboard error")
img_id = numeric_id()
img_meta = feather_db.Metadata()
img_meta.importance = 0.75
img_meta.set_attribute("role", "user")
img_meta.set_attribute("text", "Screenshot of dashboard error")
img_meta.set_attribute("modality", "image")
img_meta.set_attribute("session_id", session_id)
db.add(id=img_id, vec=img_vec, meta=img_meta)
db.save()
A text query for "the error I showed you last week" will surface this image node in the search results — because text and image vectors are in the same semantic space.
The numbers that matter
- 40× cheaper per query vs sending full conversation history every call
- 0.19ms p50 ANN latency on 500K vectors — retrieval is not the bottleneck
- 97.2% recall@10 — you're not losing relevant memories to index inaccuracy
- 768 dimensions: gemini-embedding-exp-03-07 output matches Feather DB's default index dimension, so the index needs no configuration
- One .feather file: deploy anywhere, no vector database server to run
What's next
The pattern above is the foundation. From here you can:
- Add a Gemini classification step to detect preference statements more reliably than the keyword heuristic
- Use context_chain() to traverse same_topic edges for richer multi-hop context retrieval
- Store tool call results as memories so the agent doesn't repeat API calls it already made
- Run multiple agents sharing one .feather file as a shared memory layer (read-only for all but one writer)
Install Feather DB: pip install feather-db. The memory.feather file is yours — no server, no cloud dependency, no vendor lock-in.