Feather DB in Production: Deployment Patterns and Best Practices
Embedded, feather-serve daemon, or remote MCP — Feather DB runs wherever you need it. This guide covers namespace design, file management, compaction, memory optimization, cold-start tuning, bulk ingestion, Docker, monitoring, multi-tenant patterns, and disaster recovery.
Deployment modes
Feather DB ships three deployment modes. Pick one based on where your embedding logic lives and who needs to reach the index.
Embedded (single process)
The default mode. Your Python process opens a .feather file directly — no network hop, no daemon, sub-millisecond search latency. Use this for single-service backends, batch pipelines, and local development.
import feather_db as fdb
db = fdb.DB.open("memory.feather", dim=768)
db.add(vec, text="Alice prefers dark mode.", namespace="user-alice")
results = db.search(query_vec, k=5, namespace="user-alice")
db.save()
The file is created on first open and grown incrementally. You own the process lifecycle — save before exit.
feather-serve (local daemon)
Run feather-serve as a long-lived process. It exposes a REST API at /api/v1/, an MCP endpoint at /mcp, and the Atlas admin SPA at /admin/. Other services call HTTP instead of linking the library directly. Use this when multiple processes share one context store, or when you want the admin SPA for inspection.
GOOGLE_API_KEY=your-key feather-serve memory.feather \
--embed-provider gemini --dim 768 --port 7700
With --embed-provider, feather-serve handles embeddings server-side. Clients send raw text; Feather returns ranked memories. No embedding pipeline in your application code.
Remote MCP (network)
Connect Claude Desktop, Claude Code, or any MCP-compatible agent to a running feather-serve instance. The agent calls feather_search, feather_add, feather_context_chain, and 11 other tools as native MCP calls. Use this for persistent agent memory across conversations.
// claude_desktop_config.json
{
  "mcpServers": {
    "feather-memory": {
      "url": "http://localhost:7700/mcp"
    }
  }
}
The three modes compose: in production you might run feather-serve in Docker (mode 2), accessed by your Python API via REST (the same operations as embedded mode, but over HTTP) and by Claude via MCP (mode 3).
Namespace design
Namespaces are hard tenant boundaries. Memories in "user-alice" are completely invisible to searches in "user-bob". Search latency scales with the number of memories in that namespace, not across all tenants — which is what makes a shared file practical at scale.
The standard pattern:
- namespace = tenant boundary — user_id, agent_id, org_id
- entity = topic group within a namespace — "preferences", "work-context", "current-project"
- attributes = secondary metadata — type, source, created_at, confidence
import feather_db as fdb
from datetime import datetime, timezone

db = fdb.DB.open("saas_memory.feather", dim=768)

# embed() stands in for your own embedding function (text -> 768-dim vector)
def add_memory(user_id: str, text: str, category: str,
               mem_type: str, importance: float = 1.0, vec=None):
    if vec is None:
        vec = embed(text)
    mem = db.add(vec, text=text, namespace=user_id, entity=category)
    mem.meta.importance = importance
    mem.meta.set_attribute("type", mem_type)
    mem.meta.set_attribute("created_at", datetime.now(timezone.utc).isoformat())
    return mem

def search_memory(user_id: str, query: str,
                  category: str = None, k: int = 5):
    return db.search(embed(query), k=k,
                     namespace=user_id, entity=category)

# One tenant per user_id: strict isolation, zero cross-contamination
add_memory("user-42", "Prefers Python for backend, TypeScript for frontend.",
           category="preferences", mem_type="fact", importance=1.2)
add_memory("user-42", "Building a fintech SaaS, Series A in Q3.",
           category="work-context", mem_type="fact", importance=1.5)
results = search_memory("user-42", "What stack does this user prefer?")
Agent roles get their own namespaces too. A planner agent and a coder agent operating on the same codebase should have separate namespaces — "agent-planner" and "agent-coder" — so their context stores don't bleed into each other's retrieval.
File management
Where to store .feather files
Keep .feather files on durable, fast-seek storage — SSD-backed volumes or network-attached storage with high IOPS. The file is read fully at startup (HNSW graph reconstruction) and written atomically on db.save(). Write latency matters at save time; read IOPS matter at startup.
Recommended layout for a production service:
/data/feather/
  production.feather       # primary store
  production.feather.bak   # last manual snapshot
  staging.feather          # staging environment
In Docker, always use a named volume — never a bind mount to a temp directory:
volumes:
  feather-data:
    driver: local
services:
  feather-api:
    volumes:
      - feather-data:/data
Backup strategy
The .feather file is self-contained: vectors, HNSW graph, metadata, edges, and namespace index are all in one binary. A copy is a backup.
# Snapshot before a risky operation (migration, bulk delete)
cp /data/feather/production.feather \
/data/feather/production.$(date +%Y%m%d-%H%M%S).bak
# Restore is equally simple (file name matches the timestamp format used above)
cp /data/feather/production.20260616-090000.bak \
   /data/feather/production.feather
For scheduled backups, copy the file to object storage (S3, GCS) nightly. Because db.save() writes atomically to a temp file then renames, a concurrent copy during a save will always get a consistent snapshot — either the old file or the new one, never a partial write.
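# Nightly backup cron (plain cp cannot write to s3://, so this uses the AWS CLI;
# crontab entries must be a single line and % must be escaped as \%)
0 2 * * * aws s3 cp /data/feather/production.feather s3://your-bucket/backups/production.$(date +\%Y\%m\%d).feather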
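The write-to-temp-then-rename behavior described above can be illustrated with a minimal sketch in plain Python. This is not Feather's actual save path; `atomic_write` is a hypothetical helper, and it assumes the temp file sits on the same filesystem as the target so that `os.replace` is an atomic swap.

```python
import os
import tempfile

def atomic_write(path: str, data: bytes) -> None:
    """Write data so readers see either the old file or the new one, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())   # push bytes to disk before the rename
        os.replace(tmp_path, path)   # atomic swap of the directory entry
    except BaseException:
        # Remove the temp file if anything failed before the swap.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A backup job that opened the old file before the swap keeps reading the old contents, which is why a copy taken during a save stays consistent.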
Compaction
Feather's HNSW graph accumulates soft-deleted nodes when you call forget() or purge(). These nodes no longer appear in search results but still occupy space in the graph, slowing traversal slightly. compact() rewrites the file clean — removed nodes gone, HNSW rebuilt tight, load time faster on the next restart.
db.compact() # rebuilds graph in-place, rewrites .feather file
db.save() # flush the compacted state to disk
When to compact:
- After a bulk forget() or purge() that removes more than ~10% of nodes
- After onboarding data migrations where you replaced old records
- Before a disaster recovery restore — compact the source file first so the restore loads a tight index
- On a weekly schedule for stores with frequent deletes
import schedule, time

def weekly_compact():
    db.purge(namespace="*", older_than_days=90)  # evict stale memories
    db.compact()
    db.save()
    print(f"Compacted. Vectors remaining: {db.count()}")

schedule.every().monday.at("03:00").do(weekly_compact)
while True:
    schedule.run_pending()
    time.sleep(60)
Compaction is also the fastest way to reduce file size before shipping a snapshot to a new environment.
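The ~10% rule of thumb above can be turned into a small check. This is a sketch under assumptions: `should_compact` is a hypothetical helper, and it assumes your application tracks how many nodes it has deleted since the last compaction (for example by comparing db.count() before and after a purge), since this guide does not show an API for reading the soft-deleted count.

```python
def should_compact(live_count: int, deleted_count: int,
                   threshold: float = 0.10) -> bool:
    """Return True when soft-deleted nodes exceed `threshold` of all graph nodes."""
    total = live_count + deleted_count
    if total == 0:
        return False  # empty index, nothing to reclaim
    return deleted_count / total > threshold

# Example: 850 live nodes and 150 deleted since the last compaction
needs_compaction = should_compact(850, 150)
```

Running the check after each bulk delete avoids paying for a rebuild when only a handful of nodes were removed.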
Memory management
Adaptive capacity (v0.15.3)
v0.15.3 ships adaptive HNSW capacity: the index grows incrementally instead of pre-allocating max_elements upfront. For typical deployments that start small and grow over weeks, this delivers 7.7× less RAM at startup compared to pre-allocating for 1M elements on an empty index. The change is automatic — no config required.
int8 RAM quantization
For memory-constrained hosts (1–2 GB VPS, edge devices, Lambda), enable in-RAM int8 quantization after load:
db = fdb.DB.open("memory.feather", dim=768)
db.set_int8_ram("text", max_abs=1.0) # 1.76× less RAM, recall@10 ~0.88 vs 0.972
At 60k × 768-dim float32, RAM drops from 227 MB to 129 MB. Recall@10 moves from 0.972 to ~0.88. For context retrieval, where each query surfaces 5–10 relevant memories, that recall is usually an acceptable trade for the memory saved.
| Mode | RAM (60k × 768-dim) | Recall@10 |
|---|---|---|
| float32 (default) | 227 MB | 0.972 |
| int8 in-RAM | 129 MB | ~0.88 |
Stick with float32 when running precision benchmarks, when RAM is not a constraint, or when your index is under 20k vectors and there's no reason to trade recall.
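To make the trade-off concrete, here is a minimal sketch of symmetric int8 quantization with a `max_abs` clipping range. It illustrates the general technique only and is not Feather's internal code; `quantize_int8` and `dequantize_int8` are hypothetical names, and the mapping to codes in [-127, 127] is an assumption.

```python
from array import array

def quantize_int8(vec, max_abs=1.0):
    """Clip each value to [-max_abs, max_abs] and map it to an int8 code."""
    scale = 127.0 / max_abs
    codes = array("b")  # signed 8-bit integers, one byte per dimension
    for x in vec:
        clipped = max(-max_abs, min(max_abs, x))
        codes.append(int(round(clipped * scale)))
    return codes

def dequantize_int8(codes, max_abs=1.0):
    """Recover approximate floats from int8 codes."""
    scale = max_abs / 127.0
    return [c * scale for c in codes]

vec = [0.5, -0.25, 1.0, -1.5]            # the last value falls outside max_abs and is clipped
codes = quantize_int8(vec)
restored = dequantize_int8(codes)
float32_bytes = array("f", vec).itemsize * len(vec)
int8_bytes = codes.itemsize * len(codes)
```

Raw vector storage shrinks by 4× in this sketch. The overall saving quoted above is smaller, presumably because structures other than the vectors are not quantized, though that reading is an inference from the numbers rather than something the release notes state. The rounding and clipping error is what costs recall.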
Cold start: persisted HNSW (v0.16.0)
v0.16.0 ships persisted HNSW graph state. The HNSW graph is stored in a ready-to-load binary layout inside the .feather file — no reconstruction on startup. Cold start at 500k vectors drops from 2.7s to 48ms.
This is the highest-impact change for serverless deployments and Kubernetes pods with frequent restarts. Before v0.16.0, parallel load via FEATHER_LOAD_THREADS was the primary lever:
import os
import feather_db as fdb
# v0.15.x: parallel graph reconstruction (4.7× faster than serial)
os.environ["FEATHER_LOAD_THREADS"] = "8"
db = fdb.DB.open("memory.feather", dim=768)
With v0.16.0, FEATHER_LOAD_THREADS is still respected during initial file creation and explicit rebuilds, but routine opens skip reconstruction entirely. Set it in your environment regardless — it handles fallback cases.
| Version | Cold start (500k vectors) | Notes |
|---|---|---|
| v0.15.x serial | ~2.7s | Single-threaded reconstruction |
| v0.15.x + FEATHER_LOAD_THREADS=8 | ~0.6s | Parallel reconstruction, 4.7× |
| v0.16.0 | 48ms | Persisted graph, no reconstruction |
# Dockerfile — set regardless of version
ENV FEATHER_LOAD_THREADS=8
Bulk ingestion with add_batch()
For ingesting more than ~1k vectors at once — corpus imports, historical data seeding, document chunking pipelines — use add_batch(). It builds the HNSW graph in parallel with the GIL released: 3.4× faster than a sequential loop on a 4-core machine, ~5–6× on an 8-core machine.
import feather_db as fdb
import numpy as np
db = fdb.DB.open("corpus.feather", dim=768)
# Prepare vectors and metadata in bulk
texts = load_your_documents() # list of strings
vecs = embed_batch(texts) # np.ndarray shape (N, 768), float32
scores = load_importance_scores(texts) # np.ndarray shape (N,)
metas = []
for i, (text, score) in enumerate(zip(texts, scores)):
    m = fdb.Metadata(importance=float(min(1.0, score)))
    m.set_attribute("source", "batch_import_2026")
    m.set_attribute("doc_id", str(i))
    metas.append(m)
ids = list(range(len(texts)))
# Single parallel call — no Python loop overhead
db.add_batch(ids, vecs, metas=metas)
db.save()
print(f"Ingested {db.count()} vectors")
Use add() for real-time single-item inserts (one memory after each conversation turn). Use add_batch() for everything else. The crossover point is roughly 1,000 items — below that the overhead isn't worth it; above it the speedup compounds.
Important: always use meta.set_attribute(key, value), not meta.attributes[key] = value. The dict accessor silently does nothing due to pybind11 copy semantics.
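The silent no-op is easier to see with a toy model. The class below is a plain-Python stand-in, not Feather's binding code: it mimics a bound object whose map is converted to a fresh Python dict on every access, which is how pybind11's STL type conversions behave by default.

```python
class Meta:
    """Stand-in for a bound C++ object whose attribute map is returned by value."""

    def __init__(self):
        self._store = {}           # plays the role of the C++-side map

    @property
    def attributes(self):
        return dict(self._store)   # every access hands back a fresh copy

    def set_attribute(self, key, value):
        self._store[key] = value   # writes through to the real storage

m = Meta()
m.attributes["source"] = "seed"    # mutates a temporary copy, which is then discarded
lost = m.attributes                # the write above never reached the store
m.set_attribute("source", "seed")
kept = m.attributes                # now the value is visible
```

No exception is raised on the dict assignment, which is why the mistake tends to surface only later as missing metadata.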
Docker: self-hosted feather-serve
# Clone and build
git clone https://github.com/feather-store/feather.git
cd feather
docker compose -f feather-api/docker-compose.yml build
# feather-api/docker-compose.yml
version: '3.9'
services:
  feather-api:
    image: feather-api:latest
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "${FEATHER_PORT:-7700}:7700"
    volumes:
      - feather-data:/data   # persistent across restarts and image updates
    environment:
      - FEATHER_API_KEY=${FEATHER_API_KEY}
      - FEATHER_EMBED_PROVIDER=${FEATHER_EMBED_PROVIDER:-gemini}
      - GOOGLE_API_KEY=${GOOGLE_API_KEY:-}
      - OPENAI_API_KEY=${OPENAI_API_KEY:-}
      - FEATHER_LOAD_THREADS=${FEATHER_LOAD_THREADS:-8}   # read from .env, default 8
      - FEATHER_DIM=${FEATHER_DIM:-768}
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:7700/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 15s
    restart: unless-stopped
volumes:
  feather-data:
    driver: local
# feather-api/.env
FEATHER_API_KEY=your-secret-key
FEATHER_EMBED_PROVIDER=gemini
GOOGLE_API_KEY=your-google-key
FEATHER_LOAD_THREADS=8
FEATHER_DIM=768
# Start
docker compose -f feather-api/docker-compose.yml up -d
# Verify
curl http://localhost:7700/health
# {"status": "ok", "version": "0.16.0", "vectors": 0, "dim": 768}
For production HTTPS, put Nginx or Caddy in front with proxy_pass http://feather-api:7700 and a Let's Encrypt certificate. The MCP endpoint and admin SPA both work behind a reverse proxy with no additional configuration.
The named volume feather-data persists the .feather file across container restarts, image updates, and host reboots. Never bind-mount to a temp directory — you will lose all memories on container restart.
Monitoring
Feather exposes the metrics you need through the API and in-process.
import time
import feather_db as fdb
start = time.perf_counter()
db = fdb.DB.open("memory.feather", dim=768)
load_time_ms = (time.perf_counter() - start) * 1000
# Core health metrics
record_count = db.count()
namespace_list = db.list_namespaces()
# Memory estimate: raw float32 vectors only (a lower bound; the HNSW graph and metadata add to it)
ram_estimate_mb = record_count * 768 * 4 / 1e6
print(f"Load time: {load_time_ms:.0f}ms")
print(f"Vectors: {record_count:,}")
print(f"Namespaces: {len(namespace_list)}")
print(f"RAM (est.): {ram_estimate_mb:.0f} MB")
# Per-namespace counts for multi-tenant monitoring
for ns in namespace_list:
    ns_count = db.count(namespace=ns)
    print(f"  {ns}: {ns_count} vectors")
Expose these as a /metrics endpoint (Prometheus) or ship them to your observability stack on a 60-second interval. The four numbers to track in production: record count (growth rate), load time (regression signal if it spikes), RAM usage (capacity planning), and per-namespace counts (detect runaway tenants).
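As a rough sketch of the Prometheus route, the function below renders those four numbers in the Prometheus text exposition format. The metric names (`feather_vectors`, `feather_load_time_ms`, and so on) and the `render_prometheus` helper are assumptions for illustration, not names Feather defines; serve the returned string from whatever HTTP framework you already run.

```python
def render_prometheus(total_vectors: int, load_time_ms: float,
                      ram_estimate_mb: float, per_namespace: dict) -> str:
    """Render gauges as Prometheus text exposition lines."""
    lines = [
        "# TYPE feather_vectors gauge",
        f"feather_vectors {total_vectors}",
        "# TYPE feather_load_time_ms gauge",
        f"feather_load_time_ms {load_time_ms}",
        "# TYPE feather_ram_estimate_mb gauge",
        f"feather_ram_estimate_mb {ram_estimate_mb}",
        "# TYPE feather_namespace_vectors gauge",
    ]
    for ns, count in sorted(per_namespace.items()):
        # Escape backslash, double quote, and newline as label values require.
        label = ns.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        lines.append(f'feather_namespace_vectors{{namespace="{label}"}} {count}')
    return "\n".join(lines) + "\n"

body = render_prometheus(3, 48.0, 0.01, {"user-b": 1, "user-a": 2})
```

With thousands of tenants, one label value per namespace creates a high-cardinality series; exporting only the largest namespaces is a common compromise.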
Via the REST API from feather-serve:
curl -H "Authorization: Bearer $FEATHER_API_KEY" \
http://localhost:7700/api/v1/stats
# {
# "total_vectors": 84321,
# "namespaces": 412,
# "file_size_mb": 247.3,
# "load_time_ms": 48
# }
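The same call can be made from Python with only the standard library. This is a minimal sketch: the endpoint path and bearer header are taken from the curl example above, while `build_stats_request` and `fetch_stats` are hypothetical helper names.

```python
import json
import urllib.request

def build_stats_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build the GET request for the stats endpoint without sending it."""
    url = base_url.rstrip("/") + "/api/v1/stats"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {api_key}"}
    )

def fetch_stats(base_url: str, api_key: str, timeout: float = 5.0) -> dict:
    """Send the request and decode the JSON body."""
    request = build_stats_request(base_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))

request = build_stats_request("http://localhost:7700/", "your-secret-key")
```

Splitting request construction from the network call keeps the URL and header logic testable without a running server.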
Multi-tenant patterns
Two patterns exist for multi-tenant deployments. Choose based on your tenant count and isolation requirements.
One file per tenant
Each tenant gets a dedicated .feather file. Strong physical isolation, easy per-tenant backup and deletion, and predictable per-file memory usage. Best for: small tenant counts (<100), enterprise customers who need data residency guarantees, or tenants with very large individual corpora (>500k vectors each).
def get_db(tenant_id: str) -> fdb.DB:
    path = f"/data/feather/tenant-{tenant_id}.feather"
    db = fdb.DB.open(path, dim=768)
    return db

# Backup one tenant
def backup_tenant(tenant_id: str):
    import shutil
    from datetime import date
    shutil.copy(
        f"/data/feather/tenant-{tenant_id}.feather",
        f"/data/backups/tenant-{tenant_id}.{date.today():%Y%m%d}.feather"
    )

# Delete a tenant completely: just delete the file
def offboard_tenant(tenant_id: str):
    import os
    os.remove(f"/data/feather/tenant-{tenant_id}.feather")
Namespace-per-tenant in a shared file
All tenants share one .feather file, isolated by namespace. Best for: SaaS products with hundreds to tens of thousands of users, where per-file overhead would be impractical. Search latency scales with per-namespace vector count, not total file size.
# One file, thousands of tenants: namespace enforces isolation
db = fdb.DB.open("/data/feather/production.feather", dim=768)

def add_for_user(user_id: str, text: str, category: str):
    db.add(embed(text), text=text,
           namespace=user_id, entity=category)
    db.save()

def search_for_user(user_id: str, query: str, k: int = 5):
    return db.search(embed(query), k=k, namespace=user_id)

# Delete all memories for a user (GDPR, offboarding)
def delete_user_data(user_id: str):
    db.purge(namespace=user_id)
    db.compact()
    db.save()
For very large deployments (100k+ namespaces), shard by namespace hash across multiple files with a routing layer. Each shard benefits from persisted HNSW load (48ms cold start) and int8 RAM quantization independently.
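A routing layer of this kind needs a hash that is stable across processes and restarts. Python's built-in hash() is salted per process for strings, so the sketch below uses hashlib instead. The helper names and the shard file naming scheme are assumptions for illustration.

```python
import hashlib

def shard_for(namespace: str, num_shards: int) -> int:
    """Map a namespace to a shard index that is identical in every process."""
    digest = hashlib.sha256(namespace.encode("utf-8")).digest()
    # The first 8 bytes of the digest give a well-mixed 64-bit integer.
    return int.from_bytes(digest[:8], "big") % num_shards

def shard_path(namespace: str, num_shards: int,
               base_dir: str = "/data/feather") -> str:
    """Return the (hypothetical) file path of the shard that owns this namespace."""
    return f"{base_dir}/shard-{shard_for(namespace, num_shards):03d}.feather"

index = shard_for("user-42", 8)
path = shard_path("user-42", 8)
```

Changing the shard count remaps most namespaces under plain modulo routing, so pick the count with headroom or plan a migration step before resizing.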
Disaster recovery
The .feather file is self-contained. Everything — vectors, HNSW graph, metadata, typed edges, namespace index — is in one binary. Recovery is a file copy.
import shutil
import feather_db as fdb

def restore_from_backup(backup_path: str, target_path: str):
    """Restore a .feather file from backup. No special tooling needed."""
    shutil.copy(backup_path, target_path)
    # Verify the restore loaded clean
    db = fdb.DB.open(target_path, dim=768)
    print(f"Restored. Vectors: {db.count()}, Namespaces: {len(db.list_namespaces())}")
    return db
Disaster recovery checklist:
- Snapshot the file before every migration: cp production.feather production.$(date +%Y%m%d).bak
- Ship nightly backups to object storage; one copy is not a backup
- Compact before shipping a snapshot to a new environment — smaller file, faster restore load
- Test your restore path quarterly: copy a backup, open it, verify count and namespace list
- Never modify a .feather file directly; always go through the Feather API. The format has a checksum, so corrupt files fail to open with a clear error rather than silently returning wrong results
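For the quarterly restore test, it can help to confirm that a backup copy is byte-identical to its source before opening it. The sketch below uses only the standard library; `sha256_of` and `copy_verified` are hypothetical helpers, and the check is independent of the format's own checksum.

```python
import hashlib
import shutil

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files never load fully into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def copy_verified(src: str, dst: str) -> str:
    """Copy src to dst and raise if the two files differ afterwards."""
    shutil.copy2(src, dst)
    src_sum, dst_sum = sha256_of(src), sha256_of(dst)
    if src_sum != dst_sum:
        raise IOError(f"checksum mismatch after copy: {src} -> {dst}")
    return dst_sum
```

If the live file is saved between the copy and the hash, the check reports a mismatch even though the copy is intact, so run it against a snapshot or during a quiet window.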
Production setup: putting it together
A complete production Python service with namespace isolation, startup optimization, compaction schedule, and monitoring:
import os
import time
import logging
import schedule
import feather_db as fdb

logger = logging.getLogger("feather")

# ── Startup ──────────────────────────────────────────────────────────────
os.environ["FEATHER_LOAD_THREADS"] = "8"  # parallel HNSW load
start = time.perf_counter()
DB = fdb.DB.open("/data/feather/production.feather", dim=768)
load_ms = (time.perf_counter() - start) * 1000
# Optional: int8 quantization for memory-constrained hosts
# DB.set_int8_ram("text", max_abs=1.0)  # 1.76× less RAM, recall@10 ~0.88
logger.info(f"Feather ready. vectors={DB.count()} load_ms={load_ms:.0f}")
EMBED = load_your_embedder()  # e.g. Gemini, OpenAI, Voyage

# ── Core operations ───────────────────────────────────────────────────────
def add_memory(user_id: str, text: str, category: str,
               mem_type: str = "fact", importance: float = 1.0):
    vec = EMBED(text)
    mem = DB.add(vec, text=text, namespace=user_id, entity=category)
    mem.meta.importance = importance
    mem.meta.set_attribute("type", mem_type)
    # time.gmtime() keeps the trailing "Z" (UTC) truthful; the default is local time
    mem.meta.set_attribute("created_at",
                           time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()))
    DB.save()
    return mem.id

def search_memory(user_id: str, query: str,
                  category: str = None, k: int = 5):
    vec = EMBED(query)
    return DB.search(vec, k=k, namespace=user_id, entity=category)

def delete_user(user_id: str):
    """Full GDPR delete: purge namespace, compact, save."""
    DB.purge(namespace=user_id)
    DB.compact()
    DB.save()
    logger.info(f"Deleted namespace={user_id}")

# ── Bulk ingestion ────────────────────────────────────────────────────────
def bulk_seed(user_id: str, records: list[dict]):
    """Seed a user's historical data. Use add_batch for >1k records."""
    import numpy as np
    texts = [r["text"] for r in records]
    vecs = np.array([EMBED(t) for t in texts], dtype=np.float32)
    # Count-based IDs assume no gaps: after a purge they can collide with
    # live IDs, so swap in your own ID allocator if you delete records.
    ids = list(range(DB.count(), DB.count() + len(records)))
    metas = []
    for r in records:
        m = fdb.Metadata(importance=r.get("importance", 1.0))
        m.set_attribute("source", r.get("source", "seed"))
        metas.append(m)
    DB.add_batch(ids, vecs, metas=metas, namespace=user_id)
    DB.save()
    logger.info(f"Seeded {len(records)} records for user={user_id}")

# ── Maintenance schedule ──────────────────────────────────────────────────
def weekly_maintenance():
    before = DB.count()
    DB.purge(older_than_days=90)  # evict memories not recalled in 90 days
    DB.compact()
    DB.save()
    after = DB.count()
    logger.info(f"Maintenance: {before - after} nodes pruned, {after} remaining")

schedule.every().monday.at("03:00").do(weekly_maintenance)

# ── Metrics ───────────────────────────────────────────────────────────────
def emit_metrics():
    namespaces = DB.list_namespaces()
    ram_mb = DB.count() * 768 * 4 / 1e6
    logger.info(
        f"metrics vectors={DB.count()} "
        f"namespaces={len(namespaces)} "
        f"ram_estimate_mb={ram_mb:.0f}"
    )

schedule.every(60).seconds.do(emit_metrics)
# Scheduled jobs only fire when schedule.run_pending() is called, so call it
# from your service loop or a background thread.
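This pattern runs well on a 2-core / 2 GB VPS serving up to ~10k tenants in a shared file. For larger deployments, shard by a stable hash of user_id modulo N across N files, each served by its own feather-serve instance behind a routing layer that applies the same hash. Python's built-in hash() is salted per process for strings, so use a digest from hashlib rather than hash(user_id).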
Summary: decision table
| Decision | Recommendation |
|---|---|
| Deployment mode | Embedded for single service; feather-serve for multi-service or MCP |
| Namespace design | namespace = tenant ID, entity = topic, attributes = secondary filters |
| Multi-tenant file strategy | Shared file for hundreds to ~100k tenants, hash-sharded across files beyond that; one file per tenant for small counts (<100), data residency, or very large per-tenant corpora |
| Compaction | After bulk deletes >10%, weekly schedule, before shipping backups |
| Memory on constrained hosts | Enable int8 RAM: 1.76× less RAM, recall@10 ~0.88 — fine for context retrieval |
| Cold start | v0.16.0 persisted HNSW = 48ms at 500k vectors; set FEATHER_LOAD_THREADS=8 as fallback |
| Bulk ingestion | add_batch() for >1k items (3.4×); add() for real-time single inserts |
| Disaster recovery | Copy = backup; nightly snapshot to object storage; compact before shipping |
| Docker | Named volume for /data; restart: unless-stopped; FEATHER_LOAD_THREADS in ENV |
Install: pip install feather-db · GitHub: github.com/feather-store/feather