feather-serve + Real Embedders: Semantic Persona Recall Without Writing a Single Embedding Call
Feather DB v0.15.1 adds --embed-provider to feather-serve. Pass text in, get semantic search out — no embedding pipeline to maintain. Here's what changed and how to wire it up.
The embedding pipeline problem
Before v0.15.1, using Feather DB required an embedding step before every add() and search() call. You'd call an embedding API, get a float array, then pass it to Feather. This is fine in Python code — two lines. But it creates friction in two scenarios:
- MCP clients (Claude Desktop, Claude Code) don't naturally generate embedding vectors. You'd need an intermediate step that Claude can't do natively.
- REST API clients calling feather-serve have to maintain their own embedding pipeline, adding latency and operational complexity.
v0.15.1 solves this: feather-serve --embed-provider makes Feather itself responsible for embedding. You send text; Feather handles vectors.
What --embed-provider does
When feather-serve starts with --embed-provider, it initializes an embedding client for the chosen provider. Every ingest_text, search_text, and feather_add (MCP) call that receives raw text gets embedded server-side before hitting the HNSW index.
# Gemini — native 768-dim, free tier available
GOOGLE_API_KEY=… feather-serve persona.feather \
--embed-provider gemini --dim 768 --port 8001
# OpenAI
OPENAI_API_KEY=… feather-serve persona.feather \
--embed-provider openai --dim 1536 --port 8001
# Voyage AI
VOYAGE_API_KEY=… feather-serve persona.feather \
--embed-provider voyage --dim 1024 --port 8001
# Cohere
COHERE_API_KEY=… feather-serve persona.feather \
--embed-provider cohere --dim 1024 --port 8001
# Ollama — fully offline, no API key
feather-serve persona.feather \
--embed-provider ollama --ollama-model nomic-embed-text --dim 768 --port 8001
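The commands above all follow one pattern: an API key in the environment (except for Ollama), then the same flags. As a rough sketch, a launcher script could check the key and assemble the arguments before starting the server. The `build_command` helper and the `PROVIDERS` table are hypothetical names for illustration, not part of Feather DB; the flags and environment variable names are the ones shown in the commands above.

```python
import os
import shlex

# Provider -> (API key environment variable, dimension used in the commands above).
# Ollama needs no key, and its dimension depends on the chosen model, so both are None.
PROVIDERS = {
    "gemini": ("GOOGLE_API_KEY", 768),
    "openai": ("OPENAI_API_KEY", 1536),
    "voyage": ("VOYAGE_API_KEY", 1024),
    "cohere": ("COHERE_API_KEY", 1024),
    "ollama": (None, None),
}


def build_command(db_path, provider, port=8001, dim=None, ollama_model=None, env=None):
    """Return the feather-serve argv list, or raise if the setup is incomplete."""
    env = os.environ if env is None else env
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider: {provider}")
    key_var, default_dim = PROVIDERS[provider]
    # Fail early if the provider needs an API key that is not set.
    if key_var is not None and not env.get(key_var):
        raise RuntimeError(f"{key_var} must be set for provider {provider!r}")
    dim = dim if dim is not None else default_dim
    if dim is None:
        raise ValueError("pass dim explicitly for this provider")
    argv = ["feather-serve", db_path, "--embed-provider", provider]
    if provider == "ollama":
        if not ollama_model:
            raise ValueError("ollama requires ollama_model")
        argv += ["--ollama-model", ollama_model]
    argv += ["--dim", str(dim), "--port", str(port)]
    return argv


if __name__ == "__main__":
    # The key value here is a placeholder so the example runs without a real key.
    cmd = build_command("persona.feather", "gemini", env={"GOOGLE_API_KEY": "placeholder"})
    print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run`. Checking the key up front is a design choice of this sketch; how feather-serve itself reports a missing key is not covered here.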
Before vs after
Before v0.15.1 — REST add:
import openai, requests
# Step 1: embed
resp = openai.embeddings.create(model="text-embedding-3-small", input="User prefers Python")
vec = resp.data[0].embedding
# Step 2: store
requests.post("http://localhost:8001/v1/default/add", json={
    "id": 1,
    "vector": vec,
    "metadata": {"text": "User prefers Python"}
})
After v0.15.1 — REST add with real embedder:
import requests
# One call — feather-serve embeds internally
requests.post("http://localhost:8001/v1/default/ingest_text", json={
    "id": 1,
    "text": "User prefers Python"
})
Same pattern for search:
# Before: embed the query yourself, then search by vector
# After: search by text — feather-serve embeds the query
results = requests.post("http://localhost:8001/v1/default/search_text", json={
    "text": "programming language preference",
    "k": 5
}).json()
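If you call both endpoints from several places, a thin wrapper keeps the URLs and payload shapes in one spot. The following is a minimal sketch using only the standard library. `FeatherTextClient` and `post_json` are hypothetical names, treating the `default` path segment as a configurable namespace is an assumption, and the response format is assumed to be JSON without assuming any particular fields.

```python
import json
import urllib.request


def post_json(url, payload, timeout=10.0):
    """POST a JSON body and decode the JSON reply (None if the body is empty)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = resp.read()
    return json.loads(body.decode("utf-8")) if body else None


class FeatherTextClient:
    """Hypothetical helper around the ingest_text and search_text endpoints."""

    def __init__(self, base_url="http://localhost:8001", namespace="default", post=post_json):
        # Assumption: the "default" segment in the URLs above names a namespace.
        self.root = f"{base_url.rstrip('/')}/v1/{namespace}"
        self._post = post  # injectable, so the client can be exercised offline

    def ingest(self, record_id, text):
        """Send raw text; the server is expected to embed and store it."""
        return self._post(f"{self.root}/ingest_text", {"id": record_id, "text": text})

    def search(self, text, k=5):
        """Send a text query; the server is expected to embed it and search."""
        return self._post(f"{self.root}/search_text", {"text": text, "k": k})
```

With a server running, `FeatherTextClient().search("programming language preference")` issues the same request as the snippet above. Passing a fake `post` function makes it possible to unit-test calling code without a live server.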
Semantic persona recall via MCP
Combined with the MCP backend (v0.14.0), real embedders make the Claude Desktop persona experience seamless. Claude calls feather_search with a natural language query string — feather-serve embeds it, searches HNSW, returns relevant memories. No embedding step visible anywhere in the MCP tool schema.
// Claude's tool call (MCP)
{
  "tool": "feather_search",
  "arguments": {
    "query": "what programming language does the user prefer?",
    "k": 5
  }
}
// feather-serve internally:
// 1. embed("what programming language does the user prefer?")
// 2. db.search(vec, k=5)
// 3. return results
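To make the three internal steps concrete, here is a toy, self-contained illustration of the embed, search, return flow. It is not Feather's implementation: `toy_embed` is a hashed bag-of-words stand-in for a real embedding provider, and `ToyStore` uses brute-force cosine similarity in place of the HNSW index. All names are hypothetical.

```python
import hashlib
import math


def toy_embed(text, dim=256):
    """Stand-in embedder: hashed bag-of-words, L2-normalized. Not a real model."""
    vec = [0.0] * dim
    for raw in text.lower().split():
        token = raw.strip(".,?!")
        if not token:
            continue
        bucket = int(hashlib.sha256(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class ToyStore:
    """Brute-force cosine search standing in for the HNSW index."""

    def __init__(self, embed=toy_embed):
        self.embed = embed
        self.rows = []  # list of (id, vector, text)

    def add_text(self, record_id, text):
        # Text in -> embedded -> stored.
        self.rows.append((record_id, self.embed(text), text))

    def search_text(self, query, k=5):
        # 1. Embed the query with the same embedder used at ingest time.
        qvec = self.embed(query)
        # 2. Score every stored vector (vectors are unit length, so dot = cosine).
        scored = [
            (sum(a * b for a, b in zip(qvec, vec)), rid, text)
            for rid, vec, text in self.rows
        ]
        scored.sort(key=lambda row: row[0], reverse=True)
        # 3. Return the top k results.
        return [{"id": rid, "score": score, "text": text} for score, rid, text in scored[:k]]


if __name__ == "__main__":
    store = ToyStore()
    store.add_text(1, "User prefers Python")
    store.add_text(2, "Favorite food is ramen")
    for hit in store.search_text("which language does the user prefer", k=2):
        print(hit)
```

The point of the sketch is the shape of the flow: the caller only ever passes text, and the same embedder must be used for both ingest and query so the vectors are comparable. A real provider replaces `toy_embed`, and an approximate index replaces the linear scan.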
Choosing a provider
| Provider | Dim | Cost | Best for |
|---|---|---|---|
| gemini | 768 | Free tier / $0.00002/1K chars | Native Feather format, low cost, multimodal |
| openai | 1536 | $0.02/1M tokens (small) | High quality, widely supported |
| voyage | 1024 | $0.06/1M tokens | Code + technical content |
| cohere | 1024 | $0.10/1M tokens | Multilingual |
| ollama | varies | Free (local compute) | Privacy, air-gap, offline |
For the MCP + Claude Desktop use case, Gemini is the recommended starting point: 768-dim is the native Feather format (matches on-disk int8 quantization), the free tier is generous, and the text-embedding-004 model is competitive in quality benchmarks.
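For a rough sense of scale, the listed prices can be turned into a back-of-the-envelope estimate. The sketch below copies the prices from the table as given; they may be out of date, so check each provider's current pricing. `estimate_cost_usd` is a hypothetical helper, and the workload in the example (100,000 memories of about 50 tokens or 200 characters each) is an illustrative assumption.

```python
# USD prices copied from the table above; verify against current provider pricing.
PRICE_PER_MILLION_TOKENS = {"openai": 0.02, "voyage": 0.06, "cohere": 0.10}
GEMINI_PRICE_PER_1K_CHARS = 0.00002


def estimate_cost_usd(provider, tokens=0, chars=0):
    """Estimate embedding cost; Gemini is priced per character, the others per token."""
    if provider == "ollama":
        return 0.0  # local compute only
    if provider == "gemini":
        return chars / 1_000 * GEMINI_PRICE_PER_1K_CHARS
    return tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS[provider]


if __name__ == "__main__":
    memories = 100_000
    tokens = memories * 50   # assumed average of 50 tokens per memory
    chars = memories * 200   # assumed average of 200 characters per memory
    for name in ("gemini", "openai", "voyage", "cohere", "ollama"):
        print(f"{name}: ${estimate_cost_usd(name, tokens=tokens, chars=chars):.2f}")
```

At this scale every paid option comes out well under a dollar for the initial ingest, so for a personal persona store the choice is likely to hinge more on quality, privacy, and dimension than on price. Query-time embedding adds to this in proportion to search volume.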
The complete persona stack
With v0.15.1, the full persona context engine stack is:
GOOGLE_API_KEY=… feather-serve persona.feather \
--embed-provider gemini \
--dim 768 \
--port 8001
This single command gives you:
- Semantic add (text in → embedded → stored)
- Semantic search (text query → embedded → ANN → ranked results)
- Context chain (semantic search + BFS graph traversal)
- 14 MCP tools consumable by Claude Desktop/Code
- REST API at /v1/ for programmatic access
- Admin SPA at /admin/ for manual inspection
Add db.set_int8_ram("text", max_abs=1.0) at startup for 1.7× RAM savings on memory-constrained hosts.
Install: pip install feather-db==0.15.1