Tutorial · 8 min read · June 16, 2026

feather-serve + Real Embedders: Semantic Persona Recall Without Writing a Single Embedding Call

Feather DB v0.15.1 adds --embed-provider to feather-serve. Pass text in, get semantic search out — no embedding pipeline to maintain. Here's what changed and how to wire it up.

Feather DB
Engineering

The embedding pipeline problem

Before v0.15.1, using Feather DB required an embedding step before every add() and search() call. You'd call an embedding API, get a float array, then pass it to Feather. This is fine in Python code — two lines. But it creates friction in two scenarios:

  1. MCP clients (Claude Desktop, Claude Code) don't naturally generate embedding vectors. You'd need an intermediate step that Claude can't do natively.
  2. REST API clients calling feather-serve have to maintain their own embedding pipeline, adding latency and operational complexity.

v0.15.1 solves this: feather-serve --embed-provider makes Feather itself responsible for embedding. You send text; Feather handles vectors.

What --embed-provider does

When feather-serve starts with --embed-provider, it initializes an embedding client for the chosen provider. Every ingest_text, search_text, and feather_add (MCP) call that receives raw text gets embedded server-side before hitting the HNSW index.

# Gemini — native 768-dim, free tier available
GOOGLE_API_KEY=… feather-serve persona.feather \
  --embed-provider gemini --dim 768 --port 8001

# OpenAI
OPENAI_API_KEY=… feather-serve persona.feather \
  --embed-provider openai --dim 1536 --port 8001

# Voyage AI
VOYAGE_API_KEY=… feather-serve persona.feather \
  --embed-provider voyage --dim 1024 --port 8001

# Cohere
COHERE_API_KEY=… feather-serve persona.feather \
  --embed-provider cohere --dim 1024 --port 8001

# Ollama — fully offline, no API key
feather-serve persona.feather \
  --embed-provider ollama --ollama-model nomic-embed-text --dim 768 --port 8001
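Since --dim presumably has to match the length of the vectors the provider returns, a small pre-flight check can catch a mismatch before the server starts. The sketch below is a hypothetical helper, not part of feather-serve: check_dim and PROVIDER_DIMS are illustrative names, and the dimensions are simply the ones listed in this article.

```python
# Dimensions as listed in this article. Ollama depends on the chosen model,
# so it is deliberately left out and never rejected.
PROVIDER_DIMS = {"gemini": 768, "openai": 1536, "voyage": 1024, "cohere": 1024}


def check_dim(provider: str, dim: int) -> None:
    """Raise ValueError if dim contradicts the dimension listed for the provider."""
    expected = PROVIDER_DIMS.get(provider)
    if expected is not None and dim != expected:
        raise ValueError(
            f"{provider} is listed as {expected}-dim, but --dim was {dim}"
        )


check_dim("gemini", 768)  # matches the listed dimension, passes silently
check_dim("ollama", 768)  # no listed dimension, so nothing is checked
```

Some providers let you request a different output size, so treat the mapping as defaults for this tutorial rather than hard limits.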

Before vs after

Before v0.15.1 — REST add:

import openai
import requests

# Step 1: embed
resp = openai.embeddings.create(model="text-embedding-3-small", input="User prefers Python")
vec = resp.data[0].embedding

# Step 2: store
requests.post("http://localhost:8001/v1/default/add", json={
    "id": 1,
    "vector": vec,
    "metadata": {"text": "User prefers Python"}
})

After v0.15.1 — REST add with real embedder:

import requests

# One call — feather-serve embeds internally
requests.post("http://localhost:8001/v1/default/ingest_text", json={
    "id": 1,
    "text": "User prefers Python"
})

Same pattern for search:

# Before: embed the query yourself, then search by vector
# After: search by text — feather-serve embeds the query
results = requests.post("http://localhost:8001/v1/default/search_text", json={
    "text": "programming language preference",
    "k": 5
}).json()
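If you call both endpoints from several places, a thin wrapper keeps the URLs and payload shapes in one spot. This is a minimal sketch under stated assumptions: FeatherTextClient and http_post_json are hypothetical names, the endpoints and payload fields are the ones shown above, and the server is assumed to reply with JSON (or an empty body).

```python
import json
import urllib.request
from typing import Any, Callable


def http_post_json(url: str, payload: dict) -> Any:
    """POST a JSON body with the standard library and decode the JSON reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
        # Assumption: replies are JSON; an empty body is returned as None.
        return json.loads(body.decode("utf-8")) if body else None


class FeatherTextClient:
    """Hypothetical convenience wrapper around ingest_text and search_text."""

    def __init__(
        self,
        base_url: str = "http://localhost:8001/v1/default",
        post: Callable[[str, dict], Any] = http_post_json,
    ) -> None:
        self.base_url = base_url.rstrip("/")
        self.post = post  # injectable, so the wrapper can be tested offline

    def ingest(self, id: int, text: str) -> Any:
        return self.post(f"{self.base_url}/ingest_text", {"id": id, "text": text})

    def search(self, text: str, k: int = 5) -> Any:
        return self.post(f"{self.base_url}/search_text", {"text": text, "k": k})
```

With a server running, FeatherTextClient().search("programming language preference") sends the same request as the snippet above.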

Semantic persona recall via MCP

Combined with the MCP backend (v0.14.0), real embedders make the Claude Desktop persona experience seamless. Claude calls feather_search with a natural language query string — feather-serve embeds it, searches HNSW, returns relevant memories. No embedding step visible anywhere in the MCP tool schema.

// Claude's tool call (MCP)
{
  "tool": "feather_search",
  "arguments": {
    "query": "what programming language does the user prefer?",
    "k": 5
  }
}

// feather-serve internally:
// 1. embed("what programming language does the user prefer?")
// 2. db.search(vec, k=5)
// 3. return results
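The three internal steps can be illustrated with a toy, fully offline stand-in. Nothing here is Feather's implementation: toy_embed is a word-count embedder rather than a real model, and ToyStore does a brute-force cosine scan where Feather would query its HNSW index. The point is only the shape of the flow, with text going in on both the add and the search side.

```python
import math
from collections import Counter


def toy_embed(text: str) -> dict:
    """Stand-in embedder: L2-normalised word counts. Not semantic."""
    words = [w.strip(".,?!").lower() for w in text.split()]
    counts = Counter(w for w in words if w)
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {w: c / norm for w, c in counts.items()}


def cosine(a: dict, b: dict) -> float:
    """Dot product of two normalised sparse vectors."""
    return sum(v * b.get(w, 0.0) for w, v in a.items())


class ToyStore:
    """Brute-force search standing in for the HNSW index."""

    def __init__(self) -> None:
        self.rows = []  # list of (id, vector, text)

    def add_text(self, id: int, text: str) -> None:
        self.rows.append((id, toy_embed(text), text))

    def search_text(self, query: str, k: int = 5) -> list:
        q = toy_embed(query)                                   # 1. embed the query
        scored = [(cosine(q, v), id, t) for id, v, t in self.rows]
        scored.sort(key=lambda r: r[0], reverse=True)          # 2. rank by similarity
        return [(id, t, s) for s, id, t in scored[:k]]         # 3. return top-k


store = ToyStore()
store.add_text(1, "User prefers Python")
store.add_text(2, "User lives in Berlin")
top = store.search_text("does the user prefer python or rust?", k=1)
```

A real embedder would also match paraphrases that share no words with the stored text, which is what makes the recall semantic.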

Choosing a provider

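| Provider | Dim    | Cost                              | Best for                                   |
|----------|--------|-----------------------------------|--------------------------------------------|
| gemini   | 768    | Free tier / $0.00002 per 1K chars | Native Feather format, low cost, multimodal |
| openai   | 1536   | $0.02 per 1M tokens (small)       | High quality, widely supported             |
| voyage   | 1024   | $0.06 per 1M tokens               | Code + technical content                   |
| cohere   | 1024   | $0.10 per 1M tokens               | Multilingual                               |
| ollama   | varies | Free (local compute)              | Privacy, air-gap, offline                  |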

For the MCP + Claude Desktop use case, Gemini is the recommended starting point: 768-dim is the native Feather format (matches on-disk int8 quantization), the free tier is generous, and the text-embedding-004 model is competitive in quality benchmarks.
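To put the prices in perspective, the arithmetic is easy to script. The sketch below uses only the rates quoted in this article (check current provider pricing before relying on them); the function names and the example workload of 10,000 memories at about 50 tokens each are illustrative assumptions.

```python
# Rates as quoted in this article, in USD.
PRICE_PER_MILLION_TOKENS = {"openai": 0.02, "voyage": 0.06, "cohere": 0.10}
GEMINI_PRICE_PER_1K_CHARS = 0.00002  # paid tier; the free tier costs nothing


def token_cost(provider: str, tokens: int) -> float:
    """Cost in USD to embed the given number of tokens."""
    return tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS[provider]


def gemini_cost(chars: int) -> float:
    """Cost in USD to embed the given number of characters on Gemini's paid tier."""
    return chars / 1_000 * GEMINI_PRICE_PER_1K_CHARS


# Hypothetical workload: 10,000 memories of about 50 tokens each.
tokens = 10_000 * 50
for provider in PRICE_PER_MILLION_TOKENS:
    print(f"{provider}: ${token_cost(provider, tokens):.4f}")
```

At persona-memory scale every listed provider costs a few cents at most, so latency, privacy, and dimension are likely to matter more than price.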

The complete persona stack

With v0.15.1, the full persona context engine stack is:

GOOGLE_API_KEY=… feather-serve persona.feather \
  --embed-provider gemini \
  --dim 768 \
  --port 8001

This single command gives you:

  • Semantic add (text in → embedded → stored)
  • Semantic search (text query → embedded → ANN → ranked results)
  • Context chain (semantic search + BFS graph traversal)
  • 14 MCP tools consumable by Claude Desktop/Code
  • REST API at /v1/ for programmatic access
  • Admin SPA at /admin/ for manual inspection

Add db.set_int8_ram("text", max_abs=1.0) at startup for 1.7× RAM savings on memory-constrained hosts.
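For intuition about what a max_abs parameter could mean, here is a generic symmetric int8 quantization sketch. It is an assumption about the general technique, not a description of Feather's internals: quantize_int8 and dequantize_int8 are hypothetical names, and values are mapped to the range -127..127 with anything beyond max_abs clipped.

```python
def quantize_int8(vec: list, max_abs: float = 1.0) -> list:
    """Map floats in [-max_abs, max_abs] to integers in [-127, 127], clipping outliers."""
    scale = max_abs / 127.0
    return [max(-127, min(127, round(x / scale))) for x in vec]


def dequantize_int8(q: list, max_abs: float = 1.0) -> list:
    """Recover approximate floats from the int8 codes."""
    scale = max_abs / 127.0
    return [v * scale for v in q]


codes = quantize_int8([0.3, -0.7, 0.0, 2.0])   # 2.0 exceeds max_abs and is clipped
approx = dequantize_int8(codes)
```

Each component shrinks from a 4-byte float32 to a single byte. An overall saving below 4× is plausible if other in-memory structures are not quantized, but that breakdown is not stated here.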

Install: pip install feather-db==0.15.1