Hybrid Search in Feather DB: BM25 + Dense Vectors Combined
BM25 catches exact keywords. Dense vectors catch meaning. Neither alone is enough. Here's how Feather DB combines both — and when to use which mode.
Architecture · Feather DB · June 2026
The Problem With Picking One
Every search system makes a tradeoff. The two dominant approaches — keyword scoring and semantic vector search — each fail in a predictable and complementary way. Hybrid search in Feather DB exists because that failure is avoidable.
Here's what breaks, and why.
What BM25 Does
BM25 (Best Match 25) is a term frequency-inverse document frequency scoring function. For every query term, it asks two things:
- How often does this term appear in the document? (term frequency, with saturation — the 50th occurrence of "vector" matters less than the 5th)
- How rare is this term across the corpus? (inverse document frequency — "the" scores near zero, "HNSW" scores high)
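The final BM25 score sums these per-term contributions across all query terms. Term frequency is also normalized by document length, so long documents don't win on size alone. Documents that contain corpus-rare terms many times rank highest.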
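To make the scoring concrete, here is a minimal in-memory BM25 sketch. It assumes the common Okapi formulation with the smoothed IDF used by Lucene, the usual defaults `k1=1.2` and `b=0.75`, and a naive whitespace tokenizer. These are illustrative choices, not Feather DB's internal parameters. `k1` controls term-frequency saturation, and `b` controls how strongly long documents are penalized.

```python
import math
from collections import Counter


class BM25Index:
    """Minimal in-memory Okapi BM25 (illustrative sketch, not Feather DB's code)."""

    def __init__(self, k1=1.2, b=0.75):
        self.k1 = k1           # term-frequency saturation: higher = slower to saturate
        self.b = b             # document-length normalization strength (0 = off)
        self.doc_tfs = []      # one Counter of term counts per document
        self.doc_lens = []     # token count per document
        self.df = Counter()    # number of documents containing each term

    @staticmethod
    def tokenize(text):
        # Naive tokenizer for illustration: lowercase, split on whitespace.
        return text.lower().split()

    def add(self, text):
        """Index a document and return its integer id."""
        tokens = self.tokenize(text)
        tf = Counter(tokens)
        self.doc_tfs.append(tf)
        self.doc_lens.append(len(tokens))
        self.df.update(tf.keys())  # each distinct term counted once per document
        return len(self.doc_tfs) - 1

    def idf(self, term):
        # Smoothed IDF: large for rare terms, close to zero for near-universal ones.
        n, df = len(self.doc_tfs), self.df[term]
        return math.log((n - df + 0.5) / (df + 0.5) + 1.0)

    def score(self, query, doc_id):
        tf = self.doc_tfs[doc_id]
        avgdl = sum(self.doc_lens) / len(self.doc_lens) or 1.0
        length_norm = 1 - self.b + self.b * self.doc_lens[doc_id] / avgdl
        total = 0.0
        for term in self.tokenize(query):
            f = tf[term]
            if f:
                # Saturating term frequency: rises with f but levels off at k1 + 1.
                total += self.idf(term) * f * (self.k1 + 1) / (f + self.k1 * length_norm)
        return total

    def search(self, query, k=10):
        """Return up to k (doc_id, score) pairs with a nonzero score, best first."""
        scored = [(i, self.score(query, i)) for i in range(len(self.doc_tfs))]
        hits = [pair for pair in scored if pair[1] > 0]
        return sorted(hits, key=lambda pair: pair[1], reverse=True)[:k]
```

If you index `"hnsw graph index"` and `"hnsw and bm25 hybrid search"` and query `hnsw`, the first document ranks higher. Both contain the term once, but the shorter document gets a smaller length penalty.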
BM25 is fast. No embedding model needed. No GPU. No API key. You tokenize, you score, you rank. On 500 queries over a 10K-document corpus, it runs in about 11 seconds on a single CPU thread.
And the recall numbers are genuinely good:
| Metric | BM25 |
|---|---|
| recall@1 | 0.874 |
| recall@3 | 0.942 |
| recall@5 | 0.974 |
| recall@10 | 0.986 |
These are Feather DB's standalone BM25 results on a 500-query benchmark, no API key required. A recall@10 of 0.986 means BM25 finds the right document in the top 10 results 98.6% of the time, provided the query uses the same words as the document.
That last clause is the problem.
What Dense Vector Search Does
Dense vector search works differently. You run your query through an embedding model — OpenAI's text-embedding-3-small, Gemini's gemini-embedding-exp-03-07, or any other — and get back a high-dimensional float vector that encodes semantic meaning, not token identity.
Documents are pre-embedded the same way. At query time, you find the documents whose vectors are closest to the query vector in that embedding space — nearest neighbor search.
Feather DB implements this with HNSW (Hierarchical Navigable Small World graphs), accelerated with AVX2/AVX-512 SIMD on x86. The graph structure lets you find approximate nearest neighbors in roughly O(log n) time rather than scanning every vector. p50 ANN latency on 500K vectors: 0.19ms.
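For reference, this is the exact computation that HNSW approximates: score every stored vector by cosine similarity and keep the top k. The sketch below is a pure-Python brute-force baseline, O(n*d) per query, and avoids NumPy so it stays self-contained. The function names are illustrative and not part of Feather DB's API. A baseline like this is also useful as ground truth when you measure an ANN index's recall.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def exact_knn(query, vectors, k=10):
    """Exhaustive search: the k (id, similarity) pairs with the highest cosine."""
    scored = ((i, cosine(query, v)) for i, v in enumerate(vectors))
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


def recall_at_k(ann_ids, exact_ids):
    """Fraction of the exact top-k neighbors that an ANN result recovered."""
    return len(set(ann_ids) & set(exact_ids)) / len(exact_ids)
```

HNSW trades a small amount of this exactness for speed. Comparing `recall_at_k(ann_ids, exact_ids)` over a sample of queries shows how much exactness a given index configuration gives up.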
What dense search does that BM25 cannot: it understands paraphrases. "Car" and "automobile" land near each other in embedding space. "My API keeps throwing 429 errors" and "rate limiting in production" surface the same documents. The model learned semantic proximity from training on language, not from token overlap.
What dense search misses: exact tokens. If a user queries SKU-10042 or GPT-4o-mini or feather_db.DB.open(), the embedding model compresses those into a region of a 768-dimensional space shared with vaguely similar strings. The exact character sequence stops mattering. A document containing SKU-10042 verbatim may not rank above a document that "sounds like" product identifiers in general.
Why Neither Alone Is Enough
The failure modes are symmetric:
- BM25 misses paraphrases. "Authorization failed" vs "access denied" — same error, different tokens, zero overlap score. BM25 returns nothing useful.
- Dense misses exact tokens. `CVE-2024-38816`, `order #TXN-8821`, `--ef-construction=400`: the embedding model treats these as opaque blobs and often ranks near-meaningless neighbors above the exact match.
Real user queries are a mix of both patterns. A developer searching "HNSW recall drops with ef below 50" needs dense search to understand the concept, but also needs keyword match to surface the exact parameter name. A support agent searching "customer ID C-48821 refund request" needs exact ID match from BM25 and semantic context from dense.
Hybrid search is not a compromise. It's the correct answer.
Feather DB's Hybrid Approach
Feather DB computes both scores at query time and fuses them into a single ranked list.
The fusion method is a weighted linear combination of normalized scores:
```
hybrid_score = alpha * dense_score + (1 - alpha) * bm25_score
```
alpha controls the balance. At alpha=1.0 you get pure dense. At alpha=0.0 you get pure BM25. At alpha=0.7, the default, dense similarity carries most of the weight and BM25 lifts documents with exact token matches.
Before combining, scores are normalized to [0, 1] within each result set. BM25 scores are unbounded floats; cosine similarity scores lie in [-1, 1]. Min-max normalization within the candidate set makes them comparable before the weighted sum.
The candidate set is the union of top-K results from both retrieval passes. A document that BM25 misses but dense finds (or vice versa) is still eligible for the final ranking. Neither retriever can veto a result — only the combined score determines the final order.
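The fusion steps above can be sketched in a few lines. This sketch rests on three assumptions. Each retriever returns a dict of `doc_id -> raw score`. A document missing from one pass gets 0.0 for that pass after normalization. A list whose scores are all equal normalizes to 1.0. Feather DB's own edge-case handling is not documented here, so treat those choices as illustrative.

```python
def min_max(scores):
    """Rescale a dict of doc_id -> raw score to [0, 1] within that result set."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Assumption: if every candidate scored the same, give each full credit.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_fuse(dense, bm25, alpha=0.7, k=10):
    """Fuse two candidate lists with alpha * dense_norm + (1 - alpha) * bm25_norm."""
    dense_n = min_max(dense)   # cosine similarities -> [0, 1] within this list
    bm25_n = min_max(bm25)     # unbounded BM25 scores -> [0, 1] within this list
    # Union of candidates: a document found by only one pass stays eligible and
    # contributes 0.0 for the pass that missed it.
    fused = {
        doc: alpha * dense_n.get(doc, 0.0) + (1 - alpha) * bm25_n.get(doc, 0.0)
        for doc in dense_n.keys() | bm25_n.keys()
    }
    # Highest fused score first; ties broken by doc id for a stable order.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))[:k]
```

One consequence of min-max scaling is visible here. The lowest-scoring candidate in each list is mapped to 0.0, exactly as if that retriever had never found it. That is one reason to over-fetch candidates before fusing.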
The API
Three search modes, one method:
```python
import feather_db
import numpy as np

db = feather_db.DB.open("knowledge.feather", dim=768)

# Your query, embedded by whatever model you're using.
# embed() is a placeholder for that model; it must return a 768-dim vector.
query_vec = embed("authorization failed connecting to database")

# Mode 1: pure dense (semantic similarity only)
results_dense = db.search(query_vec, k=10, mode="dense")

# Mode 2: pure keyword (BM25 only; no embedding needed at search time)
results_bm25 = db.search(query_vec, k=10, mode="keyword")

# Mode 3: hybrid (default; weighted combination)
results_hybrid = db.search(query_vec, k=10, mode="hybrid")

# Adjust the balance: alpha=0.7 means 70% dense, 30% BM25
results_tuned = db.search(query_vec, k=10, mode="hybrid", alpha=0.7)
```
The mode parameter is the only addition to the call, and it defaults to hybrid. All other search arguments (k, filter, metadata filtering) work identically across modes.
Side-by-Side Comparison
The same query, three modes. Query: "SKU-10042 out of stock notification".
```python
import feather_db

db = feather_db.DB.open("products.feather", dim=768)
# embed() is a placeholder for your embedding model
query_vec = embed("SKU-10042 out of stock notification")

print("=== DENSE ONLY ===")
for r in db.search(query_vec, k=3, mode="dense"):
    print(f" [{r.score:.3f}] {r.meta.get_attribute('title')}")
# Output (dense only):
# [0.912] Inventory notification system overview
# [0.887] Managing product availability alerts
# [0.871] Out of stock handling best practices
# (the SKU-10042 document does not appear in the top 3)

print("\n=== BM25 ONLY ===")
for r in db.search(query_vec, k=3, mode="keyword"):
    print(f" [{r.score:.3f}] {r.meta.get_attribute('title')}")
# Output (BM25 only):
# [0.998] SKU-10042: Product page and inventory record
# [0.743] Notification triggers for SKU-level events
# [0.681] Stock threshold configuration for SKU-10042

print("\n=== HYBRID (alpha=0.7) ===")
for r in db.search(query_vec, k=3, mode="hybrid", alpha=0.7):
    print(f" [{r.score:.3f}] {r.meta.get_attribute('title')}")
# Output (hybrid):
# [0.961] SKU-10042: Product page and inventory record
# [0.934] Inventory notification system overview
# [0.891] Notification triggers for SKU-level events
# (exact match surfaces first; semantic context fills positions 2-3)
```
Hybrid gets both: the exact SKU document ranks first (BM25 contribution), and the semantic context documents rank immediately after (dense contribution). Neither mode alone produces this result.
Score Fusion: Weighting BM25 vs Dense
The default alpha=0.7 is a reasonable starting point, not a universal truth. How to tune it:
- alpha closer to 1.0 — query is open-ended, conceptual, paraphrase-heavy. "What causes high latency in vector search?" Dense dominates; BM25 adds light re-ranking for technical terms.
- alpha closer to 0.5 — query mixes concepts with specific identifiers. "HNSW ef parameter tuning for recall@10". Equal weight; both signals matter.
- alpha closer to 0.0 — query is a lookup by exact token. "Transaction TXN-8821 status". BM25 dominates; dense is noise.
In practice, most user-facing search interfaces benefit from alpha=0.65–0.75. Log query patterns for a week, find the queries that return wrong top-1 results, and nudge alpha in the direction that fixes the majority.
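The tuning loop described in the previous paragraph can be automated once you have a small set of logged queries, each labeled with the document that should have ranked first. The sketch below sweeps alpha over a grid and reports top-1 accuracy for each value. `sweep_alpha`, `best_alpha`, and the `top1` callback are hypothetical helpers. In practice, `top1` would wrap `db.search(..., mode="hybrid", alpha=alpha)` and return the first hit's id.

```python
from typing import Callable, Sequence


def sweep_alpha(
    labeled: Sequence[tuple[str, str]],
    top1: Callable[[str, float], str],
    grid: Sequence[float] = tuple(round(0.05 * i, 2) for i in range(21)),
) -> list[tuple[float, float]]:
    """Return (alpha, top-1 accuracy) for every alpha in the grid.

    labeled: (query, expected_doc_id) pairs taken from your query logs.
    top1:    runs a hybrid search for (query, alpha) and returns the top hit's id.
    """
    results = []
    for alpha in grid:
        hits = sum(1 for query, expected in labeled if top1(query, alpha) == expected)
        results.append((alpha, hits / len(labeled)))
    return results


def best_alpha(results: list[tuple[float, float]]) -> float:
    """Highest accuracy wins; ties go to the alpha closest to the 0.7 default."""
    return max(results, key=lambda r: (r[1], -abs(r[0] - 0.7)))[0]
```

Breaking ties toward 0.7 keeps the setting near the documented default unless the labeled data clearly favors moving it.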
Production Tip: Match Mode to Task
Not every search is a user query. Different tasks have different optimal modes:
| Task | Recommended mode | Reason |
|---|---|---|
| User search bar query | hybrid | Mix of intent types; covers both exact and semantic |
| "Find similar documents" | dense | Pure semantic — no exact token expected |
| ID / SKU / code lookup | keyword | Exact token match; dense adds noise |
| Agent memory retrieval | hybrid | Agents mix conceptual reasoning with specific references |
| Deduplication check | dense | Near-duplicate detection is a semantic problem |
| Citation / reference lookup | keyword | Exact title / DOI / reference string match |
The rule of thumb: use hybrid when a human typed the query; use dense when the query is a vector derived from another document; use keyword when the query contains a code or identifier the document should contain verbatim.
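The rule of thumb can be encoded as a small routing function. `choose_mode` and `looks_like_identifier` are hypothetical helpers, not part of Feather DB. The identifier test is a deliberately crude assumption: a token counts as an identifier if it mixes letters and digits, is a bare number of four or more digits, or contains `_`, `=`, `#`, or `/`. Tune it to the identifier formats in your own data.

```python
def looks_like_identifier(token: str) -> bool:
    """Crude, hypothetical test for codes, SKUs, IDs and API names."""
    has_alpha = any(c.isalpha() for c in token)
    has_digit = any(c.isdigit() for c in token)
    if has_alpha and has_digit:               # SKU-10042, GPT-4o, TXN-8821
        return True
    if token.isdigit() and len(token) >= 4:   # bare order numbers
        return True
    return any(c in "_=#/" for c in token)    # ef_construction, feather_db.DB.open()


def choose_mode(query: str, derived_from_document: bool = False) -> str:
    """Map the rule of thumb to a search mode string: dense, keyword or hybrid."""
    if derived_from_document:
        return "dense"      # vector taken from another document: pure similarity
    tokens = query.split()
    if tokens and all(looks_like_identifier(t) for t in tokens):
        return "keyword"    # bare code / ID lookup: exact match only
    return "hybrid"         # anything a human typed in natural language
```

Usage would look like `db.search(query_vec, k=10, mode=choose_mode(query_text))`. A bare `TXN-8821` routes to keyword mode, and `SKU-10042 out of stock notification` routes to hybrid.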
When Hybrid Outperforms
Hybrid has the largest margin over single-mode search in three cases:
- Product names and codes. "iPhone 15 Pro camera settings" — dense finds camera documentation; BM25 pins the exact product. Hybrid surfaces the right product's camera documentation first.
- Technical identifiers mixed with natural language. "Why does `ef_construction=200` improve recall but hurt index time?" Without BM25, `ef_construction` floats in embedding space near unrelated parameters. BM25 anchors the exact string.
- Short, ambiguous queries with a key token. "GPT-4o pricing" is two words. Dense interprets pricing broadly. BM25 locks on "GPT-4o." Hybrid gets both right.
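Whether BM25 can anchor a string like `ef_construction=200` depends on the tokenizer. A naive tokenizer that splits on every non-alphanumeric character turns `SKU-10042` into `sku` and `10042`, and both pieces also match unrelated documents. One common remedy is to emit the whole identifier as a token alongside its parts. Exact matches then score on a rare full-string token, while partial matches still count. The sketch below illustrates that remedy under this assumption; it does not describe Feather DB's tokenizer.

```python
import re

# Runs of characters that commonly appear inside identifiers.
_WORDISH = re.compile(r"[A-Za-z0-9_.#=/-]+")
# Plain alphanumeric pieces within such a run.
_PARTS = re.compile(r"[A-Za-z0-9]+")


def tokenize(text: str) -> list[str]:
    """Lowercase tokens: whole identifiers plus their alphanumeric parts."""
    tokens = []
    for chunk in _WORDISH.findall(text.lower()):
        chunk = chunk.strip(".-/=#")  # drop leading/trailing punctuation
        parts = _PARTS.findall(chunk)
        if not parts:
            continue
        if parts != [chunk]:
            tokens.append(chunk)      # full identifier, e.g. "sku-10042"
        tokens.extend(parts)          # and its pieces, e.g. "sku", "10042"
    return tokens
```

The full token is rare in the corpus, so its IDF is high. A document containing `ef_construction=200` verbatim therefore outranks one that merely mentions "construction" and "200" separately.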
What This Looks Like Internally
Feather DB's search pipeline for mode="hybrid":
- HNSW ANN search returns the top `K*2` candidates by cosine similarity. (Over-fetch to increase recall before re-ranking.)
- The BM25 index scores the same query against the full inverted index and returns the top `K*2` candidates by BM25 score.
- The union of both candidate sets is formed. Documents appearing in both sets carry scores from both passes. Documents appearing in only one carry a score of 0.0 for the other pass.
- Scores are min-max normalized within each list. The dense list is normalized independently from the BM25 list.
- Weighted sum: `alpha * dense_norm + (1-alpha) * bm25_norm`.
- The final list is sorted in descending order, and the top `k` are returned.
The BM25 index is built at ingestion time from the content attribute of each document's metadata. No separate indexing call needed — it's maintained in the .feather file alongside the HNSW graph.
Bottom Line
BM25 recall@10 of 0.986 is remarkable for a zero-dependency, 11-second run on 500 queries. Dense search, at 0.19ms p50 ANN latency on 500K vectors, is fast enough for real-time use (the time to embed the query comes on top of that). Hybrid combines both in a single db.search() call.
The decision is not "which is better." It's "which failure mode can I not afford." In most AI agent and user-facing search contexts, the answer is both — which makes mode="hybrid" the right default.
```python
# Start here. Tune alpha if needed.
results = db.search(query_vec, k=10, mode="hybrid", alpha=0.7)
```