Inside Feather DB's HNSW: M, ef_construction, and Why Defaults Matter
Feather DB's HNSW achieves p50=0.19ms at 500K vectors with recall@10=97.2%. Behind that number are three parameters — M=16, ef_construction=200, ef=50 — tuned specifically for AI memory workloads.
What HNSW does
Hierarchical Navigable Small World (HNSW) is the approximate nearest neighbor algorithm that powers Feather DB's vector search. Given a query vector and a corpus of stored vectors, HNSW finds the k nearest neighbors (by cosine or L2 distance) in sub-linear time — typically O(log n) for search — compared to the O(n) brute-force scan.
The core idea: build a layered graph where each node connects to its M nearest neighbors at each level. The top level has few nodes and long-range connections. The bottom level has all nodes with short-range connections. Search starts at the top-level entry point and greedily traverses downward, using each level's graph to navigate toward the query region before descending.
Three parameters control the quality-performance-memory trade-off: M (connectivity), ef_construction (build quality), and ef (search thoroughness).
Parameter 1: M — edges per node
M is the maximum number of bidirectional edges each node maintains in the HNSW graph at each layer. Feather DB's default is M=16.
Higher M means:
- More graph connectivity → higher recall (fewer missed true neighbors)
- More edges stored → higher memory usage (roughly linear in M)
- Faster search convergence in some high-dimensional spaces
Lower M means:
- Sparser graph → lower recall, especially at high-k retrieval
- Lower memory usage
- Faster build time
M=16 is the HNSW paper's recommended sweet spot for 768-dimensional vectors (the typical output dimension of embedding models like text-embedding-3-small, all-MiniLM, or nomic-embed). At lower dimensions (64-128), M=8 can achieve comparable recall. At higher dimensions (1536+), M=24 or M=32 may be needed to maintain recall above 95%.
For Feather DB's primary workload — 768-dim AI memory with 10K-500K vectors — M=16 achieves recall@10=97.2% with p50 search latency of 0.19ms at 500K vectors. This is the default and no change is needed for typical use.
Parameter 2: ef_construction — build-time candidate list
ef_construction controls how thoroughly the HNSW graph is constructed. During index build, when inserting a new node, the algorithm searches for the best M neighbors using a candidate list of size ef_construction. Larger ef_construction means more candidates considered → better neighbor selection → higher recall on the finished index. It has no effect on search latency — only on index quality and build time.
Feather DB's default is ef_construction=200. At this setting:
- Index build is approximately 2-3× slower than ef_construction=50
- Recall@10 is 97.2% at 500K vectors (vs. approximately 91% at ef_construction=50)
- The build quality is "set and forget" — no recall tuning needed post-build
For the write path, Feather DB's add_batch() API (v0.15+) achieves 3.4× throughput improvement over sequential add() calls by batching HNSW insertions and amortizing the overhead of neighbor list updates. At ef_construction=200, add_batch() can ingest approximately 12,000 vectors per second on a modern CPU core (768-dim).
Parameter 3: ef — search-time candidate list
ef (sometimes called ef_search) is the only HNSW parameter that affects search latency directly. It controls the size of the candidate list maintained during graph traversal. Larger ef → more candidates explored → higher recall but higher latency.
Feather DB's default is ef=50. The recall-latency trade-off at 500K vectors (768-dim, M=16, ef_construction=200):
| ef | recall@10 | p50 latency | p99 latency |
|---|---|---|---|
| 10 | 91.4% | 0.09ms | 0.31ms |
| 20 | 94.8% | 0.12ms | 0.42ms |
| 50 (default) | 97.2% | 0.19ms | 0.67ms |
| 100 | 98.6% | 0.35ms | 1.21ms |
| 200 | 99.3% | 0.68ms | 2.38ms |
The default ef=50 sits at the elbow of the recall-latency curve for the AI memory workload. Going from ef=50 to ef=100 costs 84% more latency for 1.4 percentage points of recall gain. Going from ef=20 to ef=50 costs 58% more latency for 2.4 percentage points of gain — a better ratio, but ef=50 is already at the point of diminishing returns for most use cases.
You can override ef at query time:
import feather_db as fdb
db = fdb.DB.open("memory.feather", dim=768)
# Default ef=50 — good for most workloads
results = db.search(query_vec, k=10)
# Higher ef for maximum recall when latency budget allows
results_high_recall = db.search(query_vec, k=10, ef=200)
# Lower ef for latency-critical paths where 91% recall is acceptable
results_fast = db.search(query_vec, k=10, ef=10)
Why defaults are tuned for AI memory workloads
The HNSW defaults in academic papers are typically tuned for benchmark datasets (SIFT, GIST, ANN-Benchmarks) that use different distributions than AI memory workloads. Feather DB's defaults are tuned for a specific profile:
- Dimensionality: 768. The dominant embedding dimension for text models in 2025-2026. Lower than image models (typically 1024-2048), higher than keyword models.
- Corpus size: 10K-500K vectors. The range for a single agent's accumulated memory over 1-24 months. Not the billion-scale of web search, not the toy-scale of 1K documents.
- Query distribution: non-uniform. AI memory queries are clustered around current context — recent sessions, active topics, the user's current task. The HNSW graph should navigate efficiently toward these clusters.
- Write rate: moderate but continuous. A memory store receives new entries every few minutes during active use, not bulk-ingested once. The incremental HNSW insertion path is exercised continuously.
At this profile, M=16 / ef_construction=200 / ef=50 hits the optimal point on each trade-off curve without requiring the user to tune anything. The benchmarked result — p50=0.19ms, recall@10=97.2%, at 500K vectors — is achievable on a standard single-core without GPU acceleration.
Memory usage
HNSW graph memory is dominated by the neighbor lists. For M=16, each node stores up to 16 neighbors at each level. The expected number of levels per node is approximately 1 / ln(M) ≈ 0.36, so most nodes are on level 0 only.
Approximate memory for the HNSW graph structure alone (excluding vectors):
- 10K vectors: ~2.5 MB
- 100K vectors: ~25 MB
- 500K vectors: ~125 MB
With float32 vectors at 768 dimensions:
- 10K vectors: ~29 MB (vectors) + 2.5 MB (graph) = ~31.5 MB
- 100K vectors: ~295 MB (vectors) + 25 MB (graph) = ~320 MB
- 500K vectors: ~1.5 GB (vectors) + 125 MB (graph) = ~1.6 GB
With in-RAM int8 quantization (v0.15+), the vector memory is reduced by 1.7×:
- 500K vectors: ~880 MB (int8 vectors + scale factors) + 125 MB (graph) = ~1 GB
The int8 quantization reduces RAM usage with a negligible recall penalty (recall@10 drops from 97.2% to 96.9% in benchmarks) because the HNSW graph structure itself is unchanged — only the distance computation uses quantized values.
Choosing parameters for your workload
import feather_db as fdb
# Default: AI memory workload (768-dim, 10K-500K vectors)
db = fdb.DB.open("memory.feather", dim=768)
# M=16, ef_construction=200, ef=50 by default
# High-dimensional (1536-dim) — consider M=24 for better recall
db_hd = fdb.DB.open("memory_hd.feather", dim=1536, M=24, ef_construction=200)
# Latency-critical with smaller corpus (under 50K vectors)
# ef=20 gives 94.8% recall at 0.12ms p50
results = db.search(query_vec, k=10, ef=20)
# Maximum recall for reranking candidate generation
# ef=200 gives 99.3% recall, use for offline batch jobs
results = db.search(query_vec, k=100, ef=200)
For the vast majority of AI agent memory use cases, the defaults require no tuning. The numbers — 0.19ms, 97.2% recall, 500K vectors — are what you get out of the box.
Install: pip install feather-db · GitHub: github.com/feather-store/feather