Back to Theory
Architecture8 min read · June 16, 2026

Inside Feather DB's HNSW: M, ef_construction, and Why Defaults Matter

Feather DB's HNSW achieves p50=0.19ms at 500K vectors with recall@10=97.2%. Behind that number are three parameters — M=16, ef_construction=200, ef=50 — tuned specifically for AI memory workloads.

F
Feather DB
Engineering

What HNSW does

Hierarchical Navigable Small World (HNSW) is the approximate nearest neighbor algorithm that powers Feather DB's vector search. Given a query vector and a corpus of stored vectors, HNSW finds the k nearest neighbors (by cosine or L2 distance) in sub-linear time — typically O(log n) for search — compared to the O(n) brute-force scan.

The core idea: build a layered graph where each node connects to its M nearest neighbors at each level. The top level has few nodes and long-range connections. The bottom level has all nodes with short-range connections. Search starts at the top-level entry point and greedily traverses downward, using each level's graph to navigate toward the query region before descending.

Three parameters control the quality-performance-memory trade-off: M (connectivity), ef_construction (build quality), and ef (search thoroughness).

Parameter 1: M — edges per node

M is the maximum number of bidirectional edges each node maintains in the HNSW graph at each layer. Feather DB's default is M=16.

Higher M means:

  • More graph connectivity → higher recall (fewer missed true neighbors)
  • More edges stored → higher memory usage (roughly linear in M)
  • Faster search convergence in some high-dimensional spaces

Lower M means:

  • Sparser graph → lower recall, especially at high-k retrieval
  • Lower memory usage
  • Faster build time

M=16 is the HNSW paper's recommended sweet spot for 768-dimensional vectors (the typical output dimension of embedding models like text-embedding-3-small, all-MiniLM, or nomic-embed). At lower dimensions (64-128), M=8 can achieve comparable recall. At higher dimensions (1536+), M=24 or M=32 may be needed to maintain recall above 95%.

For Feather DB's primary workload — 768-dim AI memory with 10K-500K vectors — M=16 achieves recall@10=97.2% with p50 search latency of 0.19ms at 500K vectors. This is the default and no change is needed for typical use.

Parameter 2: ef_construction — build-time candidate list

ef_construction controls how thoroughly the HNSW graph is constructed. During index build, when inserting a new node, the algorithm searches for the best M neighbors using a candidate list of size ef_construction. Larger ef_construction means more candidates considered → better neighbor selection → higher recall on the finished index. It has no effect on search latency — only on index quality and build time.

Feather DB's default is ef_construction=200. At this setting:

  • Index build is approximately 2-3× slower than ef_construction=50
  • Recall@10 is 97.2% at 500K vectors (vs. approximately 91% at ef_construction=50)
  • The build quality is "set and forget" — no recall tuning needed post-build

For the write path, Feather DB's add_batch() API (v0.15+) achieves 3.4× throughput improvement over sequential add() calls by batching HNSW insertions and amortizing the overhead of neighbor list updates. At ef_construction=200, add_batch() can ingest approximately 12,000 vectors per second on a modern CPU core (768-dim).

Parameter 3: ef — search-time candidate list

ef (sometimes called ef_search) is the only HNSW parameter that affects search latency directly. It controls the size of the candidate list maintained during graph traversal. Larger ef → more candidates explored → higher recall but higher latency.

Feather DB's default is ef=50. The recall-latency trade-off at 500K vectors (768-dim, M=16, ef_construction=200):

efrecall@10p50 latencyp99 latency
1091.4%0.09ms0.31ms
2094.8%0.12ms0.42ms
50 (default)97.2%0.19ms0.67ms
10098.6%0.35ms1.21ms
20099.3%0.68ms2.38ms

The default ef=50 sits at the elbow of the recall-latency curve for the AI memory workload. Going from ef=50 to ef=100 costs 84% more latency for 1.4 percentage points of recall gain. Going from ef=20 to ef=50 costs 58% more latency for 2.4 percentage points of gain — a better ratio, but ef=50 is already at the point of diminishing returns for most use cases.

You can override ef at query time:

import feather_db as fdb

db = fdb.DB.open("memory.feather", dim=768)

# Default ef=50 — good for most workloads
results = db.search(query_vec, k=10)

# Higher ef for maximum recall when latency budget allows
results_high_recall = db.search(query_vec, k=10, ef=200)

# Lower ef for latency-critical paths where 91% recall is acceptable
results_fast = db.search(query_vec, k=10, ef=10)

Why defaults are tuned for AI memory workloads

The HNSW defaults in academic papers are typically tuned for benchmark datasets (SIFT, GIST, ANN-Benchmarks) that use different distributions than AI memory workloads. Feather DB's defaults are tuned for a specific profile:

  • Dimensionality: 768. The dominant embedding dimension for text models in 2025-2026. Lower than image models (typically 1024-2048), higher than keyword models.
  • Corpus size: 10K-500K vectors. The range for a single agent's accumulated memory over 1-24 months. Not the billion-scale of web search, not the toy-scale of 1K documents.
  • Query distribution: non-uniform. AI memory queries are clustered around current context — recent sessions, active topics, the user's current task. The HNSW graph should navigate efficiently toward these clusters.
  • Write rate: moderate but continuous. A memory store receives new entries every few minutes during active use, not bulk-ingested once. The incremental HNSW insertion path is exercised continuously.

At this profile, M=16 / ef_construction=200 / ef=50 hits the optimal point on each trade-off curve without requiring the user to tune anything. The benchmarked result — p50=0.19ms, recall@10=97.2%, at 500K vectors — is achievable on a standard single-core without GPU acceleration.

Memory usage

HNSW graph memory is dominated by the neighbor lists. For M=16, each node stores up to 16 neighbors at each level. The expected number of levels per node is approximately 1 / ln(M) ≈ 0.36, so most nodes are on level 0 only.

Approximate memory for the HNSW graph structure alone (excluding vectors):

  • 10K vectors: ~2.5 MB
  • 100K vectors: ~25 MB
  • 500K vectors: ~125 MB

With float32 vectors at 768 dimensions:

  • 10K vectors: ~29 MB (vectors) + 2.5 MB (graph) = ~31.5 MB
  • 100K vectors: ~295 MB (vectors) + 25 MB (graph) = ~320 MB
  • 500K vectors: ~1.5 GB (vectors) + 125 MB (graph) = ~1.6 GB

With in-RAM int8 quantization (v0.15+), the vector memory is reduced by 1.7×:

  • 500K vectors: ~880 MB (int8 vectors + scale factors) + 125 MB (graph) = ~1 GB

The int8 quantization reduces RAM usage with a negligible recall penalty (recall@10 drops from 97.2% to 96.9% in benchmarks) because the HNSW graph structure itself is unchanged — only the distance computation uses quantized values.

Choosing parameters for your workload

import feather_db as fdb

# Default: AI memory workload (768-dim, 10K-500K vectors)
db = fdb.DB.open("memory.feather", dim=768)
# M=16, ef_construction=200, ef=50 by default

# High-dimensional (1536-dim) — consider M=24 for better recall
db_hd = fdb.DB.open("memory_hd.feather", dim=1536, M=24, ef_construction=200)

# Latency-critical with smaller corpus (under 50K vectors)
# ef=20 gives 94.8% recall at 0.12ms p50
results = db.search(query_vec, k=10, ef=20)

# Maximum recall for reranking candidate generation
# ef=200 gives 99.3% recall, use for offline batch jobs
results = db.search(query_vec, k=100, ef=200)

For the vast majority of AI agent memory use cases, the defaults require no tuning. The numbers — 0.19ms, 97.2% recall, 500K vectors — are what you get out of the box.

Install: pip install feather-db · GitHub: github.com/feather-store/feather