"WHY"
Vector search is the bottleneck that does not show up on the bill. The model is the cost line; the embeddings index is the latency line. The Python ecosystem ships FAISS as a C++ binding with a thin Python facade, which means the operator who owns the model still does not own the retrieval path. The retrieval path is the layer that RAG, agent memory, semantic cache, and recommendation routing all funnel through, and it is where lock-in compounds quietly through serialised index formats and pinned wheel versions.
faiss-zig is the retrieval substrate. Pure Zig, no C++ dependency, no pinned wheel. The four index families it carries (Flat, HNSW, IVFFlat, IVFPQ) cover the operating envelope from small-N exact search to large-N memory-compressed approximate search. The point is not to outrun FAISS; the point is to remove the C++/Python wedge from the retrieval-layer audit and to keep the index format inside the same single-binary deployment envelope as the rest of the Sovereign Stack.
"WHAT"
Four index families. Each is the standard published algorithm, named honestly.
- FlatIndex. Exhaustive L2-squared, inner-product, and cosine search. The reference implementation against which the approximate-search recall numbers are measured. 2.1M vectors scored per second on a single thread of consumer Ryzen-class silicon at D=128.
- IndexHNSW. Hierarchical Navigable Small World per Malkov 2016. Shipped at v0.3 (2026-05-21) with 70 percent or better top-10 recall versus FlatIndex on synthetic uniform data. The graph builder is single-threaded; search is single-threaded; both are deterministic given a seed.
- IndexIVFFlat. Inverted-file index with flat per-list storage. Coarse k-means partitions the space; the query probes the nearest nprobe lists. At N=10,000 D=128 the v0.6 ship clocks 4,801 queries per second against 557 q/s for Flat (8.61x).
- IndexIVFPQ. Inverted-file with Product Quantization residual encoding per Jegou-Douze-Schmid 2011. The production FAISS workhorse. At N=10,000 D=128 M=8 the v0.7 ship clocks 1,915 queries per second with 16.94x raw memory compression (per-vector 12 bytes versus 512 bytes of raw F32 before metadata).
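The relationship between the exact and inverted-file families above can be sketched in a few lines. This is a minimal illustration in plain Python, not faiss-zig's API: every function name here is hypothetical, the coarse centroids are sampled data points rather than trained k-means centroids, and the fixture sizes are arbitrary. It shows exhaustive L2-squared search, list probing with nprobe, and recall measured against the exhaustive result.

```python
import random

def l2_sq(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def flat_search(vectors, query, k):
    """Exhaustive search: score every vector, return ids of the k nearest."""
    order = sorted(range(len(vectors)), key=lambda i: l2_sq(vectors[i], query))
    return order[:k]

def build_ivf(vectors, centroids):
    """Assign each vector id to the inverted list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2_sq(centroids[c], v))
        lists[nearest].append(i)
    return lists

def ivf_search(vectors, centroids, lists, query, k, nprobe):
    """Scan only the nprobe lists whose centroids are nearest the query."""
    probe = sorted(range(len(centroids)),
                   key=lambda c: l2_sq(centroids[c], query))[:nprobe]
    candidates = [i for c in probe for i in lists[c]]
    candidates.sort(key=lambda i: l2_sq(vectors[i], query))
    return candidates[:k]

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k that the approximate search also returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

rng = random.Random(0)
dim, n, nlist = 8, 500, 10            # arbitrary small fixture, not the benchmark
data = [[rng.random() for _ in range(dim)] for _ in range(n)]
# ASSUMPTION: sampled points stand in for trained coarse centroids.
centroids = rng.sample(data, nlist)
lists = build_ivf(data, centroids)
query = [rng.random() for _ in range(dim)]

exact = flat_search(data, query, 10)
recalls = {}
for nprobe in (1, 3, nlist):
    approx = ivf_search(data, centroids, lists, query, 10, nprobe)
    recalls[nprobe] = recall_at_k(approx, exact)
    print(f"nprobe={nprobe} recall@10={recalls[nprobe]:.2f}")
```

Probing every list degenerates to the exhaustive scan, so recall reaches 1.0 at nprobe equal to the number of lists. Smaller nprobe values trade recall for fewer distance computations, which is the speedup the IVFFlat numbers report.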
The end-to-end claim is composability with the rest of the inference stack. faiss-zig accepts the embeddings vllm-zig emits at its final hidden-state head, the index sits in the same Zig binary as the forward pass, and the retrieved prefix re-enters the forward pass without crossing an FFI boundary or a Python interpreter.
"MILESTONES"
- 2026-05-22 · v0.7.0 · benched. IndexIVFPQ. 11 new unit tests including recall-versus-Flat threshold pass. 16.94x memory compression measured on the test fixture.
- 2026-05-21 · v0.6.0 · benched. IndexIVFFlat. 8.61x search speedup versus Flat at N=10,000 D=128.
- 2026-05-21 · v0.3.0 · tested. IndexHNSW per Malkov 2016. 70 percent-plus top-10 recall versus FlatIndex on the test fixture.
- 2026-05-20 · v0.0.1 · tested. FlatIndex. 2.1M vectors scored per second on the scalar single-thread path at D=128.
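The 16.94x figure and the per-vector sizes quoted for IVFPQ (12 bytes versus 512 bytes) describe two different ratios: 512/12 alone is about 42.67x. One accounting that reconciles them is to count the fixed index overhead alongside the codes. The sketch below is a guess at that accounting, not the repo's measurement code. It assumes 256 centroids per subquantizer and 100 coarse lists, neither of which is stated on this page; N, D, M, and the 12-byte code size are taken from the text.

```python
# Figures taken from the page: N, D, M, and 12 bytes per encoded vector.
N, D, M = 10_000, 128, 8
F32 = 4                                   # bytes per 32-bit float

raw_bytes = N * D * F32                   # 512 bytes per raw vector
code_bytes = N * 12                       # 12 bytes per encoded vector, as stated

# ASSUMPTIONS (not stated on the page): 256 centroids per subquantizer
# (8-bit codes) and nlist = 100 coarse centroids, all stored as F32.
KSUB, NLIST = 256, 100
pq_codebook_bytes = M * KSUB * (D // M) * F32
coarse_centroid_bytes = NLIST * D * F32

per_vector_ratio = (D * F32) / 12
whole_index_ratio = raw_bytes / (code_bytes + pq_codebook_bytes + coarse_centroid_bytes)

print(f"per-vector: {per_vector_ratio:.2f}x, whole index: {whole_index_ratio:.2f}x")
```

Under those two assumptions the whole-index ratio rounds to 16.94x. If that accounting is right, the codebook and centroid overhead is fixed, so the ratio would move toward the per-vector figure as N grows.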
"DEPENDENCIES"
- Zig 0.16 standard library. No external dependencies. The point is the single-binary deployment envelope.
"ADAPTER TARGETS"
- vllm-zig. RAG retrieval path. The retrieved-context prefix re-enters vllm-zig at the forward-pass entry without crossing an FFI boundary.
- murmur. The threshold-flock messaging substrate uses faiss-zig as its HNSW port for content-addressed routing.
"RELATED CANON"
- Anti-Edison 17 — The AI Wrapper Question. The merchant-lens audit. The retrieval layer is the second wedge after the inference layer.
- The Mercantile Thesis. The appliance-layer claim this substrate composes against.
"RELATED LAB NOTES"
- AI inference in Zig — a 4-repo stack from weights to tokens. faiss-zig is the fourth layer.
"RELATED WORKSHOP"
The v0.7 to v0.8 work (cosine and inner-product on IVFPQ; OPQ rotation matrix; parallel k-means assignment) is queued. Workshop entry forthcoming on the next ship.
"LIMITS"
Pre-1.0 substrate, named honestly.
- L2-squared only on IVFPQ at v0.7. Cosine and inner-product follow in v0.7.1.
- No OPQ rotation matrix. OPQ adds a learned rotation before PQ to maximise codebook quality on axis-correlated data. The numbers cited on uniform random data understate the FAISS-published numbers on real embeddings precisely because there is no OPQ stage yet. v0.8 work.
- Single-thread training. Both coarse and per-subspace k-means are sequential. Parallel assignment in v0.8.
- K-means init is uniform-random sampling. Not k-means++. Standard but coarser initial centroids.
- Recall numbers measured on uniform random data. The synthetic fixture is honest about being synthetic; real-embedding recall is expected to track the FAISS-published range once OPQ lands, but the substrate page does not pre-claim those numbers.
- Zig 0.16 ceiling. Standard-library API churn each release. The repo pins 0.16.0.
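The k-means initialisation limit above is easier to see side by side. The following is a minimal sketch in plain Python, not faiss-zig code: the function names and the clustered 2-D fixture are hypothetical, chosen only to contrast uniform-random sampling with k-means++ seeding.

```python
import random

def l2_sq(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def init_uniform(points, k, rng):
    """Uniform-random init: pick k distinct training points as centroids."""
    return rng.sample(points, k)

def init_kmeans_pp(points, k, rng):
    """k-means++ seeding: each new centroid is drawn with probability
    proportional to its squared distance from the nearest centroid so far."""
    centroids = [rng.choice(points)]
    d2 = [l2_sq(p, centroids[0]) for p in points]
    while len(centroids) < k:
        chosen = rng.choices(points, weights=d2, k=1)[0]
        centroids.append(chosen)
        # Keep, for every point, the distance to its nearest chosen centroid.
        d2 = [min(old, l2_sq(p, chosen)) for old, p in zip(d2, points)]
    return centroids

def inertia(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(l2_sq(p, c) for c in centroids) for p in points)

rng = random.Random(0)
# Hypothetical clustered fixture: three tight blobs in 2-D.
blobs = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [[cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5)]
          for cx, cy in blobs for _ in range(50)]

uniform_seeds = init_uniform(points, 3, rng)
pp_seeds = init_kmeans_pp(points, 3, rng)
print("uniform   starting inertia:", inertia(points, uniform_seeds))
print("k-means++ starting inertia:", inertia(points, pp_seeds))
```

Uniform sampling can place two seeds in the same cluster, while the distance weighting in k-means++ makes that unlikely. On uniform random data such as the test fixture the gap is usually smaller, which may be why the simpler init is acceptable before 1.0.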
"SOURCE"
- Source: github.com/SMC17/faiss-zig. AGPL-3.0-or-later. Read-only browse, release tarballs, CI history.
"INSTALL"
git clone https://github.com/SMC17/faiss-zig.git
cd faiss-zig
zig build -Doptimize=ReleaseFast
zig build test
Zig 0.16.0 required. No external dependencies, no system-library link, no Python runtime in the path. Single-binary deployment envelope.
"DOWNLOAD"
- Release tarball: v0.7.0. Flat + HNSW + IVFFlat + IVFPQ. 44 unit tests, recall-versus-Flat threshold tests pass, 16.94x memory compression measured on the test fixture.
- Source archive: v0.7.0.tar.gz.
"CITATION"
@software{collins_faiss_zig_2026,
author = {Collins, Sean},
title = {{faiss-zig: Pure-Zig Vector Similarity Search}},
version = {v0.7.0},
year = {2026},
month = {5},
url = {https://sunlitmoon.online/substrate/faiss-zig.html},
note = {AGPL-3.0-or-later. Source: github.com/SMC17/faiss-zig.}
}