safetensors-zig

"WHY"

The weight blob is where the inference path begins. If the operator does not own the weight reader, the operator does not own the first hop of the forward pass. The HuggingFace safetensors package ships a Rust core with a Python binding, which means the so-called sovereign-Zig inference path still loads its weights through a Rust binary inside a Python interpreter. The per-token cost is small, but the dependency is load-bearing for the supply chain because every forward pass starts here.

safetensors-zig closes the wedge at the entry point. It reads the canonical HuggingFace safetensors format into typed tensor views, in pure Zig, single-module, no system dependencies. The substrate it sits under is vllm-zig; the substrate it cooperates with is tokenizers-zig. The deployment envelope is a single statically-linked Zig binary.

"WHAT"

A safetensors reader matching the upstream format spec. The public API is small and stable across the 0.x line: SafeTensors, Parsed, openFromBytes, open, Tensor, DType. The reader parses the JSON header, validates the per-tensor offset table against the payload length, and exposes zero-copy typed views over the underlying byte buffer.
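As an illustration of the three steps just described (parse the header, validate the offset table, expose zero-copy views), here is a minimal Python sketch of the safetensors layout: an 8-byte little-endian header length, a JSON header, then the raw payload. It is not the project's Zig code. The names open_from_bytes and build_fixture and the reduced dtype table are assumptions made for this example only.

```python
import json
import struct

# Bytes per element for a few safetensors dtype tags (subset, for illustration).
DTYPE_SIZES = {"F64": 8, "F32": 4, "F16": 2, "BF16": 2,
               "I64": 8, "I32": 4, "U8": 1, "BOOL": 1}


def element_count(shape):
    """Number of elements implied by a shape; an empty shape is a scalar."""
    count = 1
    for dim in shape:
        count *= dim
    return count


def open_from_bytes(buf):
    """Parse a safetensors buffer into {name: (dtype, shape, memoryview)}."""
    if len(buf) < 8:
        raise ValueError("buffer too short for the 8-byte header length")
    (header_len,) = struct.unpack_from("<Q", buf, 0)
    if 8 + header_len > len(buf):
        raise ValueError("header length exceeds buffer")
    header = json.loads(bytes(buf[8:8 + header_len]))
    payload = memoryview(buf)[8 + header_len:]

    tensors = {}
    for name, info in header.items():
        if name == "__metadata__":  # optional string map, not a tensor
            continue
        begin, end = info["data_offsets"]  # relative to the payload start
        if not (0 <= begin <= end <= len(payload)):
            raise ValueError(f"{name}: offsets fall outside the payload")
        expected = element_count(info["shape"]) * DTYPE_SIZES[info["dtype"]]
        if end - begin != expected:
            raise ValueError(f"{name}: byte length does not match dtype and shape")
        # Slicing a memoryview does not copy the underlying bytes.
        tensors[name] = (info["dtype"], tuple(info["shape"]), payload[begin:end])
    return tensors


def build_fixture(entries):
    """Hypothetical helper: serialise {name: (dtype, shape)} with a zeroed payload."""
    header, offset = {}, 0
    for name, (dtype, shape) in entries.items():
        size = element_count(shape) * DTYPE_SIZES[dtype]
        header[name] = {"dtype": dtype, "shape": list(shape),
                        "data_offsets": [offset, offset + size]}
        offset += size
    raw = json.dumps(header, separators=(",", ":")).encode()
    return struct.pack("<Q", len(raw)) + raw + bytes(offset)
```

Calling open_from_bytes(build_fixture({"w": ("F32", (2, 3))})) yields one entry whose view is 24 bytes long; truncating the buffer by one byte makes the offset check fail instead of returning a short view.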

The v0.3 path is what makes it interesting. The header parser is hand-written, single-pass, byte-by-byte, with no AST and no per-token allocation beyond the arena-backed output slices. It dispatches on the first letter of the dtype, shape, and data_offsets keys to skip the std.mem.eql call in the inner loop, and applies @branchHint(.likely) on the comma-separator arm. It scans for the closing " and the escape \ with std.mem.indexOfAny so SIMD lanes carry the byte-class search. It short-circuits the offset sort when the on-disk order is already increasing, which is what HuggingFace emits by convention.
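Two of these ideas can be sketched independently of Zig: the string scan that stops at a closing quote or a backslash, and the sorted-order check that avoids a sort. The Python below is an illustrative sketch only. It walks bytes one at a time rather than using a vectorised search, and the names InvalidJSON, scan_string, is_increasing, and ordered_spans are assumptions for the example, not the project's internals.

```python
QUOTE = 0x22      # "
BACKSLASH = 0x5C  # \


class InvalidJSON(ValueError):
    """Raised for input outside the restricted string grammar."""


def scan_string(buf, start):
    """Scan a JSON string body; buf[start] is the byte after the opening quote.

    Accepts only the escapes \\" and \\\\. Returns (decoded bytes, index just
    past the closing quote).
    """
    out = bytearray()
    i = start
    while i < len(buf):
        b = buf[i]
        if b == QUOTE:
            return bytes(out), i + 1
        if b == BACKSLASH:
            if i + 1 >= len(buf) or buf[i + 1] not in (QUOTE, BACKSLASH):
                raise InvalidJSON("unsupported escape sequence")
            out.append(buf[i + 1])
            i += 2
            continue
        out.append(b)
        i += 1
    raise InvalidJSON("unterminated string")


def is_increasing(spans):
    """True when the begin offsets of (begin, end) spans never decrease."""
    return all(a[0] <= b[0] for a, b in zip(spans, spans[1:]))


def ordered_spans(spans):
    """Return spans in payload order, sorting only when the input is out of order."""
    if is_increasing(spans):
        return spans       # fast path: one linear pass, no sort
    return sorted(spans)   # slow path for files written out of order
```

The fast path returns the input list itself, so a file written in offset order costs one comparison per tensor and nothing else.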

Result on the Llama-shape fixture: median ~10 microseconds per parse, ~100,000 parses per second. The HuggingFace safetensors 0.4.5 Rust crate clocks ~50 microseconds median on the same fixture and the same hardware. That is ~5x faster than the Rust upstream on the format the upstream defined. The output is byte-identical to the Python safetensors package on the same input.
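The throughput figure follows directly from the median: one second divided by ~10 microseconds is ~100,000 parses per second, and ~50 divided by ~10 gives the ~5x ratio. A minimal Python sketch of that kind of measurement, assuming a simple repeat-and-take-the-median harness, is shown here. It is not the project's bench driver, and the function names and the default run count are illustrative.

```python
import statistics
import time


def median_parse_time_us(parse, data, runs=1000):
    """Median wall-clock time of parse(data) over `runs` calls, in microseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter_ns()
        parse(data)
        samples.append(time.perf_counter_ns() - start)
    return statistics.median(samples) / 1000.0


def parses_per_second(median_us):
    """Convert a median per-parse time in microseconds to a throughput."""
    return 1_000_000 / median_us


def speedup(baseline_us, candidate_us):
    """How many times faster the candidate is than the baseline."""
    return baseline_us / candidate_us
```

The median is used rather than the mean so that a few slow outliers (scheduler noise, cold caches) do not distort the per-parse figure.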

Three real-model integration tests sit next to the 17 unit tests: a generated Llama-3.2-shape fixture (39 tensors, ~33 MB), a load of the real TinyLlama-1.1B weights through the bench (201 BF16 tensors, 2.2 GB), and a head-to-head comparison driver against the upstream Rust crate.

"MILESTONES"

2026-05-22 · v0.3.0 · benched. Inline plus SIMD scan optimisations layered on the v0.2 hand-tuned parser. Median ~10 us per parse on the Llama-shape fixture. ~5x faster than HuggingFace Rust upstream (up from 3.2x at v0.2). 17 unit tests plus 3 real-model integration tests pass; output byte-identical to the Python safetensors package.
2026-05-21 · v0.2.0 · benched. Hand-tuned safetensors-specific JSON parser replaces std.json. ~3.2x faster than the Rust upstream (Zig ~24 us vs Rust ~77 us median). bench/bench_breakdown.zig confirmed std.json was 97 percent of v0.1 parse time before this optimisation.
2026-05-20 · v0.1.0 · tested. Real-model fixture coverage plus audited Rust upstream bench. Honest negative result recorded: v0.1 was ~1.6x slower than Rust on the Llama-shape fixture. The negative result is the v0.2 and v0.3 motivation.
2026-05-20 · v0.0.1 · shipped. Repository scaffold. Public API surface (SafeTensors, Parsed, openFromBytes, open, Tensor, DType). 11 unit tests pass against canonical fixtures.

"DEPENDENCIES"

Zig 0.16 standard library. No external dependencies. The point is the single-binary deployment envelope.

"ADAPTER TARGETS"

vllm-zig. The weight-load stage of the forward pass. The header of the 2.2 GB TinyLlama blob parses in 241 microseconds into typed views over the payload, with no intermediate copies; vllm-zig consumes those views directly at block 0.

"RELATED CANON"

Anti-Edison 17 — The AI Wrapper Question. The merchant-lens audit. Weight-load is the first wedge in the inference path.
Doctrine 14 — Publishing Negative Results. The v0.1 ship that recorded an honest loss against the Rust upstream and named the v0.2 path that closed the gap.
The Mercantile Thesis. The appliance-layer claim this substrate is one component of.

"RELATED LAB NOTES"

AI inference in Zig — a 4-repo stack from weights to tokens. safetensors-zig is the first layer; the composition write-up.

"RELATED WORKSHOP"

The v0.3 to v0.4 path (real structural-JSON SIMD scan in the simdjson lineage, targeting the 10x bar) is queued. Workshop entry forthcoming on the next ship.

"LIMITS"

Pre-1.0 substrate, named honestly.

5x not 10x versus Rust upstream. The "out of order better than Meta-OSS" bar is 10x; v0.3 sits at 5x. The v0.4 path is real structural-JSON SIMD per the simdjson literature; the inline plus SIMD scan in v0.3 is a stepping stone toward it, not the destination.
Format-specialised parser; not a general JSON parser. The hand-tuned path accepts canonical safetensors plus the \" and \\ escapes; anything more exotic returns InvalidJSON. Tensor names that use \uXXXX Unicode escapes would need a fallback to std.json; real-world safetensors files do not use them.
No write path. v0.x is read-only. Writing safetensors is a separate substrate and is not in scope for the inference path.
Single-thread parse. The header parse is fast enough that parallel parse is not the bottleneck; the bottleneck is page-cache warm-up on the weight payload itself, which is an mmap concern, not a parser concern.
Zig 0.16 ceiling. Standard-library API churn each release. The repo pins 0.16.0.
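The single-thread limit above places the real cost in page-cache warm-up of the mapped payload rather than in the parser. As a rough illustration of that split, the Python sketch below maps a file read-only, copies out only the small header, and leaves the payload as a view that the OS pages in on first touch. The function name map_payload is an assumption for the example; the project's own loader is Zig and may be structured differently.

```python
import mmap
import struct


def map_payload(path):
    """Map a safetensors file read-only; return (mapping, header bytes, payload view).

    Only the header bytes are copied. The payload view is backed by the
    mapping, so its pages are faulted in lazily when first read.
    """
    with open(path, "rb") as f:
        mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    (header_len,) = struct.unpack_from("<Q", mapped, 0)
    if 8 + header_len > len(mapped):
        mapped.close()
        raise ValueError("header length exceeds file size")
    header = bytes(mapped[8:8 + header_len])
    payload = memoryview(mapped)[8 + header_len:]
    return mapped, header, payload
```

The caller should release the payload view before closing the mapping; Python refuses to close a mapping that still has exported views.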

"SOURCE"

AGPL-3.0-or-later. This substrate page is the canonical public surface; the source mirror is gated by current posture and not advertised as publicly reachable.

"INSTALL"

git clone https://github.com/SMC17/safetensors-zig.git
cd safetensors-zig
zig build -Doptimize=ReleaseFast
zig build test

Zig 0.16.0 required. No external dependencies, no Python runtime in the load path.

"DOWNLOAD"

Release tarball: v0.3.0. ~5x faster than HF Rust upstream on the Llama-shape fixture (~10 us vs ~50 us median). TinyLlama 2.2 GB blob parses in 241 microseconds against 201 BF16 tensors. 17 unit tests plus 3 real-model integration tests.
Source archive: v0.3.0.tar.gz.

"CITATION"

@software{collins_safetensors_zig_2026,
  author       = {Collins, Sean},
  title        = {{safetensors-zig: Pure-Zig HuggingFace Safetensors Reader}},
  version      = {v0.3.0},
  year         = {2026},
  month        = {5},
  url          = {https://sunlitmoon.online/substrate/safetensors-zig.html},
  note         = {AGPL-3.0-or-later. Substrate page: sunlitmoon.online/substrate/safetensors-zig.}
}