"SUBSTRATE / safetensors-zig"

safetensors-zig

pre-1.0 . v0.3.0 . shipped 2026-05-22 . AGPL-3.0

20 tests (17 unit + 3 real-model integration) . lane: inference

"WHY"

The weight blob is where the inference path begins. If the operator does not own the weight reader, the operator does not own the first hop of the forward pass. The HuggingFace safetensors package ships a Rust core with a Python binding, which means the so-called sovereign-Zig inference path still loads its weights through a Rust binary inside a Python interpreter. The cost is small per token, but the dependency is load-bearing for the supply chain because every forward pass starts here.

safetensors-zig closes the wedge at the entry point. It reads the canonical HuggingFace safetensors format into typed tensor views, in pure Zig, single-module, no system dependencies. The substrate it sits under is vllm-zig; the substrate it cooperates with is tokenizers-zig. The deployment envelope is a single statically-linked Zig binary.

"WHAT"

A safetensors reader matching the upstream format spec. The public API is small and stable across the 0.x line: SafeTensors, Parsed, openFromBytes, open, Tensor, DType. The reader parses the JSON header, validates the per-tensor offset table against the payload length, and exposes zero-copy typed views over the underlying byte buffer.
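As a rough illustration of the framing and offset check described above, here is a minimal sketch. It assumes the upstream layout (an 8-byte little-endian header length N, then N bytes of JSON, then the tensor payload, with data_offsets measured from the start of the payload). The names Framed, frame, and tensorBytes are hypothetical and are not the safetensors-zig API; JSON parsing is left out entirely.

```zig
const std = @import("std");

// Hypothetical names for illustration only; not the safetensors-zig API.
const Framed = struct {
    header_json: []const u8, // JSON header bytes, borrowed from the input
    payload: []const u8, // raw tensor bytes, borrowed from the input
};

const FrameError = error{ Truncated, BadOffsets };

/// Split a safetensors buffer into its JSON header and its payload.
/// Assumed layout: u64 little-endian header length N, N bytes of JSON, payload.
fn frame(bytes: []const u8) FrameError!Framed {
    if (bytes.len < 8) return error.Truncated;
    const n = std.mem.readInt(u64, bytes[0..8], .little);
    // Reject a header length that runs past the end of the buffer.
    if (n > bytes.len - 8) return error.Truncated;
    const end: usize = 8 + @as(usize, @intCast(n));
    return Framed{ .header_json = bytes[8..end], .payload = bytes[end..] };
}

/// Bounds-check one tensor's [begin, end) data_offsets against the payload
/// and return a sub-slice that aliases the payload (no copy is made).
fn tensorBytes(payload: []const u8, begin: u64, end: u64) FrameError![]const u8 {
    if (begin > end or end > payload.len) return error.BadOffsets;
    const b: usize = @intCast(begin);
    const e: usize = @intCast(end);
    return payload[b..e];
}
```

The returned slices point into the caller's buffer, which is what makes the view zero-copy: the buffer has to outlive every tensor view taken from it. A fuller check would also confirm that each tensor's byte length matches its dtype and shape.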

The v0.3 path is what makes it interesting. The header parser is hand-written, single-pass, and byte-by-byte, with no AST and no per-token allocation beyond the arena-backed output slices. It dispatches on the first letter of the dtype, shape, and data_offsets keys to skip std.mem.eql in the inner loop, and applies @branchHint(.likely) on the comma-separator arm. It scans for the closing " and the escape \ with std.mem.indexOfAny so that SIMD lanes carry the byte-class search. It short-circuits the offset sort when the on-disk order is already increasing, which HuggingFace emits by convention.
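Three of the techniques named above can be sketched in a few lines. This is an illustration under stated assumptions, not the library's source: Key, classify, nextQuoteOrEscape, and alreadySorted are hypothetical names. Because dtype and data_offsets share a first letter, the sketch disambiguates them by key length; how the real parser does so is not stated here. The @branchHint arm is omitted.

```zig
const std = @import("std");

// Hypothetical sketch; names are illustrative, not from safetensors-zig.
const Key = enum { dtype, shape, data_offsets, other };

/// Classify a per-tensor key from its first byte and length, without a full
/// string compare. This trusts a well-formed header: a strict parser would
/// still confirm the remaining bytes before accepting the key.
fn classify(key: []const u8) Key {
    if (key.len == 0) return Key.other;
    return switch (key[0]) {
        's' => if (key.len == 5) Key.shape else Key.other,
        'd' => switch (key.len) {
            5 => Key.dtype, // "dtype"
            12 => Key.data_offsets, // "data_offsets"
            else => Key.other,
        },
        else => Key.other,
    };
}

/// Index of the first '"' or '\' at or after `start`, or null if none.
fn nextQuoteOrEscape(buf: []const u8, start: usize) ?usize {
    if (std.mem.indexOfAny(u8, buf[start..], "\"\\")) |i| return start + i;
    return null;
}

/// True when the begin offsets are already in non-decreasing order,
/// in which case the caller can skip sorting them.
fn alreadySorted(begins: []const u64) bool {
    if (begins.len < 2) return true;
    for (begins[0 .. begins.len - 1], begins[1..]) |a, b| {
        if (a > b) return false;
    }
    return true;
}
```

The sorted check costs one linear pass over the offsets, which is cheap next to a sort and pays off whenever the file was written in offset order.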

Result on the Llama-shape fixture: median ~10 microseconds per parse, ~100,000 parses per second. The HuggingFace safetensors 0.4.5 Rust crate clocks ~50 microseconds median on the same fixture and the same hardware. That is ~5x faster than the Rust upstream on the format the upstream defined. The output is byte-identical to the Python safetensors package on the same input.

Three real-model integration tests sit next to the 17 unit tests: a generated Llama-3.2-shape fixture (39 tensors, ~33 MB), a load of real TinyLlama-1.1B weights through the bench (201 tensors, BF16, 2.2 GB), and a head-to-head comparison driver against the upstream Rust crate.

"MILESTONES"

"DEPENDENCIES"

"ADAPTER TARGETS"

"RELATED CANON"

"RELATED LAB NOTES"

"RELATED WORKSHOP"

The v0.3 to v0.4 path (real structural-JSON SIMD scan in the simdjson lineage, targeting the 10x bar) is queued. Workshop entry forthcoming on the next ship.

"LIMITS"

Pre-1.0 substrate, named honestly.

"SOURCE"

"INSTALL"

git clone https://github.com/SMC17/safetensors-zig.git
cd safetensors-zig
zig build -Doptimize=ReleaseFast
zig build test

Zig 0.16.0 required. No external dependencies, no Python runtime in the load path.

"DOWNLOAD"

"CITATION"

@software{collins_safetensors_zig_2026,
  author       = {Collins, Sean},
  title        = {{safetensors-zig: Pure-Zig HuggingFace Safetensors Reader}},
  version      = {v0.3.0},
  year         = {2026},
  month        = {5},
  url          = {https://sunlitmoon.online/substrate/safetensors-zig.html},
  note         = {AGPL-3.0-or-later. Substrate page: sunlitmoon.online/substrate/safetensors-zig.}
}