"WHY"
Mechanistic interpretability (MechInterp) is the discipline of reverse-engineering neural networks into human-readable algorithms. The legacy ecosystem uses static Python notebooks and Jupyter widgets to dump activation tensors long after the forward pass has concluded. This is archaeological, not operational.
The mech_interp_liveview stack is the operational answer. By combining the BEAM VM's concurrency and soft-real-time scheduling with Phoenix LiveView, we can stream and visualize attention-head activations and multi-layer perceptron (MLP) neuron firings live, while the network evaluates.
<div class="benchmark-chart" style="padding: 0; overflow: hidden; background: #050505; position: relative; margin: 2rem 0; border-radius: 8px; border: 1px solid rgba(255,255,255,0.1);"> <canvas id="mech-canvas" style="display: block; width: 100%; border-radius: 6px;"></canvas> <div style="position: absolute; inset: 0; box-shadow: inset 0 0 40px rgba(0,0,0,1); pointer-events: none;"></div> </div> <script src="{{site_url}}/js/mech-visualizer.js"></script>
"WHAT"
This is a telemetry bridge and visualization suite composed of two parts:
- The Zig Probe: A lightweight patch to vllm-zig that taps directly into the intermediate tensor states of the transformer blocks and blasts them out via UDP.
- The LiveView Dashboard: An Elixir GenServer listener that aggressively strips the binary UDP frames, maps them to the network topology, and broadcasts them via Phoenix Channels directly into SVG/Canvas WebGL frontends.
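To make the probe-to-dashboard handoff concrete, here is a minimal sketch of what packing and unpacking one activation frame could look like. The wire layout (a `MAGIC` marker, layer index, head index, value count, then little-endian float32 values) and the names `encode_frame` and `decode_frame` are assumptions made for illustration. They are not the project's actual frame format, and the real listener is an Elixir GenServer, not Python.

```python
import struct

# Hypothetical wire layout, NOT the project's real format:
#   magic u16 | layer u16 | head u16 | count u16 | count x float32 (little-endian)
HEADER = struct.Struct("<HHHH")
MAGIC = 0x4D49  # arbitrary marker chosen for this sketch


def encode_frame(layer, head, values):
    """Pack one activation vector into a single datagram payload."""
    header = HEADER.pack(MAGIC, layer, head, len(values))
    body = struct.pack(f"<{len(values)}f", *values)
    return header + body


def decode_frame(datagram):
    """Validate and unpack a datagram produced by encode_frame."""
    if len(datagram) < HEADER.size:
        raise ValueError("datagram shorter than header")
    magic, layer, head, count = HEADER.unpack_from(datagram, 0)
    if magic != MAGIC:
        raise ValueError("unexpected magic marker")
    if len(datagram) != HEADER.size + 4 * count:
        raise ValueError("payload length does not match declared count")
    values = list(struct.unpack_from(f"<{count}f", datagram, HEADER.size))
    return {"layer": layer, "head": head, "values": values}
```

Because UDP offers no delivery or ordering guarantees, a receiver along these lines would treat every datagram as self-describing and simply discard anything that fails validation.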
We are no longer looking at static plots of attention patterns. We are watching the network "think" at 60 frames per second.
"MILESTONES"
- 2026-06-02 · v0.1.5 · tested. Full end-to-end integration with the vllm-zig forward pass. Successfully traced L2 and L3 attention heads during a 100-token decode burst.
- 2026-05-10 · v0.1.0 · benched. BEAM GenServer tuned to handle 10,000 activation events per second without dropping frames on the LiveView websocket.
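At 10,000 events per second and a 60 fps display, each rendered frame has to absorb roughly 167 events. One way to reconcile the two rates is to coalesce events per frame and keep only the latest value for each source. The sketch below illustrates that idea; the function name `coalesce`, the `(timestamp_ms, key, value)` event shape, and the keep-latest policy are all assumptions, not a description of how the GenServer is actually tuned.

```python
def coalesce(events, fps=60):
    """Group events into display frames, keeping the latest value per key.

    events: iterable of (timestamp_ms, key, value), in arrival order.
    Returns a list of (frame_index, {key: value}) sorted by frame index.
    """
    frames = {}
    for timestamp_ms, key, value in events:
        # Integer arithmetic avoids float rounding at frame boundaries.
        frame_index = timestamp_ms * fps // 1000
        # Later events for the same key overwrite earlier ones in the frame.
        frames.setdefault(frame_index, {})[key] = value
    return sorted(frames.items())


# Average number of events each frame must absorb at the benched rate.
EVENTS_PER_FRAME = 10_000 / 60
```

With this policy the websocket payload per frame is bounded by the number of distinct sources (heads, neurons) rather than by the raw event rate.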
"LIMITS"
- Bandwidth Bound. Pumping full floating-point matrices over localhost UDP works for small models (TinyLlama), but saturates instantly on 70B+ parameter runs. The next iteration will require selective sparsity masks before broadcast.
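A rough back-of-envelope shows why the gap is so large, and what a sparsity mask could buy. The sketch below counts only one float32 hidden-state vector per layer per token, which understates real traffic since attention patterns and MLP activations are larger. The model shapes are assumptions taken from commonly published configurations (22 layers and hidden size 2048 for TinyLlama-1.1B, 80 layers and hidden size 8192 for a Llama-2-70B-class model), and the top-k mask is just one possible form of "selective sparsity", not the planned design.

```python
import heapq


def bytes_per_token(n_layers, hidden_size, bytes_per_value=4):
    """Bytes needed to ship one dense hidden-state vector per layer for one token."""
    return n_layers * hidden_size * bytes_per_value


def top_k_sparse(values, k):
    """Keep the k entries with the largest magnitude as (index, value) pairs.

    The result is sorted by index so a receiver can scatter it back
    into a zero-filled vector.
    """
    if k <= 0:
        return []
    top = heapq.nlargest(k, enumerate(values), key=lambda pair: abs(pair[1]))
    return sorted(top)


# Assumed model shapes (not stated in this page):
TINY = bytes_per_token(n_layers=22, hidden_size=2048)   # TinyLlama-1.1B class
LARGE = bytes_per_token(n_layers=80, hidden_size=8192)  # 70B class
```

Under these assumptions the dense payload grows from about 176 KiB to 2.5 MiB per token, roughly a 14.5x increase before any attention or MLP tensors are counted. Sending only the top-k entries per vector, with their indices, makes the payload proportional to k instead of the hidden size.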
"SOURCE"
- AGPL-3.0-or-later. This substrate page is the canonical public surface; the source mirror is gated by current posture and not advertised as publicly reachable.