"SUBSTRATE / throughline"

throughline

pre-1.0 · v0.0.1 · shipped 2026-06-02 · AGPL-3.0

34 tests (17 Zig + 17 Elixir) · lane: research

"WHY"

The 2026 long-horizon multi-agent RL stack ships environments as ad-hoc Python classes, with reward as a single scalar, with no provenance over which step caused which credit assignment, and with no pre-registered falsifier for the claim the env is supposed to test. Gymnasium provides a Python ABC. Verl is a training loop. TRL is a fine-tune harness. None of them is a protocol; all of them push the conformance burden onto the env author and the audit burden onto nobody.

throughline is the protocol. It names what an environment owes to a runner across language boundaries, and it pressure-tests that contract against four observable archetypes before declaring v0.0.1. The Zig contract is the wire surface; the BEAM runtime is the orchestration surface; the reference Echo Env is the conformance proof.

The three differentiators against Gymnasium, Verl, and TRL are load-bearing. First, reward is a three-channel decomposition — process channel for rule conformance, outcome channel for terminal reward, judge channel for LLM-grader feedback — not a single scalar. Second, claims are first-class via a claims() callback that returns pre-registered falsifier IDs from the stax-experiment register, so the env declares its own refutation condition at the contract level. Third, every step produces a provenance record so the credit-assignment trace is auditable end-to-end without a separate logging contract.
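To make the first and third differentiators concrete, here is a minimal Python sketch of a three-channel reward carried inside a per-step provenance record. It is an illustration only, not the Zig contract: the field names on ProvenanceRecord (step, agent, action) and the helper steps_crediting are assumptions made for this example, and the actual record shapes are whatever reward.zig and provenance.zig define.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardBreakdown:
    """Three reward channels kept separate instead of summed into one scalar."""
    process: float  # rule conformance
    outcome: float  # terminal reward
    judge: float    # LLM-grader feedback


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical per-step record; field names are assumptions for this sketch."""
    step: int
    agent: str
    action: str
    reward: RewardBreakdown


def steps_crediting(trace, channel):
    """Return the step indices whose reward on `channel` is nonzero."""
    return [r.step for r in trace if getattr(r.reward, channel) != 0.0]


# A made-up three-step trace, purely to exercise the helper.
trace = [
    ProvenanceRecord(0, "agent-a", "hold", RewardBreakdown(0.0, 0.0, 0.0)),
    ProvenanceRecord(1, "agent-b", "release", RewardBreakdown(-1.0, 0.0, 0.25)),
    ProvenanceRecord(2, "agent-a", "sell", RewardBreakdown(0.0, 5.0, 0.5)),
]

print(steps_crediting(trace, "process"))  # [1]
print(steps_crediting(trace, "outcome"))  # [2]
print(steps_crediting(trace, "judge"))    # [1, 2]
```

Because each record carries its own breakdown, asking which step earned credit on which channel is a lookup over the trace rather than a reconstruction from a separate log.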

"WHAT"

The design spec is a 316-line PROTOCOL.md at the repository root. It was closed against four Tier 1 archetypes before scaffold: sports ticketing revenue management, hotel PMS plus housekeeping, hospital bed board plus OR scheduling, and rail dispatching plus crew plus track allocation. Each archetype was chosen because it stresses a different axis of the protocol: perishable inventory under demand uncertainty, multi-resource scheduling under hard constraints, safety-critical sequencing under real-time arrivals, and capacity routing under cascading delays. The protocol holds across all four without amendment.

The Zig contract sits at zig/src/{env,reward,provenance,echo_env,root}.zig. The env interface names reset, step, observe, and claims, plus the reward and provenance record shapes. The BEAM runtime mirrors the contract through behaviours at elixir/lib/envs/{env,judge,falsifier,claim,reward_breakdown,echo_env}.ex, with the Falsifier behaviour wired into the stax-experiment register so that a refuted claim updates the run register at the lane it was pre-registered against.
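The four callbacks named above can be sketched as a structural interface. The Python below is a hedged illustration, not the Zig or Elixir source: only the names reset, step, observe, and claims come from the contract. The argument and return types, the helper missing_callbacks, and the PartialEnv class are assumptions made for this example.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class Env(Protocol):
    """Structural stand-in for the env interface; signatures are assumed."""

    def reset(self) -> Any: ...
    def step(self, action: Any) -> Any: ...
    def observe(self) -> Any: ...
    def claims(self) -> list[str]: ...


REQUIRED_CALLBACKS = ("reset", "step", "observe", "claims")


def missing_callbacks(candidate: object) -> list[str]:
    """List the contract callbacks that `candidate` does not implement."""
    return [
        name
        for name in REQUIRED_CALLBACKS
        if not callable(getattr(candidate, name, None))
    ]


class PartialEnv:
    """Hypothetical env that only implements reset and step."""

    def reset(self):
        return 0

    def step(self, action):
        return action


print(missing_callbacks(PartialEnv()))  # ['observe', 'claims']
print(isinstance(PartialEnv(), Env))    # False
```

A runner that checks conformance up front can reject a partial env before the first step, which is the kind of burden the protocol is meant to lift from the env author.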

The Echo Env is the first conformance pressure test. It is the smallest possible env that exercises every callback in the contract: observation echo on step, a trivial reward decomposition across all three channels, a synthetic provenance record per step, and a minimal claims() callback whose only registration is the conformance claim itself. The Echo Env conforms without protocol amendment, which is the first evidence that the contract closed cleanly against the four archetype designs.
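As a rough picture of what such an env looks like, here is a small Python sketch of an echo env that touches all four callbacks. It is not the echo_env.zig or echo_env.ex source. The tuple returned by step, the dict-shaped reward and provenance records, and the falsifier ID "echo-conformance" are all hypothetical choices made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class EchoEnv:
    """Hypothetical Python rendering of an echo env, for illustration only."""

    last_obs: object = None
    step_count: int = 0
    trace: list = field(default_factory=list)

    def reset(self):
        # Clear all state and return the (empty) initial observation.
        self.last_obs, self.step_count, self.trace = None, 0, []
        return self.last_obs

    def step(self, action):
        # Echo: the observation is simply the action that was just taken.
        self.last_obs = action
        # Trivial decomposition: every channel is present, every value is zero.
        reward = {"process": 0.0, "outcome": 0.0, "judge": 0.0}
        # Synthetic provenance record for this step.
        record = {"step": self.step_count, "action": action, "reward": reward}
        self.trace.append(record)
        self.step_count += 1
        return self.last_obs, reward, record

    def observe(self):
        return self.last_obs

    def claims(self):
        # Assumed ID; real IDs would come from the stax-experiment register.
        return ["echo-conformance"]


env = EchoEnv()
env.reset()
obs, reward, record = env.step("ping")
print(obs)               # ping
print(sorted(reward))    # ['judge', 'outcome', 'process']
print(record["step"])    # 0
print(env.claims())      # ['echo-conformance']
```

The env does nothing useful, and that is its value: any failure to conform would point at the contract rather than at domain logic.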

"MILESTONES"

"DEPENDENCIES"

"ADAPTER TARGETS"

"RELATED CANON"

"RELATED WORKSHOP"

The v0.0.1 to v0.0.2 path is the next archetype past Echo and Archtics — hotel PMS or hospital bed board, whichever lands first. Workshop entry forthcoming on the next conforming env.

"LIMITS"

Pre-1.0 substrate, named honestly.

"SOURCE"

"CITATION"

@software{collins_throughline_2026,
  author       = {Collins, Sean},
  title        = {{throughline: Canonical Env Protocol for Long-Horizon Multi-Agent RL}},
  version      = {v0.0.1},
  year         = {2026},
  month        = {6},
  url          = {https://sunlitmoon.online/substrate/throughline.html},
  note         = {AGPL-3.0-or-later. Substrate page: sunlitmoon.online/substrate/throughline.}
}