"WHY"
The 2026 long-horizon multi-agent RL stack ships environments as ad-hoc Python classes, with reward as a single scalar, with no provenance over which step caused which credit assignment, and with no pre-registered falsifier for the claim the env is supposed to test. Gymnasium provides a Python ABC. Verl is a training loop. TRL is a fine-tune harness. None of them is a protocol; all of them push the conformance burden onto the env author and the audit burden onto nobody.
throughline is the protocol. It names what an environment owes to a runner across language boundaries, and it pressure-tests that contract against four observable archetypes before declaring v0.0.1. The Zig contract is the wire surface; the BEAM runtime is the orchestration surface; the reference Echo Env is the conformance proof.
The three differentiators against Gymnasium, Verl, and TRL are load-bearing. First, reward is a three-channel decomposition — process channel for rule conformance, outcome channel for terminal reward, judge channel for LLM-grader feedback — not a single scalar. Second, claims are first-class via a claims() callback that returns pre-registered falsifier IDs from the stax-experiment register, so the env declares its own refutation condition at the contract level. Third, every step produces a provenance record so the credit-assignment trace is auditable end-to-end without a separate logging contract.
"WHAT"
A 316-line PROTOCOL.md design spec at the repository root. Closed against four Tier 1 archetypes before scaffold: sports ticketing revenue management, hotel PMS plus housekeeping, hospital bed board plus OR scheduling, and rail dispatching plus crew plus track allocation. The four archetypes were chosen because each one stresses a different axis of the protocol — perishable inventory under demand uncertainty, multi-resource scheduling under hard constraints, safety-critical sequencing under real-time arrivals, and capacity routing under cascading delays — and the protocol holds across all four without amendment.
The Zig contract sits at zig/src/{env,reward,provenance,echo_env,root}.zig. The env interface names reset, step, observe, claims, plus the reward and provenance record shapes. The BEAM runtime mirrors the contract through behaviours at elixir/lib/envs/{env,judge,falsifier,claim,reward_breakdown,echo_env.ex}, with the Falsifier behaviour wired into the stax-experiment register so a refuted claim updates the run register at the lane it was pre-registered against.
The Echo Env is the first conformance pressure test. It is the smallest possible env that exercises every callback in the contract — observation echo on step, a trivial reward decomposition across all three channels, a synthetic provenance record per step, and a no-op claims() callback that registers the conformance claim itself. The Echo Env conforms without protocol amendment, which is the first evidence the contract closed cleanly against the four archetype designs.
"MILESTONES"
- 2026-06-02 · v0.0.1 · tested. PROTOCOL.md design spec closed against four Tier 1 archetypes. Zig contract (env, reward, provenance, echo_env, root) green at 17/17 on Zig 0.16.0. BEAM behaviours (env, judge, falsifier, claim, reward_breakdown, echo_env) green at 17/17 on Elixir 1.18.4 / OTP 28. Reference Echo Env conforms to the contract without protocol amendment. Pre-registered conformance claim
exp-1780410228-166515934is the protocol's own falsifier: refuted if any one of the four Tier 1 archetypes forces an amendment under real conformance.
"DEPENDENCIES"
- Zig 0.16 standard library for the contract layer. No external Zig dependencies.
- Elixir 1.18.4 / OTP 28 for the BEAM runtime. The
stax-experimentregister CLI for the falsifier wiring.
"ADAPTER TARGETS"
throughline-archtics. The first flagship env conforming to the throughline protocol. Long-horizon perishable-inventory plus dynamic-pricing plus multi-segment revenue management. Confirms the protocol survives a non-trivial archetype past Echo.
"RELATED CANON"
- The Mercantile Thesis. The appliance-layer claim throughline operationalises at the env protocol surface.
- Doctrine 14 — Publishing Negative Results. The pre-registered falsifier discipline that produced the
claims()callback as first-class contract.
"RELATED WORKSHOP"
The v0.0.1 to v0.0.2 path is the next archetype past Echo and Archtics — hotel PMS or hospital bed board, whichever lands first. Workshop entry forthcoming on the next conforming env.
"LIMITS"
Pre-1.0 substrate, named honestly.
- Four-archetype closure is design-time, not runtime. v0.0.1 ships the contract green against Echo and Archtics under real test load. The other three Tier 1 archetypes (hotel, hospital, rail) are designed against, not yet implemented against. The pre-registered conformance claim is refuted if any of those three forces an amendment.
- No training loop in scope. throughline is the env protocol, not the algorithm. PPO, GRPO, and the rest live in adapter targets, not in the contract.
- Judge channel reference impl is not yet shipped. The reward decomposition reserves the judge channel as first-class; the reference LLM-grader binding is gated on the next ship. v0.0.1 envs return
nilon the judge channel and the runner contracts that as valid. - BEAM runtime is single-node. Multi-node orchestration of env workers across a tailnet is future revision; the contract is shaped to allow it, the runtime does not yet exercise it.
- Zig 0.16 ceiling. Standard-library API churn each release. The repo pins
0.16.0.
"SOURCE"
- AGPL-3.0-or-later. This substrate page is the canonical public surface; the source mirror is gated by current posture and not advertised as publicly reachable.
"CITATION"
@software{collins_throughline_2026,
author = {Collins, Sean},
title = {{throughline: Canonical Env Protocol for Long-Horizon Multi-Agent RL}},
version = {v0.0.1},
year = {2026},
month = {6},
url = {https://sunlitmoon.online/substrate/throughline.html},
note = {AGPL-3.0-or-later. Substrate page: sunlitmoon.online/substrate/throughline.}
}