The gate

A public, falsifiable test — with a hard deadline.

We did not just ship a debate format. We pre-registered a public gate: a date, a bar, and a commitment to retire the claim on the record if the evidence does not arrive. This page is that commitment, in full.

Deadline
2026-09-07
The terms

What has to be true by the deadline.

The bar
≥5 external-culture runs — different teams, not prompted, refereed, or graded by us — completed through the blind-judging pipeline.
Judging
Blind. The judging rubric and blinding procedure are pre-committed in docs/judging.md and pack/GATES.md. The maintainer logs receipt of each submission, redacts it for blinding, and assigns external judges.
What counts
External execution plus blind judging — not the score. A run that goes through the pipeline counts whether or not it flatters the format.
Failure clause
If the runs do not materialize, or judged results show no advantage, the claim that this shape is worth productizing dies on the public record.
What passing does not prove
That the format is ready for production use
That internal pilots prove anything
That we have solved governance or agent alignment
productization claim · revocable
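The terms above reduce to a small decision rule. The sketch below is illustrative only, not the project's actual tooling. The deadline and the five-run bar come from this page; the Run fields, the function names, and the advantage_shown flag are hypothetical. It assumes "different teams" means distinct teams, treats the deadline day as inclusive, and leaves open how "advantage" is judged.

```python
from dataclasses import dataclass
from datetime import date

DEADLINE = date(2026, 9, 7)  # from this page
MIN_RUNS = 5                 # from this page: >=5 external-culture runs


@dataclass(frozen=True)
class Run:
    """Hypothetical record of one submitted run (field names are assumptions)."""
    team: str
    external: bool       # not prompted, refereed, or graded by the maintainers
    blind_judged: bool   # completed through the blind-judging pipeline
    completed_on: date
    # Deliberately no score field: the bar is execution plus judging, not the score.


def counts_toward_bar(run: Run) -> bool:
    """A run counts if it is external, blind-judged, and in by the deadline."""
    return run.external and run.blind_judged and run.completed_on <= DEADLINE


def gate_status(runs: list[Run], advantage_shown: bool) -> str:
    """Apply the bar, then the failure clause.

    advantage_shown stands in for the judged outcome; how it is
    determined is not specified here and is left to the rubric.
    """
    # Assumption: each distinct team counts once toward the bar.
    teams = {run.team for run in runs if counts_toward_bar(run)}
    if len(teams) < MIN_RUNS:
        return "claim retired: too few external runs"
    if not advantage_shown:
        return "claim retired: no advantage in judged results"
    return "claim stands (revocable)"
```

Under these assumptions, five qualifying teams with a judged advantage return "claim stands (revocable)", and every other combination retires the claim.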

This is the opposite of "trust us, it works in our pilots."

Anti-hype is the policy. The pre-registered gate, the blind rubric, and the public failure clause are the features. Confidence comes from naming the limits, not from adjectives.

Failed and abandoned runs are evidence we want. Report them. The gate is about external execution and judging, not about scores.