The gate

A public, falsifiable test — with a hard deadline.

We did not just ship a debate format. We pre-registered a public gate: a date, a bar, and a commitment to retire the claim on the record if the evidence does not arrive. This page is that commitment, in full.

Deadline

— days remaining · 2026-09-07

Submit an external run See the run ledger

The terms

What has to be true by the deadline.

The bar

≥5 external-culture runs — different teams, not prompted, refereed, or graded by us — completed through the blind-judging pipeline.

Judging

Blind. The judging rubric and blinding procedure are pre-committed in docs/judging.md and pack/GATES.md. The maintainer logs receipt, redacts for blinding, and assigns external judges.

What counts

External execution plus blind judging — not the score. A run that goes through the pipeline counts whether or not it flatters the format.

Failure clause

If the runs do not materialize, or judged results show no advantage, the claim that this shape is worth productizing dies on the public record.

What passing does not prove

That the format is ready for production use

That internal pilots prove anything

That we have solved governance or agent alignment

productization claim · revocable

This is the opposite of "trust us, it works in our pilots."

Anti-hype is the policy. The pre-registered gate, the blind rubric, and the public failure clause are presented as features. Confidence comes from naming the limits, not from adjectives.

Failed and abandoned runs are wanted evidence. Report them. The gate is about external execution and judging, not about scores.