Spec19 Textbook Routing Mixture
spec19 is the recommended next branch after the bounded-intent experiments on top of the frozen
spec16 scene-bundle winner. The point is not to keep churning near-identical routing rungs.
The point is to change the data philosophy in a controlled way: explicit named mixture buckets, denser
textbook-style routing coverage, cleaner minimal pairs, and capacity as a separate fallback lever instead of
a hidden curriculum tweak.
spec17 was a useful diagnostic bridge and spec18 proved the routing-first launcher path end to end, but spec18 r1 still ended with zero held-out exactness. That blocks a spec18 r2 on the same recipe, so the next public recommendation is a new branch: spec19.
Why Spec19 Exists
Spec17 Fixed The Question
spec17 established the real problem: bounded intent is a planning task, not just a syntax task.
It also forced better audits and cleaner contrast coverage.
Spec18 Fixed The Launch Shape
spec18 kept the same compiler, renderer, tokenizer, and shared [bundle] contract while
testing a routing-first curriculum. The run path was correct and reproducible.
Held-Out Routing Still Failed
spec18 r1 ended with overall exactness around 4.8%, but visible and hidden held-out exactness
stayed at 0%. That means the remaining problem is not a missing guardrail. It is still a data and capacity question.
What To Copy From The Papers And The NVIDIA Dataset Lesson
Named Mixtures, Not One Blob
The useful lesson from NVIDIA's large public pretraining collections is organizational: split the corpus into named components with explicit provenance and mixture control. For CK-native training, that means versioned bucket ids, row counts, token counts, and stage weights.
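As an illustration, such a versioned bucket manifest might look like the following sketch. The schema, field names, and file name here are assumptions for illustration, not the project's actual format:

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class BucketManifest:
    """One named mixture bucket with explicit provenance (illustrative schema)."""
    bucket_id: str        # versioned id, e.g. "routebook_direct.v1"
    provenance: str       # where the rows came from
    row_count: int
    token_count: int
    stage_weights: dict   # stage name -> sampling fraction

buckets = [
    BucketManifest("anchor_replay.v1", "spec16 r9 anchors", 4000, 1_200_000,
                   {"stage_a": 0.20, "stage_b": 0.20}),
    BucketManifest("routebook_direct.v1", "generated routebook rows", 5000, 900_000,
                   {"stage_a": 0.25, "stage_b": 0.45}),
]

# Write the manifest next to the run artifacts so the mix can be replayed.
with open("mixture_manifest.json", "w") as f:
    json.dump([asdict(b) for b in buckets], f, indent=2)
```

Recording the manifest as a run artifact is what makes the mixture auditable and replayable later.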
phi-1: Increase Structural Density
Prefer clean, compiler-validated, textbook-style teaching rows over more noisy coverage. The next branch should spend more mass on direct routing supervision and minimal pairs than on broad scaffold language.
Grokking: Stop Same-Shape Churn
If loss moves while held-out exactness stays flat, that blocks repeating the same training style. Another spec18 rung with the same recipe would likely add cost without adding clarity.
HumanEval / Codex: Measure Correctness
Renderable-but-wrong rows are semantic failures, not wins. Promotion should still depend on held-out exactness.
Sidecar best-of-k analysis can help diagnose whether the model knows more than greedy decode shows.
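One concrete tool for that diagnosis is the unbiased pass@k estimator from the Codex/HumanEval paper, which a best-of-k sidecar can reuse directly:

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., Codex paper):
    given n samples of which c are correct, the probability that
    at least one of k randomly chosen samples is correct."""
    if n - c < k:
        return 1.0
    return 1.0 - math.prod(1.0 - k / i for i in range(n - c + 1, n + 1))

pass_at_k(10, 1, 5)  # -> 0.5: one correct sample in ten still gives 50% at k=5
```

A large gap between greedy exactness and pass@k would indicate the model knows more than greedy decode shows.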
CodeT5: Score Fields, Not Just Strings
Bundle tags behave like code identifiers with constrained semantics. That means evaluation should separate syntax from family, form, style, and topology errors instead of collapsing everything into one exact-match bucket.
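A minimal sketch of such field-level scoring, assuming hypothetical bundle field names, buckets each miss into its first failing field rather than one exact-match bucket:

```python
from typing import Optional

def classify_miss(pred: Optional[dict], gold: dict) -> str:
    """Bucket a wrong row into one error class instead of a flat
    exact-match fail (field names are illustrative assumptions)."""
    if pred is None:            # bundle did not parse at all
        return "syntax"
    for field in ("family", "form", "style", "topology"):
        if pred.get(field) != gold.get(field):
            return field
    return "exact"

classify_miss({"family": "chart", "form": "bar"},
              {"family": "chart", "form": "line"})   # -> "form"
```

Separating the classes this way is what makes error-mix tables like the ones later in this page possible.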
AlphaCode: Generate Then Filter
The compiler is a structural oracle. Best-of-k decode plus compiler filtering is a valid sidecar metric
and practical inference path when syntax is close but greedy decode is brittle.
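A generate-then-filter loop can be sketched as follows; `sample_fn` and `compiles_fn` are hypothetical hooks standing in for the model decoder and the compiler oracle:

```python
import random

def best_of_k(prompt, sample_fn, compiles_fn, k=8, seed=0):
    """AlphaCode-style generate-then-filter sketch: sample k candidate
    bundles, keep only those the compiler accepts, and return the
    first survivor (or None if every candidate fails to compile)."""
    rng = random.Random(seed)
    candidates = [sample_fn(prompt, rng) for _ in range(k)]
    survivors = [c for c in candidates if compiles_fn(c)]
    return survivors[0] if survivors else None
```

Because the compiler is deterministic, the filter adds no new semantics; it only recovers cases where greedy decode is brittle but the distribution contains a valid bundle.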
Nemotron: Curate The Mixture Explicitly
The dataset lesson is operational, not mystical: keep named buckets, provenance, dedupe discipline, and measured mix ratios. The run artifact directory should record those manifests so the branch can be replayed or shipped.
Hard Boundary
Keep Fixed
- Seed from frozen spec16 r9.
- Freeze the tokenizer unless a tokenizer branch is the explicit axis.
- Keep the same shared [bundle] ... [/bundle] output contract.
- Keep the same deterministic renderer and compiler boundary.
Change One Main Thing
- Change the dataset philosophy, not the whole stack at once.
- Move to named mixture buckets.
- Increase direct routebook density and minimal-pair coverage.
- Separate any later capacity step into its own branch decision.
Do Not Regress Into
- warning-language repair rows
- raw SVG targets
- prompt-contamination teaching
- blurry runs that change curriculum and capacity together
Named Mixture Buckets
| Bucket | Purpose | Typical contents |
|---|---|---|
| Anchor Replay | Preserve closure and exact bundle syntax. | Explicit bundle anchors, clean-stop anchors, permuted explicit anchors. |
| Routebook Direct | Teach topic + goal + audience -> family + form. | Short direct prompts with deterministic lexical templates and canonical targets. |
| Form Minimal Pairs | Separate sibling forms inside one family. | Near-neighbor prompts with one controlled form-changing difference. |
| Family Minimal Pairs | Separate similar intents across different families. | Cross-family contrasts with explicit distractor structure when needed. |
| Routebook Paraphrase | Strengthen prompt-surface robustness after the direct route is learned. | Reordered tags, plain-language paraphrases, and small lexical alternations. |
| Style/Topology Bridge | Widen to style and count inference only after routing rows exist in mass. | Bounded emphasis, constraint, and content_pack hints. |
| Repair Hygiene | Keep narrow closure cleanup separate from semantic teaching. | Stop-boundary and singleton cleanup only, no broad warning prose. |
| Holdout / Hidden | Evaluation only. | Visible routing holdouts, hidden paraphrase, hidden recombination. |
Stage Mix
Stage A: Route Lock
| Surface family | Weight | Reason |
|---|---|---|
| anchors | 20% | Keep the solved shared bundle contract alive. |
| routebook_direct | 25% | Make direct routing the center of mass. |
| form_minimal_pairs | 20% | Force sibling-form discrimination. |
| family_minimal_pairs | 15% | Force cross-family discrimination. |
| routebook_paraphrase | 10% | Improve robustness without widening semantics too early. |
| style_topology_bridge + hygiene | 10% | Keep widening pressure present, but subordinate to routing. |
Stage B: Controlled Widening
| Surface family | Weight | Reason |
|---|---|---|
| anchors | 20% | Preserve non-regression on the solved contract. |
| direct + minimal-pair routing | 45% | Routing stays dominant even while the target space widens. |
| routebook_paraphrase | 10% | Keep surface robustness under the same semantic contract. |
| style_topology_bridge | 20% | Add style and topology only after stronger routing coverage exists. |
| recombination + hygiene | 5% | Probe whether routing survives slightly broader combinations. |
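Operationally, a stage mix like the one above amounts to weighted sampling over bucket names. A minimal sketch, with bucket keys as shorthand for the table rows:

```python
import random

def sample_stage(weights, n, seed=0):
    """Return n bucket names drawn in proportion to the stage weights
    (weights: bucket name -> percent, as in the stage tables)."""
    rng = random.Random(seed)
    names = list(weights)
    return rng.choices(names, weights=[weights[b] for b in names], k=n)

# Stage B mix from the table above (keys abbreviated).
stage_b = {"anchors": 20, "direct_plus_minimal_pairs": 45,
           "routebook_paraphrase": 10, "style_topology_bridge": 20,
           "recombination_plus_hygiene": 5}
```

Fixing the seed keeps the draw reproducible, so the realized mix can be recorded alongside the manifest for replay.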
Recommended Run Sequence
Spec19 R1 Canary
Keep the current architecture first. The question is whether the routebook mixture changes held-out exactness without introducing new stack risk.
- same frozen tokenizer
- same spec16 r9 seed
- same shared bundle contract
- full-pass canary only
Gate For Promotion
Require nonzero exactness on both visible held-out splits and at least one hidden split. Renderability alone is not enough.
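The gate can be written as a small predicate; the split names here are placeholders, not the project's actual holdout ids:

```python
def promotion_gate(exactness: dict) -> bool:
    """Gate sketch: nonzero exactness on both visible held-out splits
    and on at least one hidden split (split names are hypothetical)."""
    visible = ["visible_holdout_a", "visible_holdout_b"]
    hidden = ["hidden_paraphrase", "hidden_recombination"]
    return (all(exactness.get(s, 0.0) > 0.0 for s in visible)
            and any(exactness.get(s, 0.0) > 0.0 for s in hidden))
```

Note that renderability never enters the predicate: a renderable-but-wrong row contributes nothing to any split's exactness.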
If It Is Still Zero
Treat that as a clean signal that the current model size may be the limit. Run a separate capacity canary on
the same spec19 dataset recipe instead of silently changing data and model size together.
How To Read The Spec19 Rungs
By 2026-04-02, spec19 is no longer a reset line. It is a learnable rung line. The main question is no longer
“does this branch work at all?” but “which curriculum regions are still under-covered?”
| Rung | What changed | What it taught |
|---|---|---|
| r2 | Expanded corpus, stronger compiler smoke gating, longer real training budget. | First real bounded-intent success. Held-out exactness crossed above zero on every split, proving the branch is learnable. |
| r3b | Coherent replay union of prior train corpora with clean eval collision filtering. | Replay-heavy cumulative data improved aggregate exactness, but visible test regressed. That showed replay helps, but the coverage was still unbalanced. |
| r3c | r3b replay base plus broader neighbor coverage around persistent miss classes. | Visible test recovered to 10/10 and the family miss class disappeared entirely. Aggregate exactness did not rise because the remaining pressure moved into style and syntax. |
Current Error-Class Shift
| Rung | Failure mix |
|---|---|
| r3b | family 1, form 3, style 2, syntax 2 |
| r3c | family 0, form 2, style 3, syntax 3 |
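Read as per-class deltas, the shift is a small arithmetic check on the counts above:

```python
# Failure counts per error class, from the r3b and r3c rows above.
r3b = {"family": 1, "form": 3, "style": 2, "syntax": 2}
r3c = {"family": 0, "form": 2, "style": 3, "syntax": 3}

# Negative means fewer misses in r3c (improvement); positive means regression.
delta = {k: r3c[k] - r3b[k] for k in r3b}
# -> {'family': -1, 'form': -1, 'style': 1, 'syntax': 1}
```

Routing classes (family, form) improved while style and syntax each regressed by one, which is exactly the pressure shift the rung table describes.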
That is a deterministic signal. The branch is progressing in meaningful areas and regressing in a few structured areas. The right response is broader balanced curriculum support around those unstable regions, not narrow prompt patches and not a new spec.
Related Pages
Spec Training Method
The broader method for one-axis runs, deterministic repair boundaries, and branch decisions.
Training Curriculum
The long-range CK-native roadmap that places spec19 inside the wider version ladder.
Spec17 Curriculum Blueprint
The bounded-intent predecessor that made the problem visible and established the family structure.