v7 Training Parity Checklist
Use this checklist as the operator gate for runbook execution readiness. It answers one operational question: can the runbook proceed on this run directory right now?
Scope
This checklist is for runbook readiness, not final numerical signoff.
Dataset Gate
dataset_qc.json must be present and pass.
Tokenizer Gate
tokenizer_roundtrip.json must report exact_match == true.
Parity Regimen Gate
D1, E1, and F1 must pass in the latest regimen summary.
Canary Gate
The row1/row2 parity canary must pass.
A1/A2 may still fail today. Treat them as an active kernel-harness bug track (currently suspected in the SwiGLU harness path), separate from runbook execution readiness.
0) Set Run Path
```bash
export RUN="$HOME/.cache/ck-engine-v7/models/train/v7_svg_assets_bpe_l24_full_e1_seq128"
cd /home/antshiv/Workspace/C-Kernel-Engine
```
1) Dataset + Tokenizer Gates
```bash
jq '{status, checks, non_empty_lines, path}' "$RUN/dataset_qc.json"
jq '{status, exact_match, line_eval, tokenizer_json_path}' "$RUN/tokenizer_roundtrip.json"
```
Pass criteria:
- `dataset_qc.status == "pass"`
- `tokenizer_roundtrip.status == "pass"`
- `tokenizer_roundtrip.exact_match == true`
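These two gates can also be checked programmatically instead of eyeballing jq output. A minimal sketch, assuming the `dataset_qc.json` / `tokenizer_roundtrip.json` layout shown above (`gates_pass` is a hypothetical helper name, not part of the runbook tooling):

```python
import json
from pathlib import Path


def gates_pass(run_dir: str) -> bool:
    """Return True when the dataset and tokenizer gates both pass.

    Assumes the dataset_qc.json / tokenizer_roundtrip.json schemas
    shown above; adjust field names if your artifacts differ.
    """
    run = Path(run_dir)
    ds = json.loads((run / "dataset_qc.json").read_text())
    rt = json.loads((run / "tokenizer_roundtrip.json").read_text())
    return bool(
        ds.get("status") == "pass"
        and rt.get("status") == "pass"
        and rt.get("exact_match") is True
    )
```

This mirrors the same criteria the one-shot GO evaluation in section 4 applies.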
2) Canary Parity Gate (row1/row2)
Run Step 3.1 from the main runbook, then verify pass lines:
```bash
python3 - <<'PY'
import json
import os
from pathlib import Path
from statistics import mean

TH_MAX = 1e-4
TH_MEAN = 5e-5
TH_PARAM = 1e-4

run_env = os.environ.get("RUN", "").strip()
if not run_env:
    print("[FAIL] RUN env var is empty")
    raise SystemExit(1)
root = Path(run_env)

ok = True
for idx in (1, 2):
    run_dir = root / f"parity_svg_row{idx}" / ".ck_pipeline"
    work_dirs = sorted([p for p in run_dir.glob("ascii_bpe_*") if p.is_dir()])
    if not work_dirs:
        print(f"[FAIL] row{idx}: missing {run_dir}/ascii_bpe_*")
        ok = False
        continue
    w = work_dirs[-1]
    ck = json.loads((w / "train_ck.json").read_text())
    pt = json.loads((w / "train_torch_ref.json").read_text())
    c = [float(x["loss_ck"]) for x in ck.get("loss_curve", [])]
    t = [float(x["loss"]) for x in pt.get("loss_curve", [])]
    n = min(len(c), len(t))
    if n == 0:
        print(f"[FAIL] row{idx}: empty loss curves")
        ok = False
        continue
    diffs = [abs(c[i] - t[i]) for i in range(n)]
    max_abs = max(diffs)
    mean_abs = mean(diffs)
    final_param = float(ck.get("final_param_max_abs_diff", 1.0))
    passed = max_abs <= TH_MAX and mean_abs <= TH_MEAN and final_param <= TH_PARAM
    print(f"[row{idx}] max_abs={max_abs:.6e} mean_abs={mean_abs:.6e} final_param={final_param:.6e} pass={passed}")
    ok = ok and passed

print("CANARY_PARITY_GATE=PASS" if ok else "CANARY_PARITY_GATE=FAIL")
PY
```
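The thresholds compare per-step loss differences between the CK and PyTorch curves. A tiny worked example of the same arithmetic, using synthetic numbers rather than real run data:

```python
from statistics import mean

TH_MAX = 1e-4   # max allowed per-step |loss_ck - loss_torch|
TH_MEAN = 5e-5  # max allowed mean per-step difference

# Synthetic loss curves, illustrative only (not real run output).
ck_curve = [2.30000, 1.90000, 1.60003]
pt_curve = [2.30004, 1.89998, 1.60000]

diffs = [abs(a - b) for a, b in zip(ck_curve, pt_curve)]
max_abs = max(diffs)    # ~4e-5, within TH_MAX
mean_abs = mean(diffs)  # ~3e-5, within TH_MEAN
print(max_abs <= TH_MAX and mean_abs <= TH_MEAN)  # True
```

The real gate additionally checks `final_param_max_abs_diff` against `TH_PARAM`, which this toy example omits.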
3) Run Full Parity Regimen
```bash
python3 version/v7/scripts/run_training_parity_regimen_v7.py \
  --run-dir "$RUN" \
  --force
```
Inspect summary:
```bash
jq '.summary' "$RUN/training_parity_regimen_latest.json"
```
Inspect stage table quickly:
```bash
jq '.stages[] | {id,name,status,metrics,artifact_json,artifact_log}' \
  "$RUN/training_parity_regimen_latest.json"
```
Check generated-runtime stages:
```bash
jq '.stages[] | select(.id=="D1" or .id=="E1" or .id=="F1") | {id,status,metrics}' \
  "$RUN/training_parity_regimen_latest.json"
```
4) One-Shot GO Evaluation
```bash
python3 - <<'PY'
import json
import os
from pathlib import Path

run_env = os.environ.get("RUN", "").strip()
if not run_env:
    print("[FAIL] RUN env var is empty")
    raise SystemExit(1)
run = Path(run_env)

def load_json(path: Path):
    return json.loads(path.read_text()) if path.exists() else None

ds = load_json(run / "dataset_qc.json")
rt = load_json(run / "tokenizer_roundtrip.json")
reg = load_json(run / "training_parity_regimen_latest.json")

checks = {}
checks["dataset_qc_pass"] = bool(ds and ds.get("status") == "pass")
checks["tokenizer_exact_match"] = bool(rt and rt.get("status") == "pass" and rt.get("exact_match") is True)

d1e1f1_ok = False
if reg and isinstance(reg.get("stages"), list):
    st = {s.get("id"): s.get("status") for s in reg["stages"] if isinstance(s, dict)}
    d1e1f1_ok = st.get("D1") == "PASS" and st.get("E1") == "PASS" and st.get("F1") == "PASS"
checks["D1_E1_F1_pass"] = d1e1f1_ok

def row_pass(idx: int) -> bool:
    p = run / f"parity_svg_row{idx}" / "parity_pipeline.json"
    if not p.exists():
        return False
    j = json.loads(p.read_text())
    return bool(j.get("status") == "pass")

checks["canary_row1_row2_pass"] = row_pass(1) and row_pass(2)

go = all(checks.values())
print(json.dumps({"GO": go, "checks": checks}, indent=2))
print("GO_EVIDENCE=PASS" if go else "GO_EVIDENCE=FAIL")
PY
```
5) A1/A2 Caveat and Bug Track
Backend xray is produced by the same regimen run:
```bash
jq '.summary, .improvement' "$RUN/regimen_backend_xray.json"
```
Read suspected source:
```bash
jq '.summary.suspected_source, .summary.rationale' "$RUN/regimen_backend_xray.json"
```
Inspect first-step gradient drift:
```bash
jq '{step, global_max_abs_diff, global_mean_abs_diff, worst_tensor, top5: (.per_tensor|sort_by(-.max_abs_diff)|.[0:5])}' \
  "$RUN/regimen_debug_step_grads/step_00000001_grad_diff_summary.json"
```
Interpretation:
- A1/A2 failing does not block runbook execution readiness under this checklist.
- A1/A2 still blocks strict kernel-harness parity signoff.
- Track and fix A1/A2 in parallel while continuing operator runbook validation.
6) Operator Code Touchpoints
- Regimen orchestration: `version/v7/scripts/run_training_parity_regimen_v7.py`
- CK/PyTorch parity harness: `version/v7/scripts/train_parity_epochs_v7.py`
- RMSNorm kernels: `src/kernels/rmsnorm_kernels.c`
- SwiGLU kernels: `src/kernels/swiglu_kernels.c`
7) Go / No-Go
- dataset QC pass
- tokenizer exact roundtrip pass
- D1/E1/F1 pass
- canary row1/row2 pass

GO under this checklist does not require A1/A2 closure.
8) Exploratory Training
Exploratory means:
- training is functional and loss can decrease
- outputs can improve
- CK-vs-PyTorch numerical equivalence is not guaranteed over horizon
- behavior and regression conclusions are therefore provisional
Use exploratory mode for idea testing, not for final parity claims.