Test Report
Automated test results from nightly CI runs. Updated daily at 2am UTC.
v7 Fast Regression
| Family | Status | Build | Smoke | Coherence | Contract | 1stTok | Note |
|---|---|---|---|---|---|---|---|
| No regression-fast data published in this report. | | | | | | | |
v7 Kernel Map Contracts
| Validator Notes |
|---|
| No kernel-map summary published in this report. |
v7 Training Family Regression
| Family | Status | A | B | C | D | E | F | Failed Stages |
|---|---|---|---|---|---|---|---|---|
| No training family summary published in this report. | | | | | | | | |
30-Day History
Test Details
0 kernel tests passed, 0 failed
| Status | Test | Category | Kernels | Duration |
|---|---|---|---|---|
About Nightly Tests
The nightly test suite runs automatically at 2am UTC every day. It includes:
- E2E Integration Tests: Full pipeline validation (kernel compile → IR codegen → inference)
  - Kernel compilation (all SIMD variants)
  - IR codegen validation (generates compilable C code)
  - Parallel decode codegen (OpenMP pragmas for v6.6)
  - Inference verification (prompts model, validates coherent response)
- Kernel Unit Tests: Individual kernel parity with PyTorch
- Llama Pairwise RoPE Contract: Explicit regression for consecutive-pair RoPE layout selection, plus kernel-level pairwise forward coverage and timing (version/v7/scripts/parity/test_check_decode_attention_contract.py, unittest/test_rope.py)
- Sliding-Window Contract: Explicit sliding attention prefill/decode contract test (unittest/test_attention_sliding_contract.py)
- BF16 Tests: BFloat16 precision tests (when CPU supports it)
- Quantization Tests: Q4_K, Q6_K quantized kernel tests
- v7 Kernel Map Contracts: Sync + validator + contract tests for the active v7 map surface (make v7-kernel-map-contracts)
- v7 Backprop Kernel Parity: Optimizer + QK-norm backward ISA matrix + RMSNorm/SwiGLU backward + GEMM backward sweep (make v7-kernel-parity-train)
- v7 Training Family Regression: Family-level A/B/C/D/E/F training regimen across the full decoder family matrix (make regression-training-full nightly, make regression-training-fast for local fast coverage)
- v7 IR Visualizer E2E: Runbook-path regression with explicit run-dir wiring, decode profile checks, and tiny train-runtime ASan artifact validation (make visualizer, make visualizer-full)
- v7 Visualizer Health Gate: Contract-driven static analysis (160 checks) + JS unit tests (100 test vectors) for IR visualizer, dataset viewer, and IR hub (make v7-visualizer-health)
- v7 Fast Regression: Family bring-up gate for Gemma, Qwen2, Qwen3, Qwen3.5, and Nanbeige with build/smoke/contract/first-token checks (make regression-fast)
- v7 Regression Ledger: Canonical root-cause log for fixed and monitored bugs, consumed by operators in the visualizer (version/v7/reports/REGRESSION_LEDGER.md, version/v7/reports/REGRESSION_LEDGER.json)
- llama.cpp Parity: CK vs llama.cpp kernel parity flow (scripts/run_parity_smoketest.sh, make llamacpp-parity-full)
- v6.6 Preflight Gates: Tooling contracts + 3-model matrix validation (make v6.6-validate-contracts, make v6.6-validate-matrix)
- Integration Tests: Full layer and model parity
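The Llama Pairwise RoPE Contract in the list above guards a layout choice that is easy to get wrong. RoPE rotates pairs of query/key features by a position-dependent angle: the consecutive-pair (pairwise) layout rotates (x[0], x[1]), (x[2], x[3]), and so on, while the half-split layout rotates (x[i], x[i + d/2]). The sketch below is a minimal pure-Python illustration, assuming the common frequency schedule theta_i = pos · base^(-2i/d) with base = 10000; the function names are hypothetical and are not the kernels exercised by unittest/test_rope.py.

```python
import math


def _theta(pos, i, d, base):
    # Assumed frequency schedule: theta_i = pos * base^(-2i/d).
    return pos * base ** (-2.0 * i / d)


def rope_pairwise(x, pos, base=10000.0):
    """Consecutive-pair layout: rotate (x[2i], x[2i+1]) by theta_i."""
    d = len(x)
    assert d % 2 == 0, "head dimension must be even"
    out = list(x)
    for i in range(d // 2):
        t = _theta(pos, i, d, base)
        c, s = math.cos(t), math.sin(t)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i], out[2 * i + 1] = a * c - b * s, a * s + b * c
    return out


def rope_half_split(x, pos, base=10000.0):
    """Half-split layout: rotate (x[i], x[i + d/2]) by theta_i."""
    d = len(x)
    assert d % 2 == 0, "head dimension must be even"
    half = d // 2
    out = list(x)
    for i in range(half):
        t = _theta(pos, i, d, base)
        c, s = math.cos(t), math.sin(t)
        a, b = x[i], x[i + half]
        out[i], out[i + half] = a * c - b * s, a * s + b * c
    return out


x = [1.0, 2.0, 3.0, 4.0]
print(rope_pairwise(x, 3))
print(rope_half_split(x, 3))
```

Both layouts preserve the vector norm and are the identity at position 0, so a contract test has to compare against the expected layout at nonzero positions to catch a wrong selection.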
Run locally: make test (kernel-focused), make visualizer (fast path), make visualizer-full (includes tiny train-runtime ASan checks), or make nightly (full matrix).
Nightly coverage now includes v7 kernel-map contracts, v7 backprop kernel parity, published training family regression, visualizer runbook regression checks, and published regression-fast family summaries.
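The Sliding-Window Contract listed above checks that prefill and decode agree on which keys each query may attend to. A minimal sketch of that invariant follows, assuming a window of W keys that includes the current token (so query position t sees keys max(0, t - W + 1) through t). The helper names are hypothetical, and the real test in unittest/test_attention_sliding_contract.py may use a different window convention.

```python
def decode_window(t, window):
    """Key positions visible to the query at position t during decode."""
    return list(range(max(0, t - window + 1), t + 1))


def prefill_mask(seq_len, window):
    """Causal sliding-window mask: mask[q][k] is True when query q may attend to key k."""
    return [[(k <= q) and (q - k < window) for k in range(seq_len)]
            for q in range(seq_len)]


def prefill_matches_decode(seq_len, window):
    """True when every prefill mask row selects exactly the decode window."""
    mask = prefill_mask(seq_len, window)
    return all(
        [k for k in range(seq_len) if mask[q][k]] == decode_window(q, window)
        for q in range(seq_len)
    )


print(prefill_matches_decode(16, 4))
```

Decode derives the window from the cache position while prefill builds a full mask, so the two code paths are implemented separately; a check like this ties both to one definition of the window.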
View the GitHub Actions workflow for detailed logs.
See also: Version History for project roadmap and milestone tracking.
Visualizer Health Matrix
Contract-driven test coverage for the IR Visualizer and Dataset Viewer. Each component declares its tab/function/DOM contracts in JSON; tests validate against those contracts on every push. See the full test runbook.
Run locally: make v7-visualizer-health
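The contract-driven approach described above can be sketched as a plain-text scan of the generated HTML. This is a minimal Python illustration, not the project's checker: the contract shape (tabs with tab, panel_id, and render_fn fields), the data-tab attribute, and the panel-<name> id convention are all assumptions made for the example.

```python
import json

# Hypothetical contract shape; the real ir_visualizer_contract.json schema may differ.
CONTRACT_JSON = """
{
  "tabs": [
    {"name": "Memory", "tab": "memory", "panel_id": "panel-memory", "render_fn": "renderMemory"},
    {"name": "Stats", "tab": "stats", "panel_id": "panel-stats", "render_fn": "renderStats"}
  ]
}
"""


def check_html_against_contract(html, contract):
    """Return failure messages; an empty list means every contract check passed."""
    failures = []
    for tab in contract["tabs"]:
        # Each declared tab expands into three checks: tab button, panel, render function.
        if f'data-tab="{tab["tab"]}"' not in html:
            failures.append(f'{tab["name"]}: tab button missing')
        if f'id="{tab["panel_id"]}"' not in html:
            failures.append(f'{tab["name"]}: panel {tab["panel_id"]} missing')
        if f'function {tab["render_fn"]}(' not in html:
            failures.append(f'{tab["name"]}: {tab["render_fn"]} not defined')
    return failures


page = (
    '<button data-tab="memory">Memory</button>'
    '<div id="panel-memory"></div>'
    '<script>function renderMemory(data) {}</script>'
)
print(check_html_against_contract(page, json.loads(CONTRACT_JSON)))
```

Because the checks are generated from contract entries, adding a tab entry adds three checks without touching the test code; here the Stats entry fails all three because the sample page only implements Memory.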
IR Visualizer: 11 tabs, 19 core functions
| Tab | Render Function | L1 Static | L2 Unit Tests |
|---|---|---|---|
| Memory | renderMemory | tab + panel + fn | - |
| Kernel Flow | renderKernelFlow | tab + panel + fn | - |
| Stats | renderStats | tab + panel + fn | - |
| Training | renderTraining | tab + panel + fn | - |
| Quantization | renderQuantizationAudit | tab + panel + fn | - |
| Dataflow | renderDataflow | tab + panel + fn | - |
| Interpretability | renderOperatorMathIntuition | tab + panel + fn | - |
| Profile | renderProfile | tab + panel + fn | - |
| Data Pipeline | renderDataPipeline | tab + panel + fn | - |
| Parity | renderParityCockpit | tab + panel + fn | - |
L2-tested utilities: formatBytes (4) · normalizeShapeInput (7) · formatShapeDisplay (3) · normalizeMode (3) · escapeHtml (3) · quoteShell (3) · normalizePathString (2) · pathDirname (3) · extractGgufStem (3) · relativePathFromTo (4); 35 tests in total.
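The counts in parentheses are numbers of test vectors per utility. A table-driven harness in that spirit is sketched below in Python. It is only an illustration: Python's html.escape and shlex.quote stand in for escapeHtml and quoteShell, and the JavaScript functions may escape or quote differently (for example, around single quotes).

```python
import html
import shlex

# Stand-ins (assumption): html.escape and shlex.quote approximate the intent of the
# visualizer's escapeHtml and quoteShell; the JavaScript versions may differ.
VECTORS = {
    "escapeHtml": (html.escape, [
        ("plain", "plain"),
        ("<b>", "&lt;b&gt;"),
        ('a & "b"', "a &amp; &quot;b&quot;"),
    ]),
    "quoteShell": (shlex.quote, [
        ("simple", "simple"),
        ("has space", "'has space'"),
        ("it's", "'it'\"'\"'s'"),
    ]),
}


def run_vectors(vectors):
    """Run every (input, expected) pair; return per-utility pass counts and failures."""
    passed, failures = {}, []
    for name, (fn, cases) in vectors.items():
        passed[name] = 0
        for given, expected in cases:
            got = fn(given)
            if got == expected:
                passed[name] += 1
            else:
                failures.append((name, given, expected, got))
    return passed, failures


print(run_vectors(VECTORS))
```

Keeping inputs and expected outputs as data means a new edge case is one more tuple, and the per-utility pass counts map directly onto the counts listed above.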
Dataset Viewer: 12 tabs, 30 core functions
| Tab | Render Function | L1 Static | L2 Unit Tests |
|---|---|---|---|
| Overview | renderOverview | tab + panel + fn | - |
| Preflight | renderPreflight | tab + panel + fn | - |
| Gallery | renderGallery | tab + panel + fn | - |
| Text | renderTextSamples | tab + panel + fn | - |
| Tokenizer | renderTokenizer | tab + panel + fn | - |
| Vocabulary | renderVocabulary | tab + panel + fn | - |
| Classification | renderClassification | tab + panel + fn | - |
| Browse | renderBrowse | tab + panel + fn | - |
| Candidates | renderCandidates | tab + panel + fn | - |
| Quality | renderQuality | tab + panel + fn | - |
| Embeddings | renderEmbeddings | tab + panel + fn | embColor · embNormalise · cosineSim |
| Attention | renderAttention | tab + panel + fn | attnColor · attnEntropy · avgMatrices |
L2-tested functions: attnColor (9) · embColor (3) · cosineSim (4) · attnEntropy (4) · avgMatrices (3) · embNormalise (4); 27 tests in total.
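The tested Dataset Viewer helpers are small numeric functions. As a hedged reference for what such helpers commonly compute, the Python sketch below gives one plausible definition each for attnEntropy, cosineSim, embNormalise, and avgMatrices. The snake_case names, the natural log in the entropy, and the zero-vector handling are assumptions; attnColor and embColor are omitted because their color mapping is not described here.

```python
import math


def attn_entropy(row):
    """Shannon entropy (natural log) of one attention row, renormalised to sum to 1."""
    total = sum(row)
    return -sum((p / total) * math.log(p / total) for p in row if p > 0)


def cosine_sim(a, b):
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0 or nb == 0 else dot / (na * nb)


def emb_normalise(v):
    """Scale an embedding to unit L2 norm; a zero vector is returned unchanged."""
    n = math.sqrt(sum(x * x for x in v))
    return list(v) if n == 0 else [x / n for x in v]


def avg_matrices(mats):
    """Element-wise mean of equally shaped matrices, e.g. averaging attention heads."""
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[sum(m[r][c] for m in mats) / len(mats) for c in range(cols)]
            for r in range(rows)]


print(attn_entropy([0.25, 0.25, 0.25, 0.25]))
```

A uniform row over n keys has the maximum entropy log(n), and a one-hot row has entropy 0, which makes both good boundary vectors for a unit test.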
Coverage Summary
| Level | Component | Checks | Run In |
|---|---|---|---|
| L1 | IR Visualizer | ~68 | pre-push |
| L1 | Dataset Viewer | ~78 | pre-push |
| L1 | IR Hub | ~5 | pre-push |
| L2 | IR Pure Functions | ~50 | pre-push |
| L2 | DS Pure Functions | ~50 | pre-push |
| L3 | Generated E2E (all 3 visualizers) | ~24-44 | nightly |
| Total | | ~260 | < 3 seconds |
Contracts: version/v7/tests/contracts/ir_visualizer_contract.json and dataset_viewer_contract.json.
Add a tab or function? Update the contract, and the tests expand automatically.
Level 3: Generated-File E2E (Nightly)
Validates the full generation → validation chain: generate all three visualizer HTML files from the latest training run, then run L1 health checks, embedded JSON structure checks, and cross-artifact consistency checks on the output.
Run locally: make v7-visualizer-generated-e2e · Specific run: make v7-visualizer-generated-e2e RUN=/path/to/run
| Stage | What it validates | Artifact |
|---|---|---|
| Generate | ir_report.html via open_ir_visualizer.py | ir_report.html |
| Generate | dataset_viewer.html via prepare_run_viewer.py | dataset_viewer.html |
| Generate | ir_hub.html via open_ir_hub.py | ir_hub.html |
| L1 Health | Tabs, functions, DOM targets in generated output | All three files |
| JSON Structure | Embedded JSON blobs (run_config, ir1_decode, layout_decode) | ir_report.html |
| Panel Structure | Panel IDs, attnColor presence, file size | dataset_viewer.html |
| Hub Structure | Run cards, ir_report links, navigation | ir_hub.html |
| Cross-artifact | Run name in hub, vocab consistency, config.json | All three files |
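The JSON Structure stage checks that the embedded JSON blobs in ir_report.html are present and parse. A minimal Python sketch, assuming each blob is embedded as a script element with type="application/json" followed by an id attribute; the real generator may embed data differently, and the regex is a stand-in for a proper HTML parser:

```python
import json
import re

# Assumption: blobs look like <script type="application/json" id="NAME">...</script>,
# with type before id. A real checker would use an HTML parser instead of a regex.
BLOB_RE = re.compile(
    r'<script[^>]*\btype="application/json"[^>]*\bid="([^"]+)"[^>]*>(.*?)</script>',
    re.DOTALL,
)
REQUIRED_BLOBS = ("run_config", "ir1_decode", "layout_decode")


def check_json_structure(page, required=REQUIRED_BLOBS):
    """Return required blob names that are missing or do not parse as JSON."""
    ok = set()
    for name, body in BLOB_RE.findall(page):
        try:
            json.loads(body)
            ok.add(name)
        except json.JSONDecodeError:
            pass  # present but malformed, so it stays out of `ok`
    return [name for name in required if name not in ok]


page = (
    '<script type="application/json" id="run_config">{"model": "tiny"}</script>'
    '<script type="application/json" id="ir1_decode">[1, 2]</script>'
    '<script type="application/json" id="layout_decode">{broken</script>'
)
print(check_json_structure(page))
```

Reporting missing and malformed blobs through the same list keeps the stage's pass condition simple: an empty result.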