Test Report

Automated test results from nightly CI runs. Updated daily at 2am UTC.

Error

Failed to load test results.

ALL PASS

Passed

Failed

Total

Duration

v7 Fast Regression

`make regression-fast` status is not available in this report yet.

Overall

Families Passed

Families Failed

Families Total

Family	Status	Build	Smoke	Coherence	Contract	1stTok	Note
No regression-fast data published in this report.

v7 Kernel Map Contracts

`make v7-kernel-map-contracts` status is not available in this report yet.

Overall

Maps Passed

Maps Failed

Warnings

Validator Notes
No kernel-map summary published in this report.

v7 Training Family Regression

`make regression-training-full` status is not available in this report yet.

Overall

Families Passed

Families Failed

Families Total

Family	Status	A	B	C	D	E	F	Failed Stages
No training family summary published in this report.

30-Day History

Legend: All passed · Some failed · No data

Test Details

0 kernel tests passed, 0 failed

	Status	Test	Category	Kernels	Duration

About Nightly Tests

The nightly test suite runs automatically at 2am UTC every day. It includes:

E2E Integration Tests: Full pipeline validation (kernel compile → IR codegen → inference)
- Kernel compilation (all SIMD variants)
- IR codegen validation (generates compilable C code)
- Parallel decode codegen (OpenMP pragmas for v6.6)
- Inference verification (prompts model, validates coherent response)
Kernel Unit Tests: Individual kernel parity with PyTorch
Llama Pairwise RoPE Contract: Explicit regression for consecutive-pair RoPE layout selection plus kernel-level pairwise forward coverage and timing (`version/v7/scripts/parity/test_check_decode_attention_contract.py`, `unittest/test_rope.py`)
Sliding-Window Contract: Explicit sliding attention prefill/decode contract test (`unittest/test_attention_sliding_contract.py`)
BF16 Tests: BFloat16 precision tests (when CPU supports it)
Quantization Tests: Q4_K, Q6_K quantized kernel tests
v7 Kernel Map Contracts: Sync + validator + contract tests for the active v7 map surface (`make v7-kernel-map-contracts`)
v7 Backprop Kernel Parity: Optimizer + QK-norm backward ISA matrix + RMSNorm/SwiGLU backward + GEMM backward sweep (`make v7-kernel-parity-train`)
v7 Training Family Regression: Family-level A/B/C/D/E/F training regimen across the full decoder family matrix (`make regression-training-full` nightly, `make regression-training-fast` for local fast coverage)
v7 IR Visualizer E2E: Runbook-path regression with explicit run-dir wiring, decode profile checks, and tiny train-runtime ASan artifact validation (`make visualizer`, `make visualizer-full`)
v7 Visualizer Health Gate: Contract-driven static analysis (160 checks) + JS unit tests (100 test vectors) for IR visualizer, dataset viewer, and IR hub (`make v7-visualizer-health`)
v7 Fast Regression: Family bring-up gate for Gemma, Qwen2, Qwen3, Qwen3.5, and Nanbeige with build/smoke/contract/first-token checks (`make regression-fast`)
v7 Regression Ledger: Canonical root-cause log for fixed and monitored bugs, consumed by operators in the visualizer (`version/v7/reports/REGRESSION_LEDGER.md`, `version/v7/reports/REGRESSION_LEDGER.json`)
llama.cpp Parity: CK vs llama.cpp kernel parity flow (`scripts/run_parity_smoketest.sh`, `make llamacpp-parity-full`)
v6.6 Preflight Gates: Tooling contracts + 3-model matrix validation (`make v6.6-validate-contracts`, `make v6.6-validate-matrix`)
Integration Tests: Full layer and model parity
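
Several of the suites above are parity tests: a kernel's output is compared element-wise against a trusted reference within a tolerance. The sketch below illustrates that pattern only. It is not the project's test harness; `softmax_reference`, `softmax_candidate`, `check_parity`, and the `atol` value are hypothetical names and settings chosen for illustration, and plain Python stands in for both the reference framework and the optimized kernel.

```python
import math

def softmax_reference(xs):
    """Straightforward softmax, used here as the trusted reference."""
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_candidate(xs):
    """Stand-in for an optimized kernel: max-subtracted softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equal-length lists."""
    if len(a) != len(b):
        raise ValueError("length mismatch")
    return max(abs(x - y) for x, y in zip(a, b))

def check_parity(reference, candidate, inputs, atol=1e-6):
    """Return (ok, worst), where worst is the largest difference over all inputs."""
    worst = 0.0
    for xs in inputs:
        worst = max(worst, max_abs_diff(reference(xs), candidate(xs)))
    return worst <= atol, worst

# Hypothetical inputs; a real suite would sweep shapes and dtypes.
inputs = [[0.0, 1.0, 2.0], [-3.5, 0.25, 4.0, 4.0], [10.0]]
ok, worst = check_parity(softmax_reference, softmax_candidate, inputs)
print("parity ok:", ok, "worst abs diff:", worst)
```

Reporting the worst difference alongside the pass/fail flag makes it easier to tell a tolerance that is too tight from a genuine kernel regression.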

Run locally: `make test` (kernel-focused), `make visualizer` (fast path), `make visualizer-full` (includes tiny train-runtime ASan checks), `make nightly` (full matrix).
Nightly coverage now includes v7 kernel-map contracts, v7 backprop kernel parity, published training family regression, visualizer runbook regression checks, and published regression-fast family summaries.

View the GitHub Actions workflow for detailed logs.

See also: Version History for project roadmap and milestone tracking.

Visualizer Health Matrix

Contract-driven test coverage for the IR Visualizer and Dataset Viewer. Each component declares its tab/function/DOM contracts in JSON; tests validate against those contracts at every push. See the full test runbook.

Run locally: `make v7-visualizer-health`

IR Visualizer — 11 tabs, 19 core functions

Tab	Render Function	L1 Static	L2 Unit Tests
Memory	`renderMemory`	tab + panel + fn	—
Kernel Flow	`renderKernelFlow`	tab + panel + fn	—
Stats	`renderStats`	tab + panel + fn	—
Training	`renderTraining`	tab + panel + fn	—
Quantization	`renderQuantizationAudit`	tab + panel + fn	—
Dataflow	`renderDataflow`	tab + panel + fn	—
Interpretability	`renderOperatorMathIntuition`	tab + panel + fn	—
Profile	`renderProfile`	tab + panel + fn	—
Data Pipeline	`renderDataPipeline`	tab + panel + fn	—
Parity	`renderParityCockpit`	tab + panel + fn	—

L2-tested utilities: formatBytes (4) · normalizeShapeInput (7) · formatShapeDisplay (3) · normalizeMode (3) · escapeHtml (3) · quoteShell (3) · normalizePathString (2) · pathDirname (3) · extractGgufStem (3) · relativePathFromTo (4) — 35 tests

Dataset Viewer — 12 tabs, 30 core functions

Tab	Render Function	L1 Static	L2 Unit Tests
Overview	`renderOverview`	tab + panel + fn	—
Preflight	`renderPreflight`	tab + panel + fn	—
Gallery	`renderGallery`	tab + panel + fn	—
Text	`renderTextSamples`	tab + panel + fn	—
Tokenizer	`renderTokenizer`	tab + panel + fn	—
Vocabulary	`renderVocabulary`	tab + panel + fn	—
Classification	`renderClassification`	tab + panel + fn	—
Browse	`renderBrowse`	tab + panel + fn	—
Candidates	`renderCandidates`	tab + panel + fn	—
Quality	`renderQuality`	tab + panel + fn	—
Embeddings	`renderEmbeddings`	tab + panel + fn	embColor · embNormalise · cosineSim
Attention	`renderAttention`	tab + panel + fn	attnColor · attnEntropy · avgMatrices

L2-tested functions: attnColor (9) · embColor (3) · cosineSim (4) · attnEntropy (4) · avgMatrices (3) · embNormalise (4) — 27 tests
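
The L2 tests run pure functions against fixed test vectors. A minimal Python sketch of that pattern follows, assuming `cosineSim` is ordinary cosine similarity and `attnEntropy` is the Shannon entropy of one attention row. The actual JavaScript implementations are not shown in this report and may differ (for example in log base or zero-vector handling), so `cosine_sim`, `attn_entropy`, and the vectors below are illustrative stand-ins.

```python
import math

def cosine_sim(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0  # assumption: a zero vector yields similarity 0
    return dot / (na * nb)

def attn_entropy(row):
    """Shannon entropy in nats; zero probabilities contribute nothing."""
    return -sum(p * math.log(p) for p in row if p > 0.0)

# Each vector is (function, arguments, expected value).
VECTORS = [
    (cosine_sim, ([1.0, 0.0], [1.0, 0.0]), 1.0),
    (cosine_sim, ([1.0, 0.0], [0.0, 1.0]), 0.0),
    (cosine_sim, ([1.0, 2.0], [-1.0, -2.0]), -1.0),
    (attn_entropy, ([1.0, 0.0, 0.0],), 0.0),
    (attn_entropy, ([0.25] * 4,), math.log(4)),
]

def run_vectors(vectors, tol=1e-9):
    """Run every vector and collect the ones whose result is off by more than tol."""
    failures = []
    for fn, args, expected in vectors:
        got = fn(*args)
        if abs(got - expected) > tol:
            failures.append((fn.__name__, args, expected, got))
    return failures

failures = run_vectors(VECTORS)
print(f"{len(VECTORS) - len(failures)} passed, {len(failures)} failed")
```

Keeping the vectors as data rather than as individual test functions is what lets a contract file drive the test count.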

Coverage Summary

Level	Component	Checks	Run In
L1	IR Visualizer	~68	`pre-push`
L1	Dataset Viewer	~78	`pre-push`
L1	IR Hub	~5	`pre-push`
L2	IR Pure Functions	~50	`pre-push`
L2	DS Pure Functions	~50	`pre-push`
L3	Generated E2E (all 3 visualizers)	~24-44	`nightly`
Total (L1 + L2)		~260	`pre-push`, < 3 seconds

Contract-driven: All tab, function, and test vector definitions live in `version/v7/tests/contracts/ir_visualizer_contract.json` and `dataset_viewer_contract.json`. To add a tab or function, update the contract and the tests expand automatically.
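
To illustrate how a contract can drive the "tab + panel + fn" static checks, here is a rough Python sketch. The contract schema (`tabs`, `id`, `render_fn`), the `data-tab` and `panel-<id>` markup conventions, and the `static_checks` helper are all assumptions made for this example; the real contract files and checker may be organized differently.

```python
import json
import re

# Hypothetical contract: one entry per tab, naming its render function.
CONTRACT_JSON = """
{"tabs": [
  {"id": "memory", "render_fn": "renderMemory"},
  {"id": "stats", "render_fn": "renderStats"}
]}
"""

# Hypothetical page under test; the "stats" panel is deliberately missing.
HTML = """
<button data-tab="memory">Memory</button>
<div id="panel-memory"></div>
<button data-tab="stats">Stats</button>
<script>
function renderMemory() {}
function renderStats() {}
</script>
"""

def static_checks(contract, html):
    """Emit three checks per tab: tab button, panel element, render function."""
    results = []
    for tab in contract["tabs"]:
        tab_id, fn = tab["id"], tab["render_fn"]
        fn_pattern = rf"\bfunction\s+{re.escape(fn)}\b"
        results.append((tab_id, "tab", f'data-tab="{tab_id}"' in html))
        results.append((tab_id, "panel", f'id="panel-{tab_id}"' in html))
        results.append((tab_id, "fn", re.search(fn_pattern, html) is not None))
    return results

results = static_checks(json.loads(CONTRACT_JSON), HTML)
failed = [(tab_id, kind) for tab_id, kind, ok in results if not ok]
print(f"{len(results)} checks, {len(failed)} failed: {failed}")
```

Because the checks are generated from the contract, adding a tab entry adds three checks without touching the test code.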

Level 3 — Generated-File E2E (Nightly)

Validates the full generation → validation chain: generate all three visualizer HTML files from the latest training run, then run L1 health checks + embedded JSON structure + cross-artifact consistency on the output.

Run locally: `make v7-visualizer-generated-e2e` · Specific run: `make v7-visualizer-generated-e2e RUN=/path/to/run`

Stage	What it validates	Artifact
Generate	ir_report.html via `open_ir_visualizer.py`	ir_report.html
Generate	dataset_viewer.html via `prepare_run_viewer.py`	dataset_viewer.html
Generate	ir_hub.html via `open_ir_hub.py`	ir_hub.html
L1 Health	Tabs, functions, DOM targets in generated output	All three files
JSON Structure	Embedded JSON blobs (run_config, ir1_decode, layout_decode)	ir_report.html
Panel Structure	Panel IDs, attnColor presence, file size	dataset_viewer.html
Hub Structure	Run cards, ir_report links, navigation	ir_hub.html
Cross-artifact	Run name in hub, vocab consistency, config.json	All three files
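
The JSON Structure stage can be pictured as: find each embedded blob in the generated HTML, then confirm it parses. The sketch below assumes the blobs are embedded as `<script type="application/json" id="...">` elements, which this report does not confirm; `JsonBlobExtractor` and `check_embedded_json` are hypothetical helpers written for this example using only the Python standard library.

```python
import json
from html.parser import HTMLParser

class JsonBlobExtractor(HTMLParser):
    """Collect the text of <script type="application/json" id="..."> elements."""

    def __init__(self):
        super().__init__()
        self.blobs = {}
        self._current_id = None
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "script" and attr_map.get("type") == "application/json":
            self._current_id = attr_map.get("id")
            self._chunks = []

    def handle_data(self, data):
        if self._current_id is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._current_id is not None:
            self.blobs[self._current_id] = "".join(self._chunks)
            self._current_id = None

def check_embedded_json(html, required_ids):
    """Return a list of problems: missing blobs or blobs that fail to parse."""
    parser = JsonBlobExtractor()
    parser.feed(html)
    parser.close()
    problems = []
    for blob_id in required_ids:
        raw = parser.blobs.get(blob_id)
        if raw is None:
            problems.append(f"{blob_id}: missing")
            continue
        try:
            json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append(f"{blob_id}: invalid JSON ({exc.msg})")
    return problems

# Hypothetical generated file: one valid blob, one truncated, one absent.
SAMPLE = """
<html><body>
<script type="application/json" id="run_config">{"run": "demo"}</script>
<script type="application/json" id="ir1_decode">{"ops": [</script>
</body></html>
"""

problems = check_embedded_json(SAMPLE, ["run_config", "ir1_decode", "layout_decode"])
for problem in problems:
    print(problem)
```

Returning a list of problems rather than failing on the first one lets a single nightly run report every broken blob at once.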