Visualizer Test Runbook
Four-level verification pyramid for the IR visualizer, dataset viewer, and IR hub. Catches broken tabs, missing functions, numeric regressions, and rendering failures — all without npm or Playwright.
No third-party dependencies: only Python 3.8+ and Node.js are required.
The pre-push levels (L1 and L2) run from make v7-visualizer-health.
Test Flow
L1+L2 run in pre-push (< 3 s). L3 runs in make visualizer-full or nightly. L4 is future work.
L1 Static Health Gate pre-push
Python static analysis of HTML source templates. Zero runtime — no browser, no model needed. Catches missing tabs, undefined functions, broken DOM targets.
python3 version/v7/scripts/test_visualizer_health_v7.py --source
Or via Makefile (runs L1 + L2 together):
make v7-visualizer-health
What It Checks
| Category | Visualizer | Checks | Catches |
|---|---|---|---|
| Tab Existence | IR (11) · Dataset (12) | 23 | Tab button removed or renamed without updating HTML |
| Panel Existence | IR (11) · Dataset (12) | 23 | Panel container missing for a tab |
| Render Functions | IR (10) · Dataset (12) | 22 | Tab has no matching render function |
| Required Functions | IR (19) · Dataset (30) | 49 | Missing attnColor, setElText, embNormalise, etc. |
| Undefined Call Detection | IR · Dataset · Hub | 3 | Calling a function that doesn’t exist (the attnColor bug) |
| DOM Target Coverage | IR | 7+ | getElementById() with no matching element |
| JS Syntax | IR · Dataset | 1 | Syntax errors via node --check |
| Hub Structure | Hub | 5 | Missing run-card, link templates |
Total: ~151 checks across 3 visualizers. All must pass for pre-push.
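As an illustration of the DOM Target Coverage row, the sketch below shows one way such a check could work: collect every id passed to getElementById() and report those that no element declares. The regular expressions and the function name missing_dom_targets are assumptions made for this example, not the code in test_visualizer_health_v7.py.

```python
import re
from typing import List

# Hypothetical patterns; the real checker may parse the HTML differently.
GET_BY_ID = re.compile(r"""getElementById\(\s*['"]([^'"]+)['"]\s*\)""")
ID_ATTR = re.compile(r"""\bid\s*=\s*['"]([^'"]+)['"]""")


def missing_dom_targets(html: str) -> List[str]:
    """Return ids looked up via getElementById() that no element declares."""
    declared = set(ID_ATTR.findall(html))
    seen = set()
    missing = []
    for target in GET_BY_ID.findall(html):
        if target not in declared and target not in seen:
            seen.add(target)
            missing.append(target)
    return missing


# Illustrative input only: one lookup has a matching element, one does not.
sample = """
<div id="panel-attention"></div>
<script>
  document.getElementById('panel-attention').textContent = 'ok';
  document.getElementById("panel-missing").textContent = 'boom';
</script>
"""
print(missing_dom_targets(sample))
```

A purely textual scan like this cannot see ids that are built dynamically at runtime, which may be why the table lists the count as "7+" rather than a fixed number.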
L2 JS Unit Tests pre-push
Extracts pure JavaScript functions from the IR visualizer and dataset viewer
generator source, writes them to a temp .js file with test vectors,
and runs via node. No npm, no bundler, no browser.
python3 version/v7/scripts/test_visualizer_js_units_v7.py
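The extraction step can be sketched as follows, assuming functions are declared as `function name(...) { ... }` and located by counting braces. The helper name extract_js_function is hypothetical, and the real script may use a different strategy.

```python
import re
from typing import Optional


def extract_js_function(source: str, name: str) -> Optional[str]:
    """Return the text of `function name(...) {...}`, or None if absent.

    Braces are counted naively, so braces inside strings, comments, or
    regex literals would confuse this sketch.
    """
    match = re.search(r"function\s+" + re.escape(name) + r"\s*\(", source)
    if match is None:
        return None
    open_brace = source.find("{", match.end())
    if open_brace == -1:
        return None
    depth = 0
    for pos in range(open_brace, len(source)):
        if source[pos] == "{":
            depth += 1
        elif source[pos] == "}":
            depth -= 1
            if depth == 0:
                return source[match.start():pos + 1]
    return None  # unbalanced braces


# Illustrative template fragment, not the real generator source.
template = """
<script>
function formatBytes(n) {
  if (n < 1024) { return n + ' B'; }
  return (n / 1024).toFixed(1) + ' KB';
}
function other() { return 1; }
</script>
"""
snippet = extract_js_function(template, "formatBytes")
print(snippet)
```

The extracted text can then be written to a temporary .js file together with the test vectors and executed with node.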
IR Visualizer Functions
| Function | Tests | What It Validates |
|---|---|---|
| formatBytes | 4 | B/KB/MB/GB formatting |
| normalizeShapeInput | 7 | Array/string/object → normalised shape array |
| formatShapeDisplay | 3 | Shape → “2 × 3 × 4” display string |
| normalizeMode | 3 | Mode canonicalization (prefill/decode) |
| escapeHtml | 3 | XSS-safe HTML entity escaping |
| quoteShell | 3 | Shell-safe quoting for command generation |
| normalizePathString | 2 | Backslash → forward slash, trailing strip |
| pathDirname | 3 | POSIX parent-directory extraction |
| extractGgufStem | 3 | Model filename stem from path/URL |
| relativePathFromTo | 4 | Relative path computation between absolute paths |
Dataset Viewer Functions
| Function | Tests | What It Validates |
|---|---|---|
| attnColor | 9 | Attention heatmap colormaps (orange/blue/green/heatmap) |
| embColor | 3 | Embedding heatmap blue→mid→orange interpolation |
| cosineSim | 4 | Cosine similarity: identical, orthogonal, opposite, scaled |
| attnEntropy | 4 | Attention entropy: uniform, peaked, with zeros |
| avgMatrices | 3 | Matrix averaging, single, null safety |
| embNormalise | 4 | Global/col/row normalisation modes + null guard |
Total: 78 unit tests across 16 pure functions. Extracts from source → runs via Node.js.
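Expected values for the cosineSim and attnEntropy cases named above can be derived independently of the JavaScript. The Python reference below is only a sketch: the input vectors are made up for illustration, and it assumes natural-log entropy, zero-weight terms contributing nothing, and zero vectors mapping to a similarity of 0. The actual JS functions may make different choices.

```python
import math
from typing import Sequence


def cosine_sim(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: zero vectors map to 0
    return dot / (norm_a * norm_b)


def attn_entropy(weights: Sequence[float]) -> float:
    """Shannon entropy in nats; zero weights are skipped (0 * log 0 := 0)."""
    return sum(-p * math.log(p) for p in weights if p > 0)


# The four cosineSim cases from the table, with illustrative vectors.
cases = {
    "identical": cosine_sim([1, 2, 3], [1, 2, 3]),    # close to 1
    "orthogonal": cosine_sim([1, 0], [0, 1]),         # exactly 0
    "opposite": cosine_sim([1, 2], [-1, -2]),         # close to -1
    "scaled": cosine_sim([1, 2], [10, 20]),           # close to 1
}

entropy_uniform = attn_entropy([0.25, 0.25, 0.25, 0.25])  # ln(4)
entropy_peaked = attn_entropy([1.0, 0.0, 0.0, 0.0])       # 0
entropy_zeros = attn_entropy([0.5, 0.5, 0.0])             # ln(2)
```

Because floating-point results are rarely exact, comparisons against such reference values are safer with a tolerance (for example math.isclose) than with strict equality.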
L3 Generated-File E2E nightly
Generates a tiny model, runs inference and training, builds both the IR visualizer and dataset viewer, then runs L1 health checks on the generated HTML output. This catches template regressions where source looks fine but generated output is stale.
Step 3a — Generate & Train Tiny Model
# Initialize tiny model (vocab=256, d=64, layers=2)
python3 version/v7/scripts/ck_run_v7.py init \
--run-name test_viz_e2e \
--generate-ir --generate-runtime
# Quick sanity train (1 epoch, 1024 tokens, ~30 seconds)
python3 version/v7/scripts/ck_run_v7.py sanity \
--run ~/.cache/ck-engine-v7/models/train/test_viz_e2e \
--train-epochs 1 --train-total-tokens 1024
Step 3b — Generate Visualizers
# IR Visualizer (inference mode)
make visualizer
# IR Visualizer (inference + training)
make visualizer-full
Step 3c — Validate Generated Output
# Run health checks on source + all generated files
python3 version/v7/scripts/test_visualizer_health_v7.py --all
--all scans ~/.cache/ck-engine-v7/models/ for generated
ir_report.html and dataset_viewer.html files and validates
each against the same 151-check contract. Stale files that are missing new required
functions (e.g., setElText) will fail.
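The discovery step behind --all might look like the sketch below, which walks a root directory for the two generated file names. The function name find_generated_reports and the recursive search are assumptions; only the file names and the default cache path come from this runbook.

```python
import tempfile
from pathlib import Path
from typing import List

# File names taken from the runbook; the search strategy is an assumption.
GENERATED_NAMES = ("ir_report.html", "dataset_viewer.html")
DEFAULT_ROOT = Path.home() / ".cache" / "ck-engine-v7" / "models"


def find_generated_reports(root: Path = DEFAULT_ROOT) -> List[Path]:
    """Recursively collect generated visualizer files under root."""
    if not root.is_dir():
        return []
    found: List[Path] = []
    for name in GENERATED_NAMES:
        found.extend(root.rglob(name))
    return sorted(found)


# Demonstration against a throwaway directory instead of the real cache.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    run_dir = demo_root / "train" / "test_viz_e2e"
    run_dir.mkdir(parents=True)
    (run_dir / "ir_report.html").write_text("<html></html>")
    (run_dir / "notes.txt").write_text("ignored")
    reports = [p.name for p in find_generated_reports(demo_root)]

print(reports)
```

Each discovered file would then be passed through the same checks used for the source templates.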
Or use the existing E2E harness
# Runs test_ir_visualizer_e2e_v7.py with training runtime
make visualizer-full
L4 Browser Runtime Smoke future
Headless browser validation using Playwright or Puppeteer. Not yet implemented: L1+L2+L3 currently cover the structural contract and numeric correctness checks without a browser dependency.
When to add L4
- Canvas rendering bugs that L1-L3 cannot catch (pixel-level attention heatmaps)
- Tab-switch JavaScript errors that only manifest with a live DOM
- CSS layout regressions (e.g., panels overlapping, scrollbars missing)
- Interactive features: modal open/close, search/filter, copy buttons
Candidate Implementation
# Future: headless Playwright smoke test
# npx playwright test tests/visualizer-smoke.spec.ts
# ✓ IR visualizer: all 11 tabs render non-empty content
# ✓ Dataset viewer: all 12 tabs render non-empty content
# ✓ Attention heatmap canvas has non-zero pixel data
# ✓ Tab switch does not throw JS errors
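Until L4 is needed, the combination of static analysis (L1), numeric unit tests (L2), and generated-file E2E (L3) covers structural contracts, numeric correctness, and stale generated output. Failures that only appear in a live browser, such as the cases listed above, remain uncovered.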
Integration Map
| Hook / Target | Levels | Command |
|---|---|---|
| .githooks/pre-push [0.5/6] | L1 + L2 | test_visualizer_health_v7.py --source --quiet && test_visualizer_js_units_v7.py --quiet |
| make v7-visualizer-health | L1 + L2 | Static health + JS unit tests with JSON reports |
| make visualizer | L3 (inference) | Generate + validate IR visualizer (inference mode) |
| make visualizer-full | L3 (train) | Generate + validate IR visualizer (inference + training) |
| make v7-visualizer-e2e-nightly | L3 (full) | Nightly: full E2E with training, skip inference parity |
Failure Playbook
| Failure | Level | Meaning | Fix |
|---|---|---|---|
| tab_exists:attention FAIL | L1 | Tab button removed from HTML source | Restore tab in generator or update test contract |
| required_fn:attnColor FAIL | L1 | Function definition deleted or renamed | Restore function in source; if renamed, update callers + contract |
| ir:extract:formatBytes FAIL | L2 | Function body cannot be parsed from source | Check for syntax changes in function; may need extraction update |
| ds:cosineSim:orthogonal FAIL | L2 | Numeric regression in cosineSim implementation | Review recent changes to cosineSim function |
| no_undefined_fn_calls FAIL | L1 | Code calls a function that isn’t defined | Add the missing function or fix the caller |
| L3 generated ir_report.html missing setElText | L3 | Generated file is stale (built before latest source change) | Regenerate: make visualizer |
Adding New Tests
Add a new L1 contract
Edit test_visualizer_health_v7.py. Add the function name to the
required_fns list for the appropriate visualizer (IR or dataset viewer).
The test will verify the function exists in source.
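A required-function check of this kind can be sketched as below. The helper name missing_required_fns and the two accepted declaration forms (`function name(` and `const|let|var name =`) are assumptions for illustration; the real contract may recognise other forms.

```python
import re
from typing import Iterable, List


def missing_required_fns(source: str, required_fns: Iterable[str]) -> List[str]:
    """Return the required function names that have no definition in source."""
    missing = []
    for name in required_fns:
        escaped = re.escape(name)
        # Accept a function declaration or a const/let/var assignment.
        pattern = (
            rf"\bfunction\s+{escaped}\s*\("
            rf"|\b(?:const|let|var)\s+{escaped}\s*="
        )
        if re.search(pattern, source) is None:
            missing.append(name)
    return missing


# Illustrative source: two of the three required functions are defined.
source = """
function attnColor(v, scheme) { return '#000'; }
const setElText = (id, text) => { document.getElementById(id).textContent = text; };
"""
required_fns = ["attnColor", "setElText", "embNormalise"]
print(missing_required_fns(source, required_fns))
```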
Add a new L2 unit test
Edit test_visualizer_js_units_v7.py. Add the function name to
fns_needed and add test cases to the test_cases list.
Each test case has a name and a body (JS code that returns
true on pass).
# Example: adding a test for a new function myHelper()
{"name": "ds:myHelper:basic", "body": "return assertDeepEq(myHelper('input'), 'expected');"},
{"name": "ds:myHelper:null", "body": "return assertDeepEq(myHelper(null), '');"},
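To show how entries like these could become an executable Node.js script, the sketch below wraps each body in a function and counts failures. The build_harness helper and the JSON-based assertDeepEq are hypothetical stand-ins; the real script's harness and assertion helper may differ.

```python
import json
from typing import Dict, List

# Hypothetical helper; the real assertDeepEq may compare values differently.
ASSERT_DEEP_EQ = """
function assertDeepEq(actual, expected) {
  return JSON.stringify(actual) === JSON.stringify(expected);
}
"""


def build_harness(fn_sources: List[str], test_cases: List[Dict[str, str]]) -> str:
    """Assemble a JS file: helpers, extracted functions, then one block per test."""
    parts = [ASSERT_DEEP_EQ] + list(fn_sources)
    parts.append("let failed = 0;")
    for case in test_cases:
        name = json.dumps(case["name"])  # a JSON string is a valid JS literal
        parts.append(
            "try {\n"
            f"  const ok = (function () {{ {case['body']} }})();\n"
            f"  if (ok !== true) {{ failed++; console.log('FAIL', {name}); }}\n"
            "} catch (err) {\n"
            f"  failed++; console.log('FAIL', {name}, String(err));\n"
            "}"
        )
    parts.append("process.exit(failed ? 1 : 0);")
    return "\n".join(parts)


test_cases = [
    {"name": "ds:myHelper:basic", "body": "return assertDeepEq(myHelper('input'), 'expected');"},
    {"name": "ds:myHelper:null", "body": "return assertDeepEq(myHelper(null), '');"},
]
# Stand-in implementation of the hypothetical myHelper() for the demo.
my_helper_src = "function myHelper(x) { return x === null ? '' : 'expected'; }"
harness = build_harness([my_helper_src], test_cases)
print(harness)
```

Writing the returned string to a temporary .js file and running it with node would yield a non-zero exit code when any case fails.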
Add a new tab to a visualizer
When adding a new tab to the IR visualizer or dataset viewer:
- Add the tab name to the TABS list in test_visualizer_health_v7.py
- Add the render function to RENDER_FNS and required_fns
- Run make v7-visualizer-health; new checks are automatically included