Visualizer Test Runbook

Four-level verification pyramid for the IR visualizer, dataset viewer, and IR hub. Catches broken tabs, missing functions, numeric regressions, and rendering failures — all without npm or Playwright.

No package installs: Python 3.8+ and Node.js only. L1 and L2 run from make v7-visualizer-health; L3 runs from make visualizer-full.

L1 Static Health (< 1 s): Tabs, panels, render functions, required functions, DOM targets, JS syntax (node --check)

L2 JS Unit Tests (< 2 s): Pure functions extracted and run through Node.js with test vectors: formatBytes, attnColor, cosineSim, embNormalise…

L3 Generated E2E (~ 60 s): Generate tiny model → build IR visualizer and dataset viewer → validate generated HTML

L4 Browser Runtime (future): Open generated HTML in headless browser, verify tabs render, click interactions, canvas output

Test Flow

L1 Static Health → L2 JS Units → L3 Generate Model → L3 Build Visualizers → L3 Validate Output → L4 Browser

L1+L2 run in pre-push (< 3 s). L3 runs in make visualizer-full or nightly. L4 is future work.

L1 Static Health Gate (pre-push)

Python static analysis of HTML source templates. Zero runtime — no browser, no model needed. Catches missing tabs, undefined functions, broken DOM targets.

python3 version/v7/scripts/test_visualizer_health_v7.py --source

Or via Makefile (runs L1 + L2 together):

make v7-visualizer-health

What It Checks

| Category | Visualizer | Checks | Catches |
|---|---|---|---|
| Tab Existence | IR (11) · Dataset (12) | 23 | Tab button removed or renamed without updating HTML |
| Panel Existence | IR (11) · Dataset (12) | 23 | Panel container missing for a tab |
| Render Functions | IR (10) · Dataset (12) | 22 | Tab has no matching render function |
| Required Functions | IR (19) · Dataset (30) | 49 | Missing attnColor, setElText, embNormalise, etc. |
| Undefined Call Detection | IR · Dataset · Hub | 3 | Calling a function that doesn’t exist (the attnColor bug) |
| DOM Target Coverage | IR | 7+ | getElementById() with no matching element |
| JS Syntax | IR · Dataset | 1 | Syntax errors via node --check |
| Hub Structure | Hub | 5 | Missing run-card, link templates |

Total: ~151 checks across 3 visualizers. All must pass for pre-push.
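
To make two of these categories concrete, here is a minimal sketch of how required-function and DOM-target checks could be done with regular expressions. The regexes, the check_source helper, the dom_target check id, and the sample HTML are all assumptions for illustration; the actual logic lives in test_visualizer_health_v7.py and may differ.

```python
import re

# Hypothetical sample standing in for an HTML source template.
SAMPLE_HTML = """
<div id="attn-panel"></div>
<script>
function attnColor(v) { return v; }
function renderAttention() {
  document.getElementById('attn-panel').textContent = attnColor(1);
  document.getElementById('missing-panel');
}
</script>
"""

# Simplified patterns: they only see `function name(` declarations and
# string-literal ids, not arrow functions or computed ids.
FN_DEF = re.compile(r"function\s+([A-Za-z_$][\w$]*)\s*\(")
ELEMENT_ID = re.compile(r"""\bid\s*=\s*["']([^"']+)["']""")
GET_BY_ID = re.compile(r"""getElementById\(\s*["']([^"']+)["']\s*\)""")


def check_source(html, required_fns):
    """Return {check_id: passed} for required functions and DOM targets."""
    defined = set(FN_DEF.findall(html))
    ids = set(ELEMENT_ID.findall(html))
    results = {}
    for fn in required_fns:
        results[f"required_fn:{fn}"] = fn in defined
    # "dom_target:" is a made-up check id used only in this sketch.
    for target in sorted(set(GET_BY_ID.findall(html))):
        results[f"dom_target:{target}"] = target in ids
    return results


results = check_source(SAMPLE_HTML, ["attnColor", "setElText"])
for check_id, passed in results.items():
    print(f"{check_id} {'PASS' if passed else 'FAIL'}")
```

In this sample, setElText and missing-panel are reported as failures because neither a definition nor a matching element exists in the source.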

Produces → visualizer_health_latest.json

L2 JS Unit Tests (pre-push)

Extracts pure JavaScript functions from the IR visualizer and dataset viewer generator source, writes them to a temp .js file with test vectors, and runs via node. No npm, no bundler, no browser.

python3 version/v7/scripts/test_visualizer_js_units_v7.py
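
The extraction step can be pictured with the sketch below, which pulls one function out of a JS source string by counting braces. The extract_function helper and the sample source are hypothetical; the real script may parse the source differently.

```python
import re

# Hypothetical JS source standing in for the generator's embedded script.
JS_SOURCE = """
function escapeHtml(s) {
  return String(s).replace(/&/g, '&amp;').replace(/</g, '&lt;');
}
function formatShapeDisplay(shape) {
  if (!shape) { return ''; }
  return shape.join(' x ');
}
"""


def extract_function(source, name):
    """Return the text of `function name(...) {...}` or None if absent.

    Uses naive brace counting: braces inside strings, regex literals, or
    comments would confuse it, so this only suits simple pure functions.
    """
    match = re.search(r"function\s+" + re.escape(name) + r"\s*\(", source)
    if match is None:
        return None
    open_idx = source.find("{", match.end())
    if open_idx == -1:
        return None
    depth = 0
    for i in range(open_idx, len(source)):
        if source[i] == "{":
            depth += 1
        elif source[i] == "}":
            depth -= 1
            if depth == 0:
                return source[match.start():i + 1]
    return None  # unbalanced braces


snippet = extract_function(JS_SOURCE, "formatShapeDisplay")
print(snippet)
```

The extracted text could then be written to a temporary .js file next to the test vectors and executed with node, which is the flow this level describes.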

IR Visualizer Functions

| Function | Tests | What It Validates |
|---|---|---|
| formatBytes | 4 | B/KB/MB/GB formatting |
| normalizeShapeInput | 7 | Array/string/object → normalised shape array |
| formatShapeDisplay | 3 | Shape → “2 × 3 × 4” display string |
| normalizeMode | 3 | Mode canonicalization (prefill/decode) |
| escapeHtml | 3 | XSS-safe HTML entity escaping |
| quoteShell | 3 | Shell-safe quoting for command generation |
| normalizePathString | 2 | Backslash → forward slash, trailing strip |
| pathDirname | 3 | POSIX parent-directory extraction |
| extractGgufStem | 3 | Model filename stem from path/URL |
| relativePathFromTo | 4 | Relative path computation between absolute paths |
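
For orientation, the three path helpers in the table can be mirrored in Python with posixpath. This is only a reference sketch of the behaviour the table describes: the snake_case names are illustrative, and details such as root handling or whether the "from" argument is a directory are assumptions, not the JS implementation.

```python
import posixpath


def normalize_path_string(path):
    """Convert backslashes to forward slashes and drop trailing slashes."""
    normalized = path.replace("\\", "/")
    stripped = normalized.rstrip("/")
    if not stripped and normalized.startswith("/"):
        return "/"  # assumption: keep the filesystem root as-is
    return stripped


def path_dirname(path):
    """POSIX-style parent directory of a normalised path."""
    return posixpath.dirname(normalize_path_string(path))


def relative_path_from_to(from_dir, to_path):
    """Relative path from a directory to a target (both absolute)."""
    return posixpath.relpath(
        normalize_path_string(to_path),
        start=normalize_path_string(from_dir),
    )


print(normalize_path_string("C:\\models\\run\\"))
print(path_dirname("/a/b/report.html"))
print(relative_path_from_to("/a/b", "/a/c/report.html"))
```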

Dataset Viewer Functions

| Function | Tests | What It Validates |
|---|---|---|
| attnColor | 9 | Attention heatmap colormaps (orange/blue/green/heatmap) |
| embColor | 3 | Embedding heatmap blue→mid→orange interpolation |
| cosineSim | 4 | Cosine similarity: identical, orthogonal, opposite, scaled |
| attnEntropy | 4 | Attention entropy: uniform, peaked, with zeros |
| avgMatrices | 3 | Matrix averaging, single, null safety |
| embNormalise | 4 | Global/col/row normalisation modes + null guard |

Total: 78 unit tests across 16 pure functions. Extracts from source → runs via Node.js.
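
The numeric cases named for cosineSim and attnEntropy correspond to well-known expected values. The Python sketch below reproduces them as a reference; it is not the JS code under test, and the choice of natural log and the zero-norm fallback are assumptions.

```python
import math


def cosine_sim(a, b):
    """Cosine similarity; returns 0.0 when either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumed fallback; the JS version may differ
    return dot / (norm_a * norm_b)


def attn_entropy(weights):
    """Shannon entropy in nats, skipping zero weights (0 * log 0 := 0)."""
    return -sum(w * math.log(w) for w in weights if w > 0)


def close(x, y, tol=1e-9):
    """Tolerance comparison, since floating-point results are inexact."""
    return abs(x - y) <= tol


# The four cosineSim cases named in the table.
assert close(cosine_sim([1, 2, 3], [1, 2, 3]), 1.0)    # identical
assert close(cosine_sim([1, 0], [0, 1]), 0.0)          # orthogonal
assert close(cosine_sim([1, 2], [-1, -2]), -1.0)       # opposite
assert close(cosine_sim([1, 2], [10, 20]), 1.0)        # scaled

# Three attnEntropy cases named in the table.
assert close(attn_entropy([0.25] * 4), math.log(4))       # uniform
assert close(attn_entropy([1.0, 0.0, 0.0]), 0.0)          # peaked
assert close(attn_entropy([0.5, 0.5, 0.0]), math.log(2))  # with zeros
```

Comparing with a tolerance rather than strict equality keeps such vectors stable across minor changes in the order of floating-point operations.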

Produces → visualizer_js_units_latest.json

L3 Generated-File E2E (nightly)

Generates a tiny model, runs inference and training, builds both the IR visualizer and dataset viewer, then runs L1 health checks on the generated HTML output. This catches template regressions where source looks fine but generated output is stale.

Step 3a — Generate & Train Tiny Model

# Initialize tiny model (vocab=256, d=64, layers=2)
python3 version/v7/scripts/ck_run_v7.py init \
  --run-name test_viz_e2e \
  --generate-ir --generate-runtime
# Quick sanity train (1 epoch, 1024 tokens, ~30 seconds)
python3 version/v7/scripts/ck_run_v7.py sanity \
  --run ~/.cache/ck-engine-v7/models/train/test_viz_e2e \
  --train-epochs 1 --train-total-tokens 1024

Step 3b — Generate Visualizers

# IR Visualizer (inference mode)
make visualizer
# IR Visualizer (inference + training)
make visualizer-full

Step 3c — Validate Generated Output

# Run health checks on source + all generated files
python3 version/v7/scripts/test_visualizer_health_v7.py --all

--all scans ~/.cache/ck-engine-v7/models/ for generated ir_report.html and dataset_viewer.html files and validates each against the same 151-check contract. Stale files that are missing new required functions (e.g., setElText) will fail.
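
A scan of this kind can be sketched as follows. The two file names come from this runbook; the find_generated_files and missing_functions helpers, the directory layout, and the substring-based function check are assumptions, and the demo uses a temporary directory rather than the real cache.

```python
import tempfile
from pathlib import Path

# File names taken from the runbook; the directory layout is an assumption.
GENERATED_NAMES = ("ir_report.html", "dataset_viewer.html")


def find_generated_files(root):
    """Return sorted paths of generated visualizer HTML files under root."""
    root = Path(root)
    return sorted(p for name in GENERATED_NAMES for p in root.rglob(name))


def missing_functions(html_path, required_fns):
    """List required function names with no `function name(` in the file."""
    text = Path(html_path).read_text(encoding="utf-8")
    return [fn for fn in required_fns if f"function {fn}(" not in text]


# Demo against a throwaway directory instead of the real cache.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / "train" / "test_viz_e2e"
    run_dir.mkdir(parents=True)
    (run_dir / "ir_report.html").write_text(
        "<script>function attnColor(v) { return v; }</script>",
        encoding="utf-8",
    )
    found = find_generated_files(tmp)
    stale = {p.name: missing_functions(p, ["attnColor", "setElText"]) for p in found}

print(stale)
```

Here the generated file predates setElText, so it is flagged as stale even though the source template would pass.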

Or use the existing E2E harness

# Runs test_ir_visualizer_e2e_v7.py with training runtime
make visualizer-full

Produces → ir_report.html · dataset_viewer.html · ir_visualizer_e2e_latest.json

L4 Browser Runtime Smoke (future)

Headless browser validation using Playwright or Puppeteer. Not yet implemented: L1, L2, and L3 currently cover contract and numeric correctness without a browser dependency.

When to add L4

Candidate Implementation

# Future: headless Playwright smoke test
# npx playwright test tests/visualizer-smoke.spec.ts
#   ✓ IR visualizer: all 11 tabs render non-empty content
#   ✓ Dataset viewer: all 12 tabs render non-empty content
#   ✓ Attention heatmap canvas has non-zero pixel data
#   ✓ Tab switch does not throw JS errors

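Until L4 is needed, static analysis (L1), numeric unit tests (L2), and generated-file E2E (L3) together cover contract, numeric, and stale-output regressions. Runtime behavior such as tab rendering, click interactions, and canvas output remains unverified.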

Integration Map

| Hook / Target | Levels | Command / What It Runs |
|---|---|---|
| .githooks/pre-push [0.5/6] | L1 + L2 | test_visualizer_health_v7.py --source --quiet && test_visualizer_js_units_v7.py --quiet |
| make v7-visualizer-health | L1 + L2 | Static health + JS unit tests with JSON reports |
| make visualizer | L3 (inference) | Generate + validate IR visualizer (inference mode) |
| make visualizer-full | L3 (train) | Generate + validate IR visualizer (inference + training) |
| make v7-visualizer-e2e-nightly | L3 (full) | Nightly: full E2E with training, skip inference parity |

Failure Playbook

| Failure | Level | Meaning | Fix |
|---|---|---|---|
| tab_exists:attention FAIL | L1 | Tab button removed from HTML source | Restore tab in generator or update test contract |
| required_fn:attnColor FAIL | L1 | Function definition deleted or renamed | Restore function in source; if renamed, update callers + contract |
| ir:extract:formatBytes FAIL | L2 | Function body cannot be parsed from source | Check for syntax changes in function; may need extraction update |
| ds:cosineSim:orthogonal FAIL | L2 | Numeric regression in cosineSim implementation | Review recent changes to cosineSim function |
| no_undefined_fn_calls FAIL | L1 | Code calls a function that isn’t defined | Add the missing function or fix the caller |
| L3 generated ir_report.html missing setElText | L3 | Generated file is stale (built before latest source change) | Regenerate: make visualizer |

Adding New Tests

Add a new L1 contract

Edit test_visualizer_health_v7.py. Add the function name to the required_fns list for the appropriate visualizer (IR or dataset viewer). The test will verify the function exists in source.

Add a new L2 unit test

Edit test_visualizer_js_units_v7.py. Add the function name to fns_needed and add test cases to the test_cases list. Each test case has a name and a body (JS code that returns true on pass).

# Example: adding a test for a new function myHelper()
{"name": "ds:myHelper:basic", "body": "return assertDeepEq(myHelper('input'), 'expected');"},
{"name": "ds:myHelper:null",  "body": "return assertDeepEq(myHelper(null), '');"},
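
To show how name/body pairs like these could become a runnable file, the sketch below assembles a standalone JS harness as a string. build_harness, the assertDeepEq definition, and the stand-in myHelper body are all hypothetical; the real script's harness format is not shown in this runbook.

```python
import json

# Test cases in the shape the runbook describes: a name plus a JS body
# that returns true on pass. myHelper is the runbook's placeholder name.
test_cases = [
    {"name": "ds:myHelper:basic", "body": "return assertDeepEq(myHelper('input'), 'expected');"},
    {"name": "ds:myHelper:null", "body": "return assertDeepEq(myHelper(null), '');"},
]

# Assumed helper; the real assertDeepEq may compare values differently.
ASSERT_HELPER = (
    "function assertDeepEq(a, b) {\n"
    "  return JSON.stringify(a) === JSON.stringify(b);\n"
    "}\n"
)


def build_harness(function_sources, cases):
    """Assemble a standalone JS file: helpers, functions under test, cases."""
    parts = [ASSERT_HELPER, *function_sources, "const results = [];"]
    for case in cases:
        name = json.dumps(case["name"])  # safe JS string literal
        parts.append(
            "try {\n"
            f"  const ok = (function () {{ {case['body']} }})();\n"
            f"  results.push({{ name: {name}, pass: ok === true }});\n"
            "} catch (err) {\n"
            f"  results.push({{ name: {name}, pass: false, error: String(err) }});\n"
            "}"
        )
    parts.append("console.log(JSON.stringify(results));")
    return "\n".join(parts)


# Stand-in implementation so the harness is complete on its own.
my_helper = "function myHelper(x) { return x === null ? '' : 'expected'; }"
harness = build_harness([my_helper], test_cases)
print(harness)
```

Wrapping each body in its own function and try/catch means one throwing test is recorded as a failure instead of aborting the rest of the run.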

Add a new tab to a visualizer

When adding a new tab to the IR visualizer or dataset viewer:

  1. Add the tab name to the TABS list in test_visualizer_health_v7.py
  2. Add the render function to RENDER_FNS and required_fns
  3. Run make v7-visualizer-health — new checks are automatically included
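
The way a tab list can expand into individual checks is sketched below. The list contents, the dict shape of RENDER_FNS, and the panel_exists and render_fn check ids are assumptions for illustration; only the tab_exists prefix appears in this runbook.

```python
# Hypothetical contract data; the real TABS and RENDER_FNS live in
# test_visualizer_health_v7.py and their exact shape is not shown here.
TABS = ["overview", "attention"]
RENDER_FNS = {"overview": "renderOverview", "attention": "renderAttention"}


def expand_tab_checks(tabs, render_fns):
    """Return the check ids implied by the tab list, in a stable order."""
    checks = []
    for tab in tabs:
        checks.append(f"tab_exists:{tab}")
        checks.append(f"panel_exists:{tab}")  # assumed check id
        if tab in render_fns:
            checks.append(f"render_fn:{render_fns[tab]}")  # assumed check id
    return checks


before = expand_tab_checks(TABS, RENDER_FNS)

# Adding one tab plus its render function adds three checks.
TABS.append("profile")
RENDER_FNS["profile"] = "renderProfile"
after = expand_tab_checks(TABS, RENDER_FNS)

print(sorted(set(after) - set(before)))
```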