Architecture Links
A comprehensive guide to all architecture documentation in C-Kernel-Engine.
Runbooks
v8 Qwen3-VL Runbook
Scoped operator runbook for the validated v8 multimodal inference path: Qwen3-VL decoder + matching mmproj.
v8 Vision Encoder Architecture
Design page for how the v8 vision encoder is derived from GGUF + template + lowering, then stitched into the decoder bridge.
v7 Inference + Training Runbook
Copy/paste workflow for HF GGUF inference plus true_bpe training, and the train → infer handoff.
v7 Profiling Runbook
Repeatable performance workflow for v7 training kernels with perf, VTune, flamegraphs, and Advisor.
v7 SVG Dataset Runbook
Operator workflow to generate Stage A pretraining and Stage B midtraining SVG corpora from docs/site/assets/*.svg, then hand off to v7 training.
Core
- System Overview: IR, Codegen, Kernels
- IR Pipeline v6.6: Templates → IR1 → IR2 → Lowering
- v7 Backprop IR Pipeline: Init → IR1 → IR2 → Layout → Codegen + canary diagnostics
- v7 Cross-Entropy Parity: p - one_hot derivation, PyTorch semantics, long-horizon drift fixes
- v7 Grad-Accum Windows: micro-batch vs effective batch, N vs K, CPU batch simulation
- v7 Train Data Pipeline: one-command dataset → tokenizer → train orchestration
- v7 Runtime Stitch Graph: function-level forward/backward/accum/optimizer stitching view
- v8 Qwen3-VL Runbook: current operator path for validated v8 multimodal inference
- v8 Vision Encoder Architecture: how GGUF intake, template lowering, memory planning, and bridge stitching power the working vision path
- v7 Inference + Training Runbook: copy/paste commands for HF GGUF inference + true_bpe training
- v7 SVG Dataset Runbook: dataset generation for Stage A pretrain and Stage B midtrain
- Model + Kernel Matrix: Qwen2/Qwen3/Gemma + kernel coverage
- Tokenizer: BPE, WordPiece, Trie
- Kernel Reference: forward/backward ops
- Code Generation: IR-to-C compilation
- Iteration Philosophy: why v1→v6 matters
- IR v2 Format: case study on symbolic dimensions
- Deep Dive Concepts: RoPE, Flash Attention, GQA
Quantization
- Quant Fundamentals: block formats, grouping
- Bit Manipulation Visuals: Q5_0, Q4_K, INT8 with spaced repetition
- Quant Format Reference: byte-level visualization
- GGUF to Bump: weight conversion
- GGUF Parsing: byte-level guide
Optimization
- GEMM Memory Layout: NN/NT layouts, offsets
- GEMM Optimization: AVX, MKL, blocking
- v7 Train Layout + Dispatch: IR3 memory + parallel execution plan
- Threadpool GEMM Playbook: split M/N/K policy for training
- SIMD Architecture: AVX-512, VNNI, AMX
- Flash Attention Analysis: why llama.cpp is faster
Infrastructure
- Memory Safety: bump allocator, canaries
- Deterministic Memory: RDMA, interpretability
- Profiling Guide: Valgrind, perf, flamegraphs
- v8 Qwen3-VL Runbook: validated decoder/mmproj workflow for multimodal inference
- v8 Vision Encoder Architecture: bridge and encoder design notes for the current multimodal inference lane
- v7 Profiling Runbook: VTune + Advisor + perf/flamegraph on train kernels
- v7 Inference + Training Runbook: operational workflow from dataset to chat output
- Testing: numerical parity verification
Temp / Work in Progress
These pages are work-in-progress and may be moved or updated.
Quantization Math Deep Dive
Explains Q5_0/Q8_0 block formats, dequantization math, and AVX-512 vectorization strategy.
GEMM Memory Layout
Covers quantized block storage, cache blocking strategies, and KV cache layouts.
Quick Navigation
By Task
| Task | Documentation |
|---|---|
| Understanding the system | System Overview, Concepts, v7 Backprop IR, v7 CE Parity, v7 Grad-Accum Windows, v7 Runbook |
| Implementing new kernels | Kernel Reference, Codegen |
| Quantization work | Quant Fundamentals, Bit Visuals, GGUF Parsing |
| Performance optimization | GEMM Layout, v7 Train Layout+Dispatch, Threadpool Playbook, SIMD |
| Debugging & profiling | Profiling, v7 Profiling Runbook, Testing, v7 CE Parity Deep Dive, v7 Runtime Stitch Graph, v7 Runbook |
| Operator training + compute workflow | v7 SVG Dataset Runbook, v7 Inference + Training Runbook, v7 Profiling Runbook |