Architecture Links
A comprehensive guide to all architecture documentation in C-Kernel-Engine.
Runbooks
v8 Qwen3-VL Runbook
Scoped operator runbook for the validated v8 multimodal inference path: Qwen3-VL decoder + matching mmproj.
v8 Vision Encoder Architecture
Design page for how the v8 vision encoder is derived from GGUF + template + lowering, then stitched into the decoder bridge.
v7 Inference + Training Runbook
Copy/paste workflow for HF GGUF inference plus true_bpe training and train->infer handoff.
v7 Python Authoring Guide
Step-by-step notebook launch order, Python authoring syntax, and the exact handoff boundary from notebooks or ck.nn into the existing v7 scripts.
v7 Profiling Runbook
Repeatable performance workflow for v7 training kernels with perf, VTune, flamegraphs, and Advisor.
v7 SVG Dataset Runbook
Operator workflow to generate Stage A pretraining and Stage B midtraining SVG corpora from docs/site/assets/*.svg, then hand off to v7 training.
Core
- System Overview IR, Codegen, Kernels
- IR Pipeline v6.6 Templates → IR1 → IR2 → Lowering
- v7 Backprop IR Pipeline Init → IR1 → IR2 → Layout → Codegen + canary diagnostics
- v7 Cross-Entropy Parity p - one_hot derivation, PyTorch semantics, long-horizon drift fixes
- v7 Grad-Accum Windows Micro-batch vs effective batch, N vs K, CPU batch simulation
- v7 Train Data Pipeline One-command dataset -> tokenizer -> train orchestration
- v7 Runtime Stitch Graph Function-level forward/backward/accum/optimizer stitching view
- v8 Qwen3-VL Runbook Current operator path for validated v8 multimodal inference
- v8 Vision Encoder Architecture How GGUF intake, template lowering, memory planning, and bridge stitching power the working vision path
- v7 Inference + Training Runbook Copy/paste commands for HF GGUF inference + true_bpe training
- v7 Python Authoring Guide Notebook lane, TrainingProject, and ck.v7.compile(...) in one place
- v7 SVG Dataset Runbook Dataset generation for Stage A pretrain and Stage B midtrain
- Model + Kernel Matrix Qwen2/Qwen3/Gemma + kernel coverage
- Tokenizer BPE, WordPiece, Trie
- Kernel Reference Forward/backward ops
- Gated DeltaNet Deep Dive Qwen3.5/qwen3next recurrent attention state update and kernel parity
- Code Generation IR to C compilation
- Iteration Philosophy Why v1→v6 matters
- IR v2 Format Case study: symbolic dimensions
- Deep Dive Concepts RoPE, Flash Attention, GQA
Quantization
- Quant Fundamentals Block formats, grouping
- Bit Manipulation Visuals Q5_0, Q4_K, INT8 with spaced repetition
- Quant Format Reference Byte-level visualization
- GGUF to Bump Weight conversion
- GGUF Parsing Byte-level guide
Optimization
- GEMM Memory Layout NN/NT layouts, offsets
- GEMM Optimization AVX, MKL, blocking
- v7 Train Layout + Dispatch IR3 memory + parallel execution plan
- Threadpool GEMM Playbook Split M/N/K policy for training
- SIMD Architecture AVX-512, VNNI, AMX
- Flash Attention Analysis Why llama.cpp is faster
Infrastructure
- Memory Safety Bump allocator, canaries
- Deterministic Memory RDMA, interpretability
- Profiling Guide Valgrind, perf, flamegraphs
- v8 Qwen3-VL Runbook Validated decoder/mmproj workflow for multimodal inference
- v8 Vision Encoder Architecture Bridge and encoder design notes for the current multimodal inference lane
- v7 Profiling Runbook VTune + Advisor + perf/flamegraph on train kernels
- v7 Inference + Training Runbook Operational workflow from dataset to chat output
- v7 Python Authoring Guide Notebook-driven and module-driven authoring entrypoints for the same v7 runtime
- Testing Numerical parity verification
Temp / Work in Progress
These pages are work-in-progress and may be moved or updated.
Quantization Math Deep Dive
Explains Q5_0/Q8_0 block formats, dequantization math, and AVX-512 vectorization strategy.
GEMM Memory Layout
Covers quantized block storage, cache blocking strategies, and KV cache layouts.
Quick Navigation
By Task
| Task | Documentation |
|---|---|
| Understanding the system | System Overview, Concepts, v7 Backprop IR, v7 CE Parity, v7 Grad-Accum Windows, v7 Runbook, v7 Python Authoring Guide |
| Implementing new kernels | Kernel Reference, Gated DeltaNet Deep Dive, Codegen |
| Quantization work | Quant Fundamentals, Bit Visuals, GGUF Parsing |
| Performance optimization | GEMM Layout, v7 Train Layout+Dispatch, Threadpool Playbook, SIMD |
| Debugging & profiling | Profiling, v7 Profiling Runbook, Testing, v7 CE Parity Deep Dive, v7 Runtime Stitch Graph, v7 Runbook, v7 Python Authoring Guide |
| Operator train + compute workflow | v7 SVG Dataset Runbook, v7 Inference + Training Runbook, v7 Python Authoring Guide, v7 Profiling Runbook |