System Architecture

C-Kernel-Engine uses a three-stage pipeline to transform model configurations into optimized C runtimes.

New: See the IR Pipeline v6.6 page for end-to-end visuals of templates, IR1/IR2, lowering, memory layout, and dumb codegen.

The "Website" Metaphor

The engine treats an LLM the way a website generator treats pages: a header, a repeated block, and a footer. Because every "Block" is structurally identical, the repeated section can be unrolled into straight-line C without complex control flow.

Section   Website                 LLM
Header    <head>, Nav, CSS        Embeddings, Positional Encoding
Block     Blog Posts, Articles    Transformer Layers (repeated)
Footer    Copyright, Scripts      Final Norm, Language Head
Architecture Overview

Operator Spectrum Map

This view connects the data path (pretrain to RLHF/GRPO) with the compute path (math to inference/backprop) so operators can reason about the full stack in one place.



Stage 1: Model Configuration

The engine accepts HuggingFace-style config.json files as input:

{
  "hidden_size": 768,
  "num_attention_heads": 12,
  "num_key_value_heads": 4,
  "num_hidden_layers": 6,
  "intermediate_size": 2048,
  "rms_norm_eps": 1e-5,
  "rope_theta": 10000.0
}

This defines all the dimensions needed to generate layer structures.
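For illustration, the remaining dimensions fall out of integer arithmetic on these fields. A minimal sketch, assuming the example config above (the struct and helper names here are hypothetical, not part of the engine):

```c
#include <stdio.h>

/* Hypothetical helper: derive per-head dimensions from config fields. */
typedef struct {
    int head_dim;   /* hidden_size / num_attention_heads */
    int gqa_ratio;  /* num_attention_heads / num_key_value_heads */
} DerivedDims;

static DerivedDims derive_dims(int hidden_size, int num_heads, int num_kv_heads) {
    DerivedDims d;
    d.head_dim  = hidden_size / num_heads;   /* 768 / 12 = 64 */
    d.gqa_ratio = num_heads / num_kv_heads;  /* 12 / 4  = 3  */
    return d;
}
```

With the example config, each of the 12 query heads is 64 channels wide, and every key/value head is shared by 3 query heads.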

Stage 2: Intermediate Representation

The IR Builder creates a structured representation of each layer:

CKLayerIR Structure

typedef struct {
    int layer_index;
    int embed_dim;
    int num_heads;
    int num_kv_heads;
    int head_dim;
    int intermediate_dim;
    int context_window;
    float eps;
    float rope_theta;
} CKLayerIR;

Key Decisions

  • Aligned dimensions: Head dim padded to cache-friendly sizes
  • GQA ratio: Computed from num_heads / num_kv_heads
  • Buffer sizing: Calculated for all intermediate activations
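A sketch of how buffer sizing could be computed from the IR fields; the function is illustrative, not the engine's actual API, and it covers only the attention-side buffers:

```c
#include <stddef.h>

/* Mirrors the CKLayerIR struct above. */
typedef struct {
    int layer_index;
    int embed_dim;
    int num_heads;
    int num_kv_heads;
    int head_dim;
    int intermediate_dim;
    int context_window;
    float eps;
    float rope_theta;
} CKLayerIR;

/* Illustrative: total floats needed for Q, K, V, and attention scores
   at a given token count, following the memory-layout table. */
static size_t ck_attn_buffer_floats(const CKLayerIR *ir, int num_tokens) {
    size_t q      = (size_t)ir->num_heads    * num_tokens * ir->head_dim;
    size_t kv     = (size_t)ir->num_kv_heads * num_tokens * ir->head_dim * 2;
    size_t scores = (size_t)ir->num_heads    * num_tokens * ir->context_window;
    return q + kv + scores;
}
```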

Stage 3: Code Generation

The codegen emits complete C functions for forward and backward passes:

Generated Forward Pass

void forward_layer_0(
    const float *input,
    const ModelWeights *weights,
    LayerActivations *acts,
    const float *cos_cache,
    const float *sin_cache,
    int num_tokens
) {
    // 1. Pre-attention RMSNorm
    rmsnorm_forward(input, weights->ln1_gamma, acts->ln1_out, ...);

    // 2. QKV projection
    ck_qkv_project_head_major(acts->ln1_out, weights->wq, ...);

    // 3. Apply RoPE
    rope_forward_qk(acts->q, acts->k, cos_cache, sin_cache, ...);

    // 4. Attention
    attention_forward_causal_head_major_gqa(acts->q, acts->k, acts->v, ...);

    // 5. Output projection + residual
    // 6. Post-attention RMSNorm
    // 7. MLP (SwiGLU)
    // 8. Final residual
}

Generated Backward Pass

void backward_layer_0(
    const float *d_output,
    const ModelWeights *weights,
    const LayerActivations *acts,
    WeightGradients *grads,
    float *d_input
) {
    // Reverse order of forward pass
    // Each kernel uses saved activations from forward

    // 1. Backward through final residual
    // 2. Backward through MLP (SwiGLU)
    // 3. Backward through RMSNorm 2
    // 4. Backward through attention output projection
    // 5. Backward through attention
    attention_backward_causal_head_major_gqa(d_attn_out, acts->q, ...);

    // 6. Backward through RoPE (inverse rotation)
    rope_backward_qk(d_q, d_k, ...);

    // 7. Backward through QKV projection
    // 8. Backward through RMSNorm 1
}
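Step 6 describes RoPE's backward pass as the inverse rotation: since the forward rotates each (even, odd) channel pair by an angle θ, the backward applies the transpose, i.e. a rotation by -θ, to the incoming gradients. A minimal sketch for one pair (not the engine's actual kernel):

```c
/* Forward RoPE rotates the pair (x0, x1) by theta; the backward pass
   applies the transposed rotation (-theta) to the gradient pair. */
static void rope_pair_backward(float *d0, float *d1, float cos_t, float sin_t) {
    float g0 = *d0, g1 = *d1;
    *d0 =  g0 * cos_t + g1 * sin_t;   /* inverse rotation of the gradient */
    *d1 = -g0 * sin_t + g1 * cos_t;
}
```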

Memory Layout

Head-Major Layout
Q/K/V use [num_heads, num_tokens, head_dim] layout for cache-efficient attention computation.
Buffer       Layout            Size
input        [B, T, D]         batch * tokens * embed_dim
Q            [H, T, d_k]       num_heads * tokens * head_dim
K, V         [H_kv, T, d_k]    num_kv_heads * tokens * head_dim
scores       [H, T, T]         num_heads * tokens * context_window
mlp_hidden   [T, 2*I]          tokens * 2 * intermediate_dim
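A sketch of the head-major index computation, assuming row-major storage in the order shown in the table (the helper name is illustrative):

```c
/* Offset of element (head h, token t, channel d) in a
   [num_heads, num_tokens, head_dim] head-major buffer. */
static inline long hm_index(int h, int t, int d, int num_tokens, int head_dim) {
    return ((long)h * num_tokens + t) * head_dim + d;
}
```

With this layout, all of one head's rows are contiguous, so a single head's score computation walks memory sequentially.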

Kernel Composition

Kernels are composed following the transformer layer structure:

Forward and Backward Data Flow

Build System

Full Library

make

Builds libckernel_engine.so with all kernels linked together.

Per-Kernel Libraries

make libckernel_attention.so
make libckernel_rope.so
make libckernel_rmsnorm.so

Builds individual kernel libraries for testing.

Codegen Pipeline

Generate Runtime from Config

# Build the IR demo tool
make build/ck_ir_demo

# Generate C runtime
./build/ck_ir_demo config.json --emit build/model.c

# Or use the make target
make ck-emit CONFIG=config.json OUT=build/model.c

The generated file contains the forward and backward functions for every layer, specialized to the configured dimensions.

Project Structure

The codebase is organized for easy navigation:

Focused Source Tree src/kernels · version/v6.6 · version/v7 Updated: 2026-04-13 05:46
src/kernels
`-- fused
version/v6.6
|-- docs
|-- include
|-- kernel_maps
|-- patches
|-- scripts
|   `-- parity
|-- src
|   |-- generated
|   |-- kernel_config
|   |-- scripts
|   `-- test_generated
|-- templates
|-- test
|-- testing
|-- tests
|-- tools
`-- unittest
version/v7
|-- artifacts
|   `-- svg_dsl
|       |-- gen1_archive_2026-04-05
|       `-- spec_archive_2026-04-08
|-- contracts
|-- data
|   |-- eval_contracts
|   |-- generated
|   |   |-- toy_svg_semantic_shapes_tokenizer
|   |   `-- toy_svg_structured_atoms_tokenizer
|   |-- probe_contracts
|   |-- spec03
|   |   |-- contracts
|   |   |-- holdout
|   |   |-- manifests
|   |   |-- midtrain
|   |   |-- normalized
|   |   |-- pretrain
|   |   |-- raw_assets
|   |   |-- sft
|   |   `-- tokenizer
|   `-- spec04
|       |-- contracts
|       |-- holdout
|       |-- manifests
|       |-- midtrain
|       |-- normalized
|       |-- pretrain
|       |-- raw_assets
|       |-- sft
|       `-- tokenizer
|-- docs
|-- examples
|-- experiments
|   `-- svg_dsl
|       |-- catalog
|       |-- core
|       |-- programs
|       `-- renderers
|-- include
|-- kernel_maps
|-- regression
|-- reports
|   |-- spec12_gold_mappings -> ../artifacts/svg_dsl/spec_archive_2026-04-08/spec12_gold_mappings
|   |-- spec13b_gold_mappings -> ../artifacts/svg_dsl/spec_archive_2026-04-08/spec13b_gold_mappings
|   |-- spec14a_gold_mappings -> ../artifacts/svg_dsl/spec_archive_2026-04-08/spec14a_gold_mappings
|   |-- spec14b_gold_mappings -> ../artifacts/svg_dsl/spec_archive_2026-04-08/spec14b_gold_mappings
|   |-- spec15a_gold_mappings -> ../artifacts/svg_dsl/spec_archive_2026-04-08/spec15a_gold_mappings
|   |-- spec15b_gold_mappings -> ../artifacts/svg_dsl/spec_archive_2026-04-08/spec15b_gold_mappings
|   |-- spec_broader_1_family_packs -> ../artifacts/svg_dsl/spec_archive_2026-04-08/spec_broader_1_family_packs
|   `-- spec_broader_1_gold_mappings -> ../artifacts/svg_dsl/spec_archive_2026-04-08/spec_broader_1_gold_mappings
|-- runs
|   |-- logs
|   `-- overnight_monitor
|       `-- spec10
|-- scripts
|   |-- dataset
|   `-- parity
|-- src
|-- templates
|-- test
|-- tests
|   |-- contracts
|   `-- fixtures
`-- tools
    `-- src

87 directories