
CK-Native Training Curriculum

This is the training plan that connects v7 to the rest of the roadmap. The point is not to learn one narrow SVG task. The point is to use C-Kernel-Engine to learn the whole stack: contracts, data, codegen, replay, distributed execution, multimodal expansion, and eventually embedded control-grade AI.

Core framing: v7 is the foundation track, not a side quest. It makes training correct, observable, and repeatable. Later versions are multipliers on top of that: v8-v9 add sparse scale, v10 adds adapters, v11-v14 add modalities, v15 lands embedded inference, v16 explains model behavior, and v17 closes the loop with real-time policy integration.
Current v7 branch decision: after the frozen spec16 winner, bounded-intent work passed through spec17 and spec18. The next recommended branch is spec19: a textbook-routing mixture with named buckets, stronger minimal-pair coverage, and capacity treated as a separate fallback step rather than a hidden curriculum tweak. Open spec19-textbook-routing-mixture.html.

How v7 and Beyond Integrate

The version history is not a list of disconnected features. It is a dependency chain. Each version teaches a new operating skill and depends on the contracts from earlier stages.

v7

Correctness

Training IR, backward lowering, replay determinism, checkpoints, run artifacts, and operator-grade gates.

v8-v9

Scale + Sparsity

MoE routing, expert execution, sparse backward, and eventually CK-native distributed training.

v10

Adaptation

LoRA and QLoRA let the project specialize downstream models without retraining the full stack.

v11-v14

Modalities

Vision and audio force the engine to handle new front-ends, new data contracts, and new evals.

v15

Deployment

Embedded inference turns the training work into bounded, portable, hardware-aware systems.

v16-v17

Interpret + Control

Mechanistic interpretability and multimodal control make the model explainable enough to trust in loops.

Version-to-Skill Matrix

Version | Track | Built Capability | Core Skills | Hard Exit Gate
v7.0-v7.2 | Correctness | Single-rank training core, IR-driven backward, compiler-backed task curricula, deterministic replay, threaded training runtime | Optimizer math, codegen/runtime contracts, data packing, token budgets, parity triage, profiler discipline | Runs are reproducible, replay is stable, checkpoints resume cleanly, and operator gates pass from a clean checkout
v8.0-v9.0 | Scale + Sparsity | MoE forward/backward plus the first serious CK-native distributed training stack | Routing, sparse load balance, all-reduce behavior, sharded data loading, multi-rank failure modes | 1-rank and N-rank tiny runs agree within declared tolerance and distributed checkpoints restore correctly
v10.0 | Adaptation | LoRA/QLoRA adapters injected into the same IR/codegen path | Parameter-efficient fine-tuning, frozen-base training, low-rank update accounting, adapter promotion policy | Adapter-only runs match the declared update surface and inference promotion works without full-model surgery
v11.0-v14.0 | Modalities | Vision and audio encoders plus their backward paths and modality-specific datasets | Patching/tokenization for new modalities, encoder contracts, multimodal batching, modality-specific eval design | Each modality has a deterministic preflight, stable probes, and no regression to the text-only path
v15.0 | Deployment | Embedded inference runtime with deterministic memory and HAL integration | Latency budgets, memory ceilings, real-time constraints, portable runtime surfaces, deployment sign-off | Embedded targets hit bounded memory/latency envelopes and keep deterministic runtime behavior
v16.0-v17.0 | Interpret + Control | Interpretability tooling, feature tracing, causal intervention, and policy integration with vision/control loops | Activation tracing, SAE-style feature learning, intervention experiments, closed-loop safety reasoning | Model behavior can be inspected, perturbed, and validated before it is allowed into autonomy loops

12-Month Learning Ladder

A good first project year is not “buy a cluster and hope.” It is staged competence: first correctness, then representation, then scale, then modality expansion, then deployment-grade systems.

Months 1-2

Phase 1: Engine Contracts

Master the v7 gates before chasing capability expansion.

  • Run IR build, compile smoke, parity, replay, and checkpoint flows until the failure classes are explainable and repeatable.
  • Learn how run directories, manifests, token packing, and visualizer artifacts fit together.
  • Treat the oracle as a measuring tool, not a permanent training dependency.
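
The replay gate described above can be sketched with a toy seeded run; the function names and the scalar "training" loop are illustrative stand-ins, not CK's actual replay mechanism:

```python
import hashlib
import random

def toy_training_run(seed, steps=5):
    """Simulate a seeded training run and return its per-step loss stream."""
    rng = random.Random(seed)
    loss = 1.0
    losses = []
    for _ in range(steps):
        loss *= 0.9 + 0.01 * rng.random()  # deterministic given the seed
        losses.append(loss)
    return losses

def replay_digest(losses):
    """Hash the loss stream so two runs can be compared as a single artifact."""
    payload = ",".join(f"{x:.12f}" for x in losses)
    return hashlib.sha256(payload.encode()).hexdigest()

run_a = toy_training_run(seed=42)
run_b = toy_training_run(seed=42)  # replay with the same seed
assert replay_digest(run_a) == replay_digest(run_b), "replay gate failed"
```

A real gate would digest per-step losses, gradient norms, and sampled batch IDs from two runs of the same manifest and fail loudly at the first divergent step.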
Months 3-4

Phase 2: Representation Learning

Use compiler-backed datasets to learn data shaping and evaluation.

  • Push scene DSL work from structured SVG toward richer compiler-owned visual vocabularies.
  • Treat compiler fidelity on a gold asset pack as a hard gate before serious training.
  • Make token granularity its own explicit spec step after structure/content separation is proven.
  • Add negative correction pairs, not just positive rows, so the model learns how to recover from wrong-but-parseable scenes.
  • Add page-level DSL work that can later lower into Databoard or other structured web surfaces.
  • Make preflight, canary, and non-regression gates mandatory.
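
The compiler-fidelity hard gate can be sketched as an exact-match pass over a fixed gold asset pack; `compile_scene` here is a hypothetical stand-in for the real scene compiler:

```python
def compile_scene(dsl_row):
    """Toy stand-in for the scene compiler: lowers a DSL row to an SVG fragment."""
    shape, x, y = dsl_row.split()
    if shape == "circle":
        return f'<circle cx="{x}" cy="{y}"/>'
    return f'<rect x="{x}" y="{y}"/>'

def gold_asset_gate(gold_pack, threshold=1.0):
    """Hard gate: fraction of gold assets whose compiled output matches exactly."""
    hits = sum(1 for dsl, expected in gold_pack if compile_scene(dsl) == expected)
    fidelity = hits / len(gold_pack)
    return fidelity >= threshold, fidelity

gold = [("circle 5 7", '<circle cx="5" cy="7"/>'),
        ("rect 0 0", '<rect x="0" y="0"/>')]
ok, fidelity = gold_asset_gate(gold)
assert ok and fidelity == 1.0
```

The point of the gate is that training never starts while the compiler itself disagrees with the gold pack; model errors and compiler errors must stay separable.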
Months 5-6

Phase 3: Code and Data Tasks

Leave “visual only” and start teaching transformation.

  • Train on config transforms, schema mapping, patch IRs, route/controller/model tasks, and build-file edits.
  • Broaden into C, C++, Python, Bash, Awk, Lua, PHP, HTML/CSS/JS.
  • Evaluate with validators, build success, and tests instead of loss alone.
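
Evaluating with validators instead of loss can start as simply as checking whether generated code parses. A minimal sketch using Python's `ast` module as the validator (a real loop would also run builds and tests):

```python
import ast

def validator_pass_rate(candidates):
    """Score generated Python snippets by whether they parse, not by token loss."""
    passes = 0
    for src in candidates:
        try:
            ast.parse(src)
            passes += 1
        except SyntaxError:
            pass
    return passes / len(candidates)

outputs = ["def f(x):\n    return x + 1\n",  # valid
           "def g(:\n"]                       # syntactically broken
rate = validator_pass_rate(outputs)  # 1 of 2 parses -> 0.5
```

Loss can fall while pass rate stagnates; tracking both is what makes the failure layer diagnosable.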
Months 7-8

Phase 4: CK-Native Distribution

Build the training system the project can own end-to-end.

  • Start with 1 host and many ranks before moving to many hosts.
  • Implement rank/world bootstrap, deterministic sharded sampling, collective ops, and distributed checkpoints.
  • Prove 1-rank vs N-rank equivalence on tiny runs before attempting long jobs.
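
Deterministic sharded sampling reduces to an invariant that can be gated directly: shards are disjoint and their union is exactly the dataset. A minimal sketch of a strided sharder and its coverage check (names are illustrative):

```python
def shard_indices(num_rows, rank, world_size):
    """Deterministic strided shard: rank r takes rows r, r+W, r+2W, ..."""
    return list(range(rank, num_rows, world_size))

def check_exact_coverage(num_rows, world_size):
    """Gate: shards are disjoint and their union is exactly the dataset."""
    seen = []
    for rank in range(world_size):
        seen.extend(shard_indices(num_rows, rank, world_size))
    return sorted(seen) == list(range(num_rows))

assert check_exact_coverage(num_rows=10, world_size=3)
```

Because the assignment is a pure function of (row index, rank, world size), token accounting stays exact and any rank's batch stream can be replayed in isolation.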
Months 9-10

Phase 5: MoE and Adapters

Only scale the model family once the core runtime and gates are stable.

  • Land v8-v9 sparse expert routing and backward.
  • Use v10 adapters to learn specialization without full retraining cost.
  • Track exact ownership of trainable parameters and promotion paths.
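
Tracking exact ownership of trainable parameters falls out of the low-rank arithmetic: a LoRA adapter on a d_in x d_out weight trains r*(d_in + d_out) parameters instead of d_in*d_out. A small accounting sketch with illustrative dimensions:

```python
def lora_trainable_params(d_in, d_out, rank):
    """A (d_in x r) down-projection plus a (r x d_out) up-projection."""
    return rank * (d_in + d_out)

def full_params(d_in, d_out):
    """Dense update surface of the frozen base weight."""
    return d_in * d_out

d_in, d_out, r = 1024, 1024, 8
lora = lora_trainable_params(d_in, d_out, r)   # 16384
full = full_params(d_in, d_out)                # 1048576
ratio = lora / full                            # 0.015625, about 1.6%
```

An adapter-only run should report exactly this count as its update surface; anything more means the frozen base leaked into the optimizer.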
Months 11-12

Phase 6: Multimodal to Embedded

Make the system useful in concrete downstream domains.

Domain Ladder Inside v7

Track | Primary Target | What It Teaches | Best Evidence
Visual DSL | Compiler-backed SVG scene contracts | Representation design, slot structure, compiler ownership, token-granularity design, negative repair curriculum, renderable probes | Exact DSL match, materialized SVG match, renderability, content binding success, gold-asset parity, family non-regression
Page DSL | Structured page scenes that can lower into web/template systems | Section composition, variant selection, semantic layout, validator-driven generation | Schema pass, section correctness, stable page render
Code/Data IR | Config transforms, patch IR, schema mapping, route/controller/model tasks | Machine-checkable code transformation, exactness, build discipline | Validator pass, diff quality, compile/test success
Direct Code | Real repo tasks across C/C++/Python/Bash/Awk/Lua/PHP/JS/CSS | Language fluency, multi-file editing, debugging, refactor quality | Tests passing, build success, review findings shrinking over time
Tool Use | Search, inspect, patch, rerun, compare, retry | Engineering behavior instead of plain text generation | Task success rate, fix rate, retry efficiency
Math + Embedded | controlSystems, stateEstimation, inertial_navigation_system, AeroDynControlRig, and hardware-facing utilities | Executable scientific reasoning, tolerances, systems thinking | Numerical correctness, simulation stability, embedded test pass

Reference Repositories

Public curriculum references should point to concrete repositories instead of private local shorthand. In particular, the local DroneMath workspace is a multi-repo family, so public docs should name the exact downstream repositories.

C-Kernel-Engine

The core runtime, codegen, tokenizer, IR, training, profiling, and distributed systems project.

antsand.com

The web stack and Databoard host repository for page-DSL, templates, and style-variant integration work.

LinuxUtilities

Shell, system, and operator-facing tasks for tool-use, scripting, config repair, and deployment automation.

controlSystems

Control-theory code for executable math, controllers, tuning tasks, and embedded-facing numerical evaluation.

stateEstimation

Filtering, sensor fusion, and estimator tasks for numerical correctness, drift analysis, and systems reasoning.

inertial_navigation_system

Navigation pipelines and inertial reasoning tasks that bridge math, simulation, and robotics deployment.

AeroDynControlRig

Flight-dynamics and control-loop work for higher-consequence embedded and autonomy evaluation.

Before Investing in Serious In-House Server-Grade Distributed Computing Infrastructure

Server-grade infrastructure pays off when it multiplies a stable loop and is wasted when it magnifies confusion. Invest only after the single-node and small multi-rank gates are boring.

Must Already Be True

  • Single-rank CK training is deterministic enough to replay and debug.
  • Run directories are standardized and artifact-rich.
  • Preflight, canary, and non-regression gates catch bad runs early.
  • Checkpoint/resume is reliable and documented.
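
The checkpoint/resume requirement has a crisp test: a run interrupted at step k and resumed must end in exactly the same state as an uninterrupted run. A toy sketch with a deterministic scalar "model" (all names illustrative, JSON standing in for the real checkpoint format):

```python
import json

def step(state):
    """One deterministic training step on a toy scalar weight."""
    state["w"] = state["w"] * 0.99 + 0.01
    state["step"] += 1
    return state

def run(steps, state=None):
    """Run `steps` training steps, optionally resuming from a saved state."""
    state = state or {"w": 1.0, "step": 0}
    for _ in range(steps):
        state = step(state)
    return state

# Uninterrupted 10-step run vs. checkpoint at step 5 and resume.
full = run(10)
ckpt = json.dumps(run(5))                 # serialize the mid-run state
resumed = run(5, state=json.loads(ckpt))
assert resumed == full, "checkpoint/resume gate failed"
```

In a real system the serialized state must cover optimizer moments, RNG streams, and data-cursor position, or the resumed trajectory silently diverges.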

Distributed Bring-Up Gates

  • 1-rank and 2-rank tiny runs match within declared tolerance.
  • Sharded data reading preserves exact token accounting.
  • Collectives behave deterministically enough to debug regressions.
  • Per-rank logs and checkpoints are inspectable without guesswork.
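
The 1-rank vs 2-rank agreement gate rests on a simple identity: with equal shard sizes, the average of per-shard mean gradients equals the full-batch mean gradient, up to floating-point rounding. A minimal sketch on a scalar least-squares loss (a simulated all-reduce, not CK's collectives):

```python
def grad(w, x, y):
    """Per-example gradient of 0.5*(w*x - y)^2 with respect to w."""
    return (w * x - y) * x

def mean_grad(w, batch):
    """Mean gradient over a batch of (x, y) examples."""
    return sum(grad(w, x, y) for x, y in batch) / len(batch)

w = 0.5
batch = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0)]

g_1rank = mean_grad(w, batch)                # one rank, full batch
shard_a, shard_b = batch[0::2], batch[1::2]  # two equal strided shards
g_2rank = 0.5 * (mean_grad(w, shard_a) + mean_grad(w, shard_b))  # simulated all-reduce

assert abs(g_1rank - g_2rank) < 1e-12, "1-rank vs 2-rank mismatch beyond tolerance"
```

The declared tolerance exists because real collectives reorder floating-point sums; the gate is that the divergence stays bounded and explainable, not that it is zero.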

Only Then Scale Spend

  • Use the first cluster months for parallel experiments and validation, not giant speculative runs.
  • Benchmark experiments/day, not just tokens/sec.
  • Delay large-model pushes until representation and compiler surfaces are stable.
  • Keep cluster rollout tied to concrete gates instead of hope.

The Operator Loop to Internalize

1

Define the contract

Make the target explicit enough to evaluate exactly and debug cheaply.

2

Build the data

Materialize rows, token budgets, holdouts, and canaries before touching model size.
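
Holdouts and canaries should be assigned deterministically so the split never drifts between runs or machines. One common tactic, sketched here with hypothetical percentages, is hashing the row ID into a bucket:

```python
import hashlib

def assign_split(row_id, holdout_pct=10, canary_pct=1):
    """Deterministic split by hashing the row ID; stable across runs and machines."""
    bucket = int(hashlib.sha256(row_id.encode()).hexdigest(), 16) % 100
    if bucket < canary_pct:
        return "canary"
    if bucket < canary_pct + holdout_pct:
        return "holdout"
    return "train"

splits = [assign_split(f"row-{i}") for i in range(1000)]
```

Because the split is a pure function of the row ID, regenerating the dataset can never leak holdout rows into training.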

2b

Check token granularity

Use explicit structural tokens early if the contract is tiny and formal, but treat whole component-row tokens as a temporary control tactic. Once the DSL stabilizes, run a dedicated token-granularity step that breaks those row tokens into a smaller compositional grammar.
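
The granularity tradeoff can be made concrete: whole-row tokens give short sequences but an open-ended vocabulary that cannot compose, while structural tokens give a small closed vocabulary whose pieces are shared across rows. A toy comparison (both tokenizers are illustrative, not CK's):

```python
def whole_row_tokenize(rows):
    """One token per full component row: tiny sequences, vocabulary grows with data."""
    vocab = sorted(set(rows))
    return vocab, [[vocab.index(r)] for r in rows]

def structural_tokenize(rows):
    """Split rows into structural tokens: longer sequences, small closed vocabulary."""
    toks = [r.split() for r in rows]
    vocab = sorted({t for row in toks for t in row})
    return vocab, [[vocab.index(t) for t in row] for row in toks]

rows = ["circle 5 7 red", "circle 5 9 red", "rect 5 7 blue"]
row_vocab, _ = whole_row_tokenize(rows)         # 3 opaque row tokens, nothing shared
struct_vocab, seqs = structural_tokenize(rows)  # 7 tokens reused across all rows
```

Under whole-row tokens every unseen row is out-of-vocabulary; under structural tokens an unseen row is just a new combination of known pieces, which is what the granularity step is buying.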

3

Run small first

Canary the format, verify render/build/test behavior, then launch the real run.

4

Inspect real evidence

Use probes, replay, checkpoints, gold-asset compiler reports, diffs, and profiler output. Do not trust loss alone.

5

Repair the right layer

Fix representation, data, compiler, runtime, or capacity based on evidence, not instinct. Add richer negative rows when the model's output is parseable but semantically wrong.

6

Scale only after closure

Once the loop is boring, then spend compute on more ranks, more models, or more modalities.

Related Pages

Version History

The public roadmap from v6.6 through v17, including the training-first execution order.

v7 Runbook

The operator page for concrete commands, training setup, parity gates, and run-dir workflows.

v7 Training Progression Playbook

The current multi-track progression document for SVG, routing, and code experiments.

Spec Training Method

The method page for separating asset libraries, DSLs, compilers, token boundaries, and per-run report decisions.

Spec19 Routing Mixture

The current public next-branch recommendation for bounded-intent scene-bundle training after spec18.

Training Intuition

The deeper intuition page for checkpoints, triage loops, and why failure visibility matters.

v7 Backprop IR

The visual explanation of how training lowering, memory layout, and runtime checks fit together.

Research Tracker

The broader research page that will feed later version tracks such as multimodal and sparse architectures.
