gemm_kernels_q4k_q8k_avx2.c File Reference

AVX2 Q4_K x Q8_K matvec kernel (inference only)

#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include "ckernel_quant.h"


Functions

void gemv_q4_k_q8_k_avx2 (float *y, const void *W, const void *x_q8, int M, int K)
void gemv_q4_k_q8_k_ref (float *y, const void *W, const void *x_q8, int M, int K)

Detailed Description

AVX2 Q4_K x Q8_K matvec kernel (inference only)

CK-ENGINE KERNEL RULES:

  1. NO malloc/free - memory via bump allocator, pointers passed in
  2. NO OpenMP - parallelization at orchestrator/codegen layer
  3. API must define: inputs, outputs, workspace, and memory layouts
  4. Pure computation - deterministic, no side effects

After changes: make test && make llamacpp-parity-full
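
Rule 1 above can be illustrated with a minimal sketch of a bump allocator. The names `bump_arena`, `bump_alloc`, and `bump_reset` are hypothetical and are not taken from the C-Kernel-Engine sources; the point is only that the caller owns one buffer, carves aligned regions out of it, and passes the resulting pointers into kernels that never call malloc or free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bump arena: the caller owns `base`; kernels only receive pointers. */
typedef struct {
    uint8_t *base;  /* start of the caller-owned buffer */
    size_t   size;  /* total bytes available */
    size_t   used;  /* bytes handed out so far (always <= size) */
} bump_arena;

/* Hand out `bytes` aligned to `align` (a power of two), or NULL if the arena is full. */
static void *bump_alloc(bump_arena *a, size_t bytes, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    uintptr_t cur     = (uintptr_t)a->base + a->used;
    uintptr_t aligned = (cur + (align - 1)) & ~(uintptr_t)(align - 1);
    size_t    pad     = (size_t)(aligned - cur);
    size_t    avail   = a->size - a->used;

    if (pad > avail || bytes > avail - pad) {
        return NULL;  /* not enough room; nothing is modified */
    }
    a->used += pad + bytes;
    return (void *)aligned;
}

/* Release everything at once; there is no per-allocation free. */
static void bump_reset(bump_arena *a)
{
    a->used = 0;
}
```

An orchestrator following this pattern would allocate `y` and any workspace from the arena once, call the kernel with those pointers, and reset the arena when the buffers are no longer needed.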

Requires AVX2 for 256-bit integer operations.

Definition in file gemm_kernels_q4k_q8k_avx2.c.

Function Documentation

◆ gemv_q4_k_q8_k_avx2()

void gemv_q4_k_q8_k_avx2 (float *y, const void *W, const void *x_q8, int M, int K)

Definition at line 89 of file gemm_kernels_q4k_q8k_avx2.c.

{
    /* TODO: Implement AVX2 version with correct Q4_K memory layout.
     * For now, fall back to reference implementation which has been
     * fixed to use the correct layout.
     */
    gemv_q4_k_q8_k_ref(y, W, x_q8, M, K);
}

References gemv_q4_k_q8_k_ref().

Referenced by gemv_q4_k_q8_k() and gemv_q4_k_q8_k_amx().

◆ gemv_q4_k_q8_k_ref()

void gemv_q4_k_q8_k_ref (float *y, const void *W, const void *x_q8, int M, int K)

Definition at line 177 of file gemm_kernels_q4k_q8k.c.

{
    if (!y || !W || !x_q8 || M <= 0 || K <= 0) {
        return;
    }

    const block_q4_K *blocks = (const block_q4_K *)W;
    const block_q8_K *x = (const block_q8_K *)x_q8;
    const int blocks_per_row = K / QK_K;

    for (int row = 0; row < M; ++row) {
        const block_q4_K *w_row = blocks + (size_t)row * (size_t)blocks_per_row;
        y[row] = dot_q4_k_q8_k_ref(w_row, x, K);
    }
}

Referenced by gemv_q4_k_q8_k_avx2().
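
The row-walking structure of gemv_q4_k_q8_k_ref() can be sketched with a deliberately simplified block type. Everything prefixed `toy_` or `TOY_` below is hypothetical: `toy_block` holds one float scale and a few int8 values, which is NOT the real block_q4_K or block_q8_K layout from ckernel_quant.h, and `TOY_QK` stands in for QK_K. The sketch also rejects a `K` that is not a multiple of the block size, a check the reference listing does not perform (there, `K / QK_K` would simply truncate).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_QK 4  /* hypothetical block size; stands in for QK_K */

/* Hypothetical simplified block: one scale plus TOY_QK signed quants.
 * This is an illustration only, not the Q4_K / Q8_K memory layout. */
typedef struct {
    float  d;          /* per-block scale */
    int8_t q[TOY_QK];  /* quantized values */
} toy_block;

/* Dot product of one weight row with the quantized activation vector. */
static float toy_dot(const toy_block *w, const toy_block *x, int k)
{
    assert(k % TOY_QK == 0);
    float sum = 0.0f;
    for (int b = 0; b < k / TOY_QK; ++b) {
        int32_t acc = 0;  /* integer accumulation inside one block */
        for (int i = 0; i < TOY_QK; ++i) {
            acc += (int32_t)w[b].q[i] * (int32_t)x[b].q[i];
        }
        sum += w[b].d * x[b].d * (float)acc;  /* apply both scales once per block */
    }
    return sum;
}

/* Same shape as the reference gemv: row r starts at block r * blocks_per_row,
 * and every row is dotted against the same activation blocks. */
static void toy_gemv(float *y, const toy_block *W, const toy_block *x, int M, int K)
{
    if (!y || !W || !x || M <= 0 || K <= 0 || K % TOY_QK != 0) {
        return;  /* invalid input: leave y untouched */
    }
    const int blocks_per_row = K / TOY_QK;
    for (int row = 0; row < M; ++row) {
        const toy_block *w_row = W + (size_t)row * (size_t)blocks_per_row;
        y[row] = toy_dot(w_row, x, K);
    }
}
```

The `(size_t)` casts on the row offset mirror the reference listing: they keep `row * blocks_per_row` from overflowing `int` when the weight matrix is large.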