AVX2 Q4_K x Q8_K matvec kernel (inference only)

#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include "ckernel_quant.h"

Functions
void	gemv_q4_k_q8_k_avx2 (float *y, const void *W, const void *x_q8, int M, int K)

void	gemv_q4_k_q8_k_ref (float *y, const void *W, const void *x_q8, int M, int K)

Detailed Description

AVX2 Q4_K x Q8_K matvec kernel (inference only)

CK-ENGINE KERNEL RULES:

NO malloc/free - memory via bump allocator, pointers passed in
NO OpenMP - parallelization at orchestrator/codegen layer
API must define: inputs, outputs, workspace, and memory layouts
Pure computation - deterministic, no side effects
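
The "no malloc/free" rule can be illustrated with a minimal bump allocator sketch. This is not the CK-Engine allocator: the names bump_arena, bump_init and bump_alloc are hypothetical, and the only assumption is that the caller owns one contiguous buffer and hands out aligned slices of it to kernels as plain pointers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical arena type: a caller-owned buffer plus a running offset. */
typedef struct {
    uint8_t *base; /* start of the caller-owned buffer */
    size_t   cap;  /* total size of the buffer in bytes */
    size_t   off;  /* bytes handed out so far (including padding) */
} bump_arena;

static void bump_init(bump_arena *a, void *buf, size_t cap)
{
    a->base = (uint8_t *)buf;
    a->cap  = cap;
    a->off  = 0;
}

/* Returns an aligned slice of the arena, or NULL if it does not fit.
 * align must be a nonzero power of two. Nothing is ever freed
 * individually; the whole arena is reused by resetting off to 0. */
static void *bump_alloc(bump_arena *a, size_t size, size_t align)
{
    if (align == 0 || (align & (align - 1)) != 0) {
        return NULL;
    }
    uintptr_t cur     = (uintptr_t)a->base + a->off;
    uintptr_t aligned = (cur + (align - 1)) & ~(uintptr_t)(align - 1);
    size_t    pad     = (size_t)(aligned - cur);
    size_t    left    = a->cap - a->off;

    if (pad > left || size > left - pad) {
        return NULL; /* out of space: the kernel never sees a bad pointer */
    }
    a->off += pad + size;
    return (void *)aligned;
}
```

Under this scheme an orchestrator would reserve y, the quantized activations, and any workspace from the arena up front, then pass the resulting pointers into the kernel, so the kernel itself stays allocation-free and deterministic.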

After changes: make test && make llamacpp-parity-full

Requires AVX2 for 256-bit integer operations.
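
As an illustration of the kind of 256-bit integer operation this refers to, the sketch below computes a 32-element int8 dot product with AVX2 when the compiler defines __AVX2__, and otherwise falls back to a scalar loop. It is not the kernel from this file (which currently delegates to the reference implementation) and it does not unpack Q4_K nibbles or apply scales; dot_i8_32 and dot_i8_32_scalar are hypothetical names.

```c
#include <stdint.h>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

/* Scalar reference: sum of a[i] * b[i] over 32 signed bytes. */
static int32_t dot_i8_32_scalar(const int8_t *a, const int8_t *b)
{
    int32_t sum = 0;
    for (int i = 0; i < 32; ++i) {
        sum += (int32_t)a[i] * (int32_t)b[i];
    }
    return sum;
}

static int32_t dot_i8_32(const int8_t *a, const int8_t *b)
{
#if defined(__AVX2__)
    /* Load 32 signed bytes from each input (unaligned loads are allowed). */
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);

    /* Sign-extend each 16-byte half to 16 x int16. */
    __m256i a_lo = _mm256_cvtepi8_epi16(_mm256_castsi256_si128(va));
    __m256i a_hi = _mm256_cvtepi8_epi16(_mm256_extracti128_si256(va, 1));
    __m256i b_lo = _mm256_cvtepi8_epi16(_mm256_castsi256_si128(vb));
    __m256i b_hi = _mm256_cvtepi8_epi16(_mm256_extracti128_si256(vb, 1));

    /* Multiply int16 pairs and add adjacent products into 8 x int32. */
    __m256i acc = _mm256_add_epi32(_mm256_madd_epi16(a_lo, b_lo),
                                   _mm256_madd_epi16(a_hi, b_hi));

    /* Horizontal sum of the 8 int32 lanes. */
    __m128i s = _mm_add_epi32(_mm256_castsi256_si128(acc),
                              _mm256_extracti128_si256(acc, 1));
    s = _mm_add_epi32(s, _mm_shuffle_epi32(s, 0x4E)); /* swap 64-bit halves */
    s = _mm_add_epi32(s, _mm_shuffle_epi32(s, 0xB1)); /* swap 32-bit pairs */
    return _mm_cvtsi128_si32(s);
#else
    return dot_i8_32_scalar(a, b);
#endif
}
```

Widening to int16 before multiplying avoids the unsigned-by-signed restriction of the byte multiply-add instruction, at the cost of a few extra conversions. A tuned kernel might choose differently.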

Definition in file gemm_kernels_q4k_q8k_avx2.c.

Function Documentation

◆ gemv_q4_k_q8_k_avx2()

void gemv_q4_k_q8_k_avx2	(	float *	y,
		const void *	W,
		const void *	x_q8,
		int	M,
		int	K
	)

Definition at line 89 of file gemm_kernels_q4k_q8k_avx2.c.

 {
     /* TODO: Implement AVX2 version with correct Q4_K memory layout.
      * For now, fall back to reference implementation which has been
      * fixed to use the correct layout.
      */
     gemv_q4_k_q8_k_ref(y, W, x_q8, M, K);
 }

References gemv_q4_k_q8_k_ref().

Referenced by gemv_q4_k_q8_k(), and gemv_q4_k_q8_k_amx().

◆ gemv_q4_k_q8_k_ref()

void gemv_q4_k_q8_k_ref	(	float *	y,
		const void *	W,
		const void *	x_q8,
		int	M,
		int	K
	)

Definition at line 177 of file gemm_kernels_q4k_q8k.c.

 {
     if (!y || !W || !x_q8 || M <= 0 || K <= 0) {
         return;
     }
  
     const block_q4_K *blocks = (const block_q4_K *)W;
     const block_q8_K *x = (const block_q8_K *)x_q8;
     const int blocks_per_row = K / QK_K;
  
     for (int row = 0; row < M; ++row) {
         const block_q4_K *w_row = blocks + (size_t)row * (size_t)blocks_per_row;
         y[row] = dot_q4_k_q8_k_ref(w_row, x, K);
     }
 }

Referenced by gemv_q4_k_q8_k_avx2().
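
The row addressing used above (M rows, each holding K / QK_K consecutive blocks, with the same activation blocks reused for every row) can be sketched with a toy block format. The toy_block type, TOY_QK, toy_dot and toy_gemv are hypothetical stand-ins, not the real block_q4_K / block_q8_K layouts from ckernel_quant.h. Unlike the listing above, this sketch also rejects a K that is not a multiple of the block length and reports failure through a return code.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_QK 4 /* hypothetical block length; the real QK_K is defined in ckernel_quant.h */

/* Toy quantized block: one float scale and TOY_QK signed 8-bit values. */
typedef struct {
    float  d;
    int8_t q[TOY_QK];
} toy_block;

/* Dot product of one weight row against the activation blocks. */
static float toy_dot(const toy_block *w, const toy_block *x, int K)
{
    float acc = 0.0f;
    for (int b = 0; b < K / TOY_QK; ++b) {
        int32_t isum = 0;
        for (int i = 0; i < TOY_QK; ++i) {
            isum += (int32_t)w[b].q[i] * (int32_t)x[b].q[i];
        }
        /* Integer sum is rescaled once per block by both scales. */
        acc += w[b].d * x[b].d * (float)isum;
    }
    return acc;
}

/* y[row] = dot(W[row, :], x). Returns 0 on success, -1 on bad arguments. */
static int toy_gemv(float *y, const toy_block *W, const toy_block *x, int M, int K)
{
    if (!y || !W || !x || M <= 0 || K <= 0 || K % TOY_QK != 0) {
        return -1;
    }
    const int blocks_per_row = K / TOY_QK;
    for (int row = 0; row < M; ++row) {
        /* Rows are stored back to back, blocks_per_row blocks each. */
        const toy_block *w_row = W + (size_t)row * (size_t)blocks_per_row;
        y[row] = toy_dot(w_row, x, K);
    }
    return 0;
}
```

With M = 2 and K = 8, W holds four blocks: blocks 0 and 1 form row 0, and blocks 2 and 3 form row 1. The size_t casts in the row offset mirror the listing above and keep the multiplication from overflowing int for large matrices.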
