GEMM/GEMV kernels with Q6_K quantized weights.
Functions | |
| static float | dot_q6_k_ref (const block_q6_K *w, const float *x, int K) |
| void | gemm_nt_q6_k (const float *A, const void *B, const float *bias, float *C, int M, int N, int K) |
| void | gemm_nt_q6_k_ref (const float *A, const void *B, const float *bias, float *C, int M, int N, int K) |
| void | gemm_q6_k (float *Y, const void *W, const float *X, int M, int N, int K) |
| void | gemv_q6_k (float *y, const void *W, const float *x, int M, int K) |
GEMM/GEMV kernels with Q6_K quantized weights.
After making changes to this file, run: make test && make llamacpp-parity-full
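Implements matrix multiplication where the activations, bias, and outputs are FP32 (float) and the weights are Q6_K-quantized blocks passed as const void *.
Q6_K format: 256 weights per block, stored in the block_q6_K fields d, ql, qh, and scales.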
Definition in file gemm_kernels_q6k.c.
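The block format mentioned above can be illustrated with a short dequantization sketch. It assumes the layout matches llama.cpp's block_q6_K (low nibbles in ql, high 2-bit pairs in qh, one int8 scale per 16 weights, one block-wide scale d). The type sketch_block_q6k, the macro SK_QK_K, and the function sketch_dequant_q6k are hypothetical names introduced here, and d is held as a plain float rather than FP16 to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_QK_K 256 /* weights per block, as stated in the file description */

/* Hypothetical stand-in for block_q6_K. Field names follow the references
 * listed on this page; array sizes and the float scale are assumptions
 * (the real struct is expected to store d as FP16). */
typedef struct {
    uint8_t ql[SK_QK_K / 2];      /* low 4 bits of each 6-bit weight */
    uint8_t qh[SK_QK_K / 4];      /* high 2 bits, four weights per byte */
    int8_t  scales[SK_QK_K / 16]; /* one 8-bit scale per 16 weights */
    float   d;                    /* block-wide scale */
} sketch_block_q6k;

/* Expand one block into 256 floats: w = d * scale * (q - 32),
 * where q is the unsigned 6-bit value rebuilt from ql and qh. */
static void sketch_dequant_q6k(const sketch_block_q6k *b, float *out)
{
    assert(b != NULL && out != NULL);
    const uint8_t *ql = b->ql;
    const uint8_t *qh = b->qh;
    const int8_t *sc = b->scales;

    /* Each pass handles 128 weights: 64 ql bytes, 32 qh bytes, 8 scales. */
    for (int half = 0; half < 2; ++half) {
        for (int l = 0; l < 32; ++l) {
            int is = l / 16;
            int q1 = ((ql[l] & 0x0F)      | (((qh[l] >> 0) & 3) << 4)) - 32;
            int q2 = ((ql[l + 32] & 0x0F) | (((qh[l] >> 2) & 3) << 4)) - 32;
            int q3 = ((ql[l] >> 4)        | (((qh[l] >> 4) & 3) << 4)) - 32;
            int q4 = ((ql[l + 32] >> 4)   | (((qh[l] >> 6) & 3) << 4)) - 32;
            out[l]      = b->d * sc[is]     * q1;
            out[l + 32] = b->d * sc[is + 2] * q2;
            out[l + 64] = b->d * sc[is + 4] * q3;
            out[l + 96] = b->d * sc[is + 6] * q4;
        }
        out += 128;
        ql += 64;
        qh += 32;
        sc += 8;
    }
}
```

With an all-zero block every stored value is 0, so each weight decodes to d * scale * (-32); setting bits in ql and qh moves individual weights up toward the maximum of d * scale * 31.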
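| static float dot_q6_k_ref | ( | const block_q6_K * | w, |
| const float * | x, | ||
| int | K | ||
| ) |
static |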
Definition at line 39 of file gemm_kernels_q6k.c.
References block_q6_K::d, GGML_FP16_TO_FP32, block_q6_K::qh, QK_K, block_q6_K::ql, and block_q6_K::scales.
Referenced by gemv_q6_k().
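A reference dot product over Q6_K blocks might look like the sketch below. It is not the code at line 39 of gemm_kernels_q6k.c; it assumes the llama.cpp-style Q6_K layout, stores d as a float instead of FP16, and uses hypothetical names (dot_block_q6k, dot_q6_at, sketch_dot_q6k). Under that layout, weight i of a block uses scale index i / 16, so the scale can be factored out of each 16-weight group.

```c
#include <assert.h>
#include <stdint.h>

#define DOT_QK_K 256 /* weights per block */

/* Hypothetical stand-in for block_q6_K (assumed layout, float scale). */
typedef struct {
    uint8_t ql[DOT_QK_K / 2];      /* low 4 bits per weight */
    uint8_t qh[DOT_QK_K / 4];      /* high 2 bits per weight */
    int8_t  scales[DOT_QK_K / 16]; /* one scale per 16 weights */
    float   d;                     /* block-wide scale */
} dot_block_q6k;

/* Signed 6-bit value of weight i (0..255) within one block. */
static int dot_q6_at(const dot_block_q6k *b, int i)
{
    int half = i / 128;      /* which 128-weight half of the block */
    int r = i % 128;
    int quarter = r / 32;    /* which 32-weight run inside the half */
    int l = r % 32;
    uint8_t lo_byte = b->ql[half * 64 + (quarter & 1) * 32 + l];
    int lo = (quarter < 2) ? (lo_byte & 0x0F) : (lo_byte >> 4);
    int hi = (b->qh[half * 32 + l] >> (2 * quarter)) & 3;
    return (lo | (hi << 4)) - 32;
}

/* Dot product of K quantized weights (K / 256 blocks) with K floats. */
static float sketch_dot_q6k(const dot_block_q6k *w, const float *x, int K)
{
    assert(K % DOT_QK_K == 0);
    float sum = 0.0f;
    for (int blk = 0; blk < K / DOT_QK_K; ++blk) {
        const dot_block_q6k *b = &w[blk];
        const float *xb = x + blk * DOT_QK_K;
        for (int g = 0; g < DOT_QK_K / 16; ++g) {
            /* Accumulate the unscaled group sum, then apply d * scale once. */
            float partial = 0.0f;
            for (int j = 0; j < 16; ++j) {
                int i = g * 16 + j;
                partial += (float)dot_q6_at(b, i) * xb[i];
            }
            sum += b->d * (float)b->scales[g] * partial;
        }
    }
    return sum;
}
```

Decoding by index is slower than a loop that walks ql and qh in storage order, but it keeps the mapping from weight index to bytes explicit, which is convenient for a reference implementation used in parity tests.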
| void gemm_nt_q6_k | ( | const float * | A, |
| const void * | B, | ||
| const float * | bias, | ||
| float * | C, | ||
| int | M, | ||
| int | N, | ||
| int | K | ||
| ) |
Definition at line 212 of file gemm_kernels_q6k.c.
References C, and gemm_q6_k().
Referenced by ck_gemm_nt_quant(), gemm_nt_q6_k_ref(), qwen2_0_5b_decode_layer_0_decode(), qwen2_0_5b_decode_layer_10_decode(), qwen2_0_5b_decode_layer_13_decode(), qwen2_0_5b_decode_layer_16_decode(), qwen2_0_5b_decode_layer_19_decode(), qwen2_0_5b_decode_layer_1_decode(), qwen2_0_5b_decode_layer_21_decode(), qwen2_0_5b_decode_layer_3_decode(), qwen2_0_5b_decode_layer_6_decode(), qwen2_0_5b_decode_layer_7_decode(), qwen2_0_5b_decode_layer_8_decode(), and qwen2_0_5b_decode_layer_9_decode().
| void gemm_nt_q6_k_ref | ( | const float * | A, |
| const void * | B, | ||
| const float * | bias, | ||
| float * | C, | ||
| int | M, | ||
| int | N, | ||
| int | K | ||
| ) |
Definition at line 243 of file gemm_kernels_q6k.c.
References C, and gemm_nt_q6_k().
Referenced by gemm_nt_q6_k_sse().
| void gemm_q6_k | ( | float * | Y, |
| const void * | W, | ||
| const float * | X, | ||
| int | M, | ||
| int | N, | ||
| int | K | ||
| ) |
Definition at line 195 of file gemm_kernels_q6k.c.
References gemv_q6_k().
Referenced by gemm_nt_q6_k().
| void gemv_q6_k | ( | float * | y, |
| const void * | W, | ||
| const float * | x, | ||
| int | M, | ||
| int | K | ||
| ) |
Definition at line 169 of file gemm_kernels_q6k.c.
References dot_q6_k_ref(), and QK_K.
Referenced by gemm_q6_k().
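The call chain documented on this page (gemm_nt_q6_k calls gemm_q6_k, which calls gemv_q6_k, which calls dot_q6_k_ref) can be sketched with plain FP32 weights standing in for the quantized blocks. The argument meanings below are assumptions inferred from the signatures, not confirmed by the source: the GEMV is taken to compute one output per weight row, the GEMM to run one GEMV per input row, and the NT wrapper to compute C = A * B^T + bias with an optional bias. All sketch_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* y[m] = dot(W[m, :], x) for a row-major M x K weight matrix. */
static void sketch_gemv(float *y, const float *W, const float *x, int M, int K)
{
    for (int m = 0; m < M; ++m) {
        float acc = 0.0f;
        for (int k = 0; k < K; ++k)
            acc += W[(size_t)m * K + k] * x[k];
        y[m] = acc;
    }
}

/* One GEMV per input row: X is N x K, Y is N x M (assumed convention). */
static void sketch_gemm(float *Y, const float *W, const float *X,
                        int M, int N, int K)
{
    for (int n = 0; n < N; ++n)
        sketch_gemv(Y + (size_t)n * M, W, X + (size_t)n * K, M, K);
}

/* C[M x N] = A[M x K] * B[N x K]^T + bias[N]; bias may be NULL. */
static void sketch_gemm_nt(const float *A, const float *B, const float *bias,
                           float *C, int M, int N, int K)
{
    assert(A != NULL && B != NULL && C != NULL);

    /* B's N rows act as the weight rows, A's M rows as the batch. */
    sketch_gemm(C, B, A, N, M, K);

    if (bias != NULL) {
        for (int m = 0; m < M; ++m)
            for (int n = 0; n < N; ++n)
                C[(size_t)m * N + n] += bias[n];
    }
}
```

Layering the kernels this way means only the innermost dot product needs to know about the quantized block format; the outer loops deal only in row offsets.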