GEMM/GEMV kernels with Q5_K quantized weights.
Macros
| #define | QK_K 256 |
Functions
| void | gemm_nt_q5_k (const float *A, const void *B, const float *bias, float *C, int M, int N, int K) |
| void | gemm_nt_q5_k_ref (const float *A, const void *B, const float *bias, float *C, int M, int N, int K) |
| void | gemv_q5_k (float *y, const void *W, const float *x, int M, int K) |
| void | gemv_q5_k_ref (float *y, const void *W, const float *x, int M, int K) |
| static void | get_q5_k_scale_min (int j, const uint8_t *scales, uint8_t *scale, uint8_t *min) |
After changes: make test && make llamacpp-parity-full
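Implements matrix multiplication where the activations (A, x) and outputs (C, y) are float and the weights (B, W) are stored in Q5_K quantized form.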
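Q5_K Format (256 weights per super-block):
  d: 2 bytes (FP16 super-block scale)
  dmin: 2 bytes (FP16 super-block min scale)
  scales: 12 bytes (packed sub-block scales and mins)
  qh: 32 bytes (high bit of each 5-bit weight)
  qs: 128 bytes (low 4 bits of each 5-bit weight)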
Total: 2 + 2 + 12 + 32 + 128 = 176 bytes per 256 weights = 5.5 bits/weight
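The byte total above can be checked at compile time with a small struct sketch. The type name block_q5_K_sketch and the helpers q5_k_bits_per_weight and q5_k_row_bytes are hypothetical. The field names follow the block_q5_K members referenced on this page, and the field order is assumed to follow the order of the byte sum. Storing rows as back-to-back super-blocks is also an assumption.

```c
#include <stddef.h>
#include <stdint.h>

#ifndef QK_K
#define QK_K 256
#endif

/* Hypothetical mirror of one Q5_K super-block (not the project's own type). */
typedef struct {
    uint16_t d;            /* FP16 bits: super-block scale      (2 bytes)   */
    uint16_t dmin;         /* FP16 bits: super-block min scale  (2 bytes)   */
    uint8_t  scales[12];   /* packed sub-block scales and mins  (12 bytes)  */
    uint8_t  qh[QK_K / 8]; /* one high bit per weight           (32 bytes)  */
    uint8_t  qs[QK_K / 2]; /* one low nibble per weight         (128 bytes) */
} block_q5_K_sketch;

/* Fails to compile if the layout picks up padding or a wrong field size. */
_Static_assert(sizeof(block_q5_K_sketch) == 176,
               "Q5_K super-block must be 176 bytes");

/* Storage cost per weight: 176 * 8 / 256 bits. */
static double q5_k_bits_per_weight(void) {
    return (double)(sizeof(block_q5_K_sketch) * 8) / QK_K;
}

/* Bytes for one weight row of K elements.
 * Assumes K is a multiple of QK_K and blocks are stored contiguously. */
static size_t q5_k_row_bytes(int K) {
    return (size_t)(K / QK_K) * sizeof(block_q5_K_sketch);
}
```

A row of K = 4096 weights would then occupy 16 super-blocks under these assumptions.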
Dequantization formula (matches llama.cpp):
  w = d * (scale/64) * q - dmin * (min/64)
where q = qs_val | (qh_bit << 4) is the 5-bit value in [0, 31].
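As an illustration, the formula can be transcribed for a single weight. This is a minimal sketch that follows the formula exactly as stated on this page, including the division by 64. The helper names q5_k_join_bits and q5_k_dequant_one are hypothetical, and the kernel source remains the authority on the actual arithmetic.

```c
#include <stdint.h>

/* Rebuild the 5-bit quant from its low nibble (from qs) and high bit (from qh). */
static inline uint8_t q5_k_join_bits(uint8_t qs_val, uint8_t qh_bit) {
    return (uint8_t)((qs_val & 0x0F) | ((qh_bit & 1u) << 4));
}

/* Dequantize one weight using the formula as written in this documentation:
 *   w = d * (scale/64) * q - dmin * (min/64)
 * d and dmin are assumed to be already converted from FP16 to float. */
static inline float q5_k_dequant_one(float d, float dmin,
                                     uint8_t scale, uint8_t min,
                                     uint8_t qs_val, uint8_t qh_bit) {
    const float q = (float)q5_k_join_bits(qs_val, qh_bit);
    return d * ((float)scale / 64.0f) * q - dmin * ((float)min / 64.0f);
}
```

For example, with d = dmin = 1, scale = 32, min = 16, qs_val = 0xF and qh_bit = 1, q is 31 and the result is 0.5 * 31 - 0.25 = 15.25.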
Definition in file gemm_kernels_q5_k.c.
| #define QK_K 256 |
Definition at line 44 of file gemm_kernels_q5_k.c.
| void gemm_nt_q5_k | ( | const float * | A, |
| const void * | B, | ||
| const float * | bias, | ||
| float * | C, | ||
| int | M, | ||
| int | N, | ||
| int | K | ||
| ) |
Definition at line 218 of file gemm_kernels_q5_k.c.
References C and gemm_nt_q5_k_ref().
Referenced by ck_test_gemm_q5_k().
| void gemm_nt_q5_k_ref | ( | const float * | A, |
| const void * | B, | ||
| const float * | bias, | ||
| float * | C, | ||
| int | M, | ||
| int | N, | ||
| int | K | ||
| ) |
Definition at line 145 of file gemm_kernels_q5_k.c.
References C, CK_FP16_TO_FP32, block_q5_K::d, block_q5_K::dmin, get_q5_k_scale_min(), block_q5_K::qh, QK_K, block_q5_K::qs, and block_q5_K::scales.
Referenced by gemm_nt_q5_k().
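The indexing implied by the signature can be sketched with plain float weights. Everything here is an assumption rather than a description of the actual kernel: row-major storage, B laid out as N rows of K weights (the usual reading of the "nt" suffix), a bias of length N that may be NULL, and weights that have already been dequantized. The name gemm_nt_f32_sketch is hypothetical.

```c
#include <stddef.h>

/* Float-only stand-in for the quantized kernel:
 *   C[m][n] = bias[n] + sum_k A[m][k] * Bf[n][k]
 * A is M x K, Bf is N x K (already dequantized), C is M x N, all row-major. */
static void gemm_nt_f32_sketch(const float *A, const float *Bf,
                               const float *bias, float *C,
                               int M, int N, int K) {
    for (int m = 0; m < M; ++m) {
        for (int n = 0; n < N; ++n) {
            /* Assumed convention: a NULL bias means no bias term. */
            float acc = bias ? bias[n] : 0.0f;
            for (int k = 0; k < K; ++k) {
                acc += A[(size_t)m * K + k] * Bf[(size_t)n * K + k];
            }
            C[(size_t)m * N + n] = acc;
        }
    }
}
```

Because each output reads one row of A and one row of the weights, both operands are traversed contiguously along K, which is what makes block-wise dequantization of a weight row convenient.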
| void gemv_q5_k | ( | float * | y, |
| const void * | W, | ||
| const float * | x, | ||
| int | M, | ||
| int | K | ||
| ) |
Definition at line 199 of file gemm_kernels_q5_k.c.
References gemv_q5_k_ref().
Referenced by ck_test_gemv_q5_k().
| void gemv_q5_k_ref | ( | float * | y, |
| const void * | W, | ||
| const float * | x, | ||
| int | M, | ||
| int | K | ||
| ) |
Definition at line 92 of file gemm_kernels_q5_k.c.
References CK_FP16_TO_FP32, block_q5_K::d, block_q5_K::dmin, get_q5_k_scale_min(), block_q5_K::qh, QK_K, block_q5_K::qs, and block_q5_K::scales.
Referenced by gemv_q5_k().
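| static void get_q5_k_scale_min | ( | int | j, |
| const uint8_t * | scales, | ||
| uint8_t * | scale, | ||
| uint8_t * | min | ||
| ) |
inline static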
Definition at line 74 of file gemm_kernels_q5_k.c.
Referenced by gemm_nt_q5_k_ref() and gemv_q5_k_ref().
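This page does not show the body of this helper. The following is a sketch of how a 12-byte scales array is typically unpacked into eight 6-bit scale/min pairs, assuming the packing matches the one llama.cpp uses for its K-quant formats. The function name unpack_scale_min_sketch is hypothetical, and the parameter list mirrors the signature documented here.

```c
#include <stdint.h>

/* Extract the 6-bit scale and 6-bit min for sub-block j (0..7).
 * Assumed packing (llama.cpp K-quant style):
 *   bytes 0..3  : low 6 bits = scale of sub-blocks 0..3
 *   bytes 4..7  : low 6 bits = min   of sub-blocks 0..3
 *   bytes 8..11 : low nibble = low 4 bits of scale, high nibble = low 4 bits
 *                 of min, for sub-blocks 4..7
 *   the top 2 bits of bytes 0..3 and 4..7 carry the upper 2 bits of the
 *   scale and min of sub-blocks 4..7 respectively. */
static inline void unpack_scale_min_sketch(int j, const uint8_t *scales,
                                           uint8_t *scale, uint8_t *min) {
    if (j < 4) {
        *scale = scales[j] & 63;
        *min   = scales[j + 4] & 63;
    } else {
        *scale = (uint8_t)((scales[j + 4] & 0x0F) | ((scales[j - 4] >> 6) << 4));
        *min   = (uint8_t)((scales[j + 4] >> 4)   | ((scales[j] >> 6) << 4));
    }
}
```

With scales[0] = 0xC5, scales[4] = 0x47 and scales[8] = 0xA9, sub-block 0 yields scale 5 and min 7, while sub-block 4 yields scale 57 and min 26.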