SSE-optimized GEMM kernels for Q6_K x Q8_K quantization.
Functions

| Return type | Function |
|---|---|
| static float | dot_q6_k_q8_k_256_sse (const block_q6_K *bw, const block_q8_K *ba) |
| void | gemm_nt_q6_k_sse (const float *A, const void *B, const float *bias, float *C, int M, int N, int K) |
| void | quantize_row_q8_k (const float *x, void *vy, int k) |
After changes: make test && make llamacpp-parity-full
Definition in file gemm_kernels_q6k_sse.c.
static inline float dot_q6_k_q8_k_256_sse(const block_q6_K *bw, const block_q8_K *ba)
SSE-optimized dot product for Q6_K x Q8_K. Q6_K block layout (256 weights per block):
- ql: 128 bytes (low 4 bits of each quant)
- qh: 64 bytes (high 2 bits of each quant)
- scales: 16 bytes (int8 sub-block scales)
- d: fp16 super-scale
Definition at line 33 of file gemm_kernels_q6k_sse.c.
References CK_FP16_TO_FP32, block_q6_K::d, block_q8_K::d, block_q6_K::qh, QK_K, block_q6_K::ql, block_q8_K::qs, and block_q6_K::scales.
Referenced by gemm_nt_q6_k_sse().
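As a readable reference for what the SSE kernel computes on one block pair, here is a scalar sketch. It assumes the llama.cpp Q6_K bit packing (each 6-bit quant is split into a low nibble in ql and a 2-bit field in qh, stored with an offset of 32), which the layout above and the llamacpp-parity-full target suggest but do not spell out. The struct and function names are hypothetical, and d is a float rather than fp16 so the example compiles on its own. Under these assumptions the block dot product works out to d_w * d_a * sum_j scales[j/16] * w_j * a_j, where w_j in [-32, 31] is the recentred 6-bit weight and a_j the int8 activation.

```c
#include <stdint.h>

#define QK_K 256  /* super-block size: 256 weights per block */

/* Hypothetical stand-in for block_q6_K. Assumption: the real block stores d
 * as fp16 (converted with CK_FP16_TO_FP32); a float keeps this self-contained. */
typedef struct {
    uint8_t ql[QK_K / 2];      /* 128 bytes: low 4 bits of each quant  */
    uint8_t qh[QK_K / 4];      /* 64 bytes: high 2 bits of each quant  */
    int8_t  scales[QK_K / 16]; /* 16 int8 sub-block scales             */
    float   d;                 /* super-scale (fp16 in the real block) */
} q6k_block_sketch;

/* Hypothetical stand-in for block_q8_K: one scale plus 256 int8 quants. */
typedef struct {
    float  d;
    int8_t qs[QK_K];
} q8k_block_sketch;

/* Dequantize one Q6_K block, assuming the llama.cpp bit packing. */
static void dequant_q6k_sketch(const q6k_block_sketch *b, float *y) {
    const uint8_t *ql = b->ql;
    const uint8_t *qh = b->qh;
    const int8_t  *sc = b->scales;
    for (int half = 0; half < 2; ++half) {  /* two halves of 128 weights */
        for (int l = 0; l < 32; ++l) {
            int is = l / 16;
            /* Rebuild four 6-bit values from nibbles + 2-bit fields, recentre by 32. */
            int q1 = ((ql[l]      & 0xF) | (((qh[l] >> 0) & 3) << 4)) - 32;
            int q2 = ((ql[l + 32] & 0xF) | (((qh[l] >> 2) & 3) << 4)) - 32;
            int q3 = ((ql[l]      >> 4)  | (((qh[l] >> 4) & 3) << 4)) - 32;
            int q4 = ((ql[l + 32] >> 4)  | (((qh[l] >> 6) & 3) << 4)) - 32;
            y[l]      = b->d * sc[is + 0] * q1;
            y[l + 32] = b->d * sc[is + 2] * q2;
            y[l + 64] = b->d * sc[is + 4] * q3;
            y[l + 96] = b->d * sc[is + 6] * q4;
        }
        y += 128; ql += 64; qh += 32; sc += 8;  /* advance to the second half */
    }
}

/* Scalar reference for one block pair: dequantize weights, then dot with int8 activations. */
static float dot_q6k_q8k_scalar_sketch(const q6k_block_sketch *bw,
                                       const q8k_block_sketch *ba) {
    float w[QK_K];
    dequant_q6k_sketch(bw, w);
    float sum = 0.0f;
    for (int j = 0; j < QK_K; ++j)
        sum += w[j] * ba->qs[j];
    return sum * ba->d;  /* apply the activation block scale once */
}
```

An optimized kernel would typically keep the products in integer registers, accumulate per 16-element sub-block, and apply the scales once per sub-block instead of dequantizing to float first; this float version trades speed for clarity.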
void gemm_nt_q6_k_sse(const float *A, const void *B, const float *bias, float *C, int M, int N, int K)
Definition at line 66 of file gemm_kernels_q6k_sse.c.
References C, dot_q6_k_q8_k_256_sse(), gemm_nt_q6_k_ref(), QK_K, and quantize_row_q8_k().
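The NT suffix conventionally means the second operand is used transposed: each output element is the dot product of a row of A with a row of B. The sketch below shows those semantics in plain float arithmetic, assuming row-major A (M x K), B as N rows of K weights, row-major C (M x N), and an optional per-column bias; gemm_nt_f32_sketch is a hypothetical name. Going by the References line, in gemm_nt_q6_k_sse itself B presumably holds Q6_K blocks, each row of A is first packed with quantize_row_q8_k, and the inner loop becomes a sum of dot_q6_k_q8_k_256_sse calls over K / QK_K blocks; whether gemm_nt_q6_k_ref acts as a fallback is not stated on this page.

```c
#include <stddef.h>

/* Float reference for the assumed semantics:
 *   C[m][n] = sum_k A[m][k] * W[n][k] + bias[n]
 * Assumptions: row-major layouts, W stored as N rows of K, bias may be NULL. */
static void gemm_nt_f32_sketch(const float *A, const float *W, const float *bias,
                               float *C, int M, int N, int K) {
    for (int m = 0; m < M; ++m) {
        const float *a = A + (size_t)m * K;        /* one activation row */
        for (int n = 0; n < N; ++n) {
            const float *w = W + (size_t)n * K;    /* one weight row     */
            float acc = 0.0f;
            for (int k = 0; k < K; ++k)
                acc += a[k] * w[k];
            C[(size_t)m * N + n] = acc + (bias ? bias[n] : 0.0f);
        }
    }
}
```

Storing weights as rows of K makes every output element a contiguous dot product, which is what lets a quantized kernel walk whole Q6_K blocks of one weight row against matching Q8_K blocks of one activation row.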
| void quantize_row_q8_k | ( | const float * | x, |
| void * | vy, | ||
| int | k | ||
| ) |
Definition at line 107 of file gemm_kernels_q4k_q8k.c.
Referenced by gemm_nt_q6_k_sse().
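quantize_row_q8_k takes k floats and writes Q8_K blocks into vy. The sketch below illustrates a plain symmetric absmax scheme for 256-element blocks (one float scale d = amax / 127 plus int8 quants). It is not necessarily the exact scheme in gemm_kernels_q4k_q8k.c: the struct and function names are hypothetical, the real block_q8_K may carry extra fields (llama.cpp's version, for instance, also stores per-16-element sums), and the sign and rounding conventions may differ. k is assumed to be a multiple of QK_K.

```c
#include <stdint.h>

#define QK_K 256

/* Hypothetical simplified Q8_K block: one float scale + 256 int8 quants. */
typedef struct {
    float  d;
    int8_t qs[QK_K];
} q8k_block_sketch;

/* Symmetric absmax quantization of k floats (k assumed a multiple of QK_K). */
static void quantize_row_q8k_sketch(const float *x, q8k_block_sketch *y, int k) {
    for (int b = 0; b < k / QK_K; ++b, x += QK_K) {
        float amax = 0.0f;
        for (int j = 0; j < QK_K; ++j) {
            float ax = x[j] < 0.0f ? -x[j] : x[j];
            if (ax > amax) amax = ax;
        }
        if (amax == 0.0f) {               /* all-zero block: zero scale and quants */
            y[b].d = 0.0f;
            for (int j = 0; j < QK_K; ++j) y[b].qs[j] = 0;
            continue;
        }
        const float id = 127.0f / amax;   /* inverse scale */
        for (int j = 0; j < QK_K; ++j) {
            float v = x[j] * id;
            int q = (int)(v >= 0.0f ? v + 0.5f : v - 0.5f);  /* round half away from zero */
            if (q > 127) q = 127;
            if (q < -127) q = -127;
            y[b].qs[j] = (int8_t)q;
        }
        y[b].d = amax / 127.0f;           /* x[j] is approximately d * qs[j] */
    }
}
```

Keeping a single float scale per 256 values means the dot-product kernel only needs one extra multiply per block to return to float.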