SSE-optimized GEMM kernels for Q5_0 x Q8_K quantization.
#include <immintrin.h>
#include <stdint.h>
#include <string.h>
#include <stdio.h>
#include "ckernel_quant.h"
Functions
| static float | dot_q5_0_q8_k_32_sse (const block_q5_0 *bw, const block_q8_K *ba, int q8_offset) |
| void | gemm_nt_q5_0_ref (const float *A, const void *B, const float *bias, float *C, int M, int N, int K) |
| | GEMM with transposed Q5_0 weights: C = A @ B^T. |
| void | gemm_nt_q5_0_sse_v2 (const float *A, const void *B, const float *bias, float *C, int M, int N, int K) |
| void | quantize_row_q8_k (const float *x, void *vy, int k) |
After changing this file, run: make test && make llamacpp-parity-full
Definition in file gemm_kernels_q5_0_sse_v2.c.
static float dot_q5_0_q8_k_32_sse (const block_q5_0 *bw, const block_q8_K *ba, int q8_offset) [inline, static]
Definition at line 25 of file gemm_kernels_q5_0_sse_v2.c.
References block_q8_K::bsums, CK_FP16_TO_FP32, block_q5_0::d, block_q8_K::d, block_q5_0::qh, block_q8_K::qs, and block_q5_0::qs.
Referenced by gemm_nt_q5_0_sse_v2().
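The fields referenced above (d, qh, qs) suggest the usual 5-bit block layout: a per-block scale, 16 bytes holding the low nibbles of 32 weights, and 4 bytes holding their fifth bits. The following is a minimal scalar sketch of unpacking such a block and taking a dot product with int8 activations. It assumes the ggml-style bit ordering and uses hypothetical names (block_q5_0_sketch, unpack_q5_0_sketch, dot_q5_0_i8_sketch); the scale is a plain float rather than FP16 to keep the example self-contained, and it does not reproduce the SSE code path or the q8_offset and bsums handling of the documented function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QK5_0_SKETCH 32  /* assumed: 32 weights per Q5_0 block */

/* Hypothetical stand-in for block_q5_0. The real struct is assumed to store
 * d as FP16 (read through CK_FP16_TO_FP32); a float is used here instead. */
typedef struct {
    float   d;                     /* per-block scale */
    uint8_t qh[4];                 /* fifth (high) bit of each of the 32 weights */
    uint8_t qs[QK5_0_SKETCH / 2];  /* low 4 bits, two weights per byte */
} block_q5_0_sketch;

/* Unpack one block into 32 signed integers in [-16, 15]. */
static void unpack_q5_0_sketch(const block_q5_0_sketch *b,
                               int8_t out[QK5_0_SKETCH])
{
    assert(b != NULL && out != NULL);
    /* Assemble the 32 high bits byte by byte so the result does not
     * depend on host endianness. */
    const uint32_t qh = (uint32_t)b->qh[0]
                      | ((uint32_t)b->qh[1] << 8)
                      | ((uint32_t)b->qh[2] << 16)
                      | ((uint32_t)b->qh[3] << 24);
    for (int j = 0; j < QK5_0_SKETCH / 2; ++j) {
        const int hi0 = (int)((qh >> j) & 1u) << 4;         /* weight j      */
        const int hi1 = (int)((qh >> (j + 16)) & 1u) << 4;  /* weight j + 16 */
        out[j]      = (int8_t)(((b->qs[j] & 0x0F) | hi0) - 16);
        out[j + 16] = (int8_t)(((b->qs[j] >> 4)   | hi1) - 16);
    }
}

/* Scalar dot product of one block with 32 int8 activations.
 * a_scale is the (assumed) scale of the quantized activations. */
static float dot_q5_0_i8_sketch(const block_q5_0_sketch *bw,
                                const int8_t *a, float a_scale)
{
    int8_t  w[QK5_0_SKETCH];
    int32_t acc = 0;
    unpack_q5_0_sketch(bw, w);
    for (int j = 0; j < QK5_0_SKETCH; ++j)
        acc += (int32_t)w[j] * (int32_t)a[j];
    return bw->d * a_scale * (float)acc;
}
```

Accumulating in int32 and applying both scales once per block is the reason the activations are quantized first: the inner loop stays in integer arithmetic.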
void gemm_nt_q5_0_ref (const float *A, const void *B, const float *bias, float *C, int M, int N, int K)
GEMM with transposed Q5_0 weights: C = A @ B^T.
| A | Input activations [M x K], row-major FP32 |
| B | Weight matrix in Q5_0 format [N x K], row-major quantized |
| bias | Optional bias [N], NULL if not used |
| C | Output [M x N], row-major FP32 |
| M | Batch size (number of tokens) |
| N | Output dimension (number of rows in B) |
| K | Input dimension |
Definition at line 788 of file gemm_kernels_q5_0.c.
References C, CK_FP16_TO_FP32, block_q5_0::d, block_q5_0::qh, QK5_0, and block_q5_0::qs.
Referenced by gemm_nt_q5_0_sse_v2().
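To make the shape and indexing contract of the parameters concrete, here is a sketch of the same C = A @ B^T computation with B held as plain FP32 instead of Q5_0 blocks. The function name gemm_nt_f32_sketch is hypothetical and is not part of this file; only the row-major layouts and the optional bias follow the parameter descriptions above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical FP32 illustration of C = A @ B^T (+ bias).
 * A is [M x K], B is [N x K], C is [M x N], all row-major.
 * The documented function takes B as Q5_0 blocks instead of floats. */
static void gemm_nt_f32_sketch(const float *A, const float *B,
                               const float *bias, float *C,
                               int M, int N, int K)
{
    assert(A != NULL && B != NULL && C != NULL);
    for (int m = 0; m < M; ++m) {
        const float *a = A + (size_t)m * K;       /* row m of A */
        for (int n = 0; n < N; ++n) {
            const float *b = B + (size_t)n * K;   /* row n of B = column n of B^T */
            float acc = bias ? bias[n] : 0.0f;    /* bias may be NULL */
            for (int k = 0; k < K; ++k)
                acc += a[k] * b[k];
            C[(size_t)m * N + n] = acc;
        }
    }
}
```

Because B is stored as [N x K], each output element is a dot product of two contiguous rows, which is what lets a quantized kernel process B one block at a time.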
void gemm_nt_q5_0_sse_v2 (const float *A, const void *B, const float *bias, float *C, int M, int N, int K)
Definition at line 77 of file gemm_kernels_q5_0_sse_v2.c.
References C, dot_q5_0_q8_k_32_sse(), gemm_nt_q5_0_ref(), QK_K, and quantize_row_q8_k().
void quantize_row_q8_k (const float *x, void *vy, int k)
Definition at line 107 of file gemm_kernels_q4k_q8k.c.
Referenced by gemm_nt_q5_0_sse_v2().
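This page gives only the signature, so the following is a simplified sketch of what quantizing a row into Q8_K-style blocks can look like, not the implementation in gemm_kernels_q4k_q8k.c. It assumes a block of 256 values with a float scale d, int8 values qs, and per-16-element sums bsums (the three fields referenced by dot_q5_0_q8_k_32_sse). The struct and function names are hypothetical, and the symmetric amax / 127 scale is a simplification that may differ from the real rounding and scale choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QK_K_SKETCH 256  /* assumed super-block size */

/* Hypothetical stand-in for block_q8_K. */
typedef struct {
    float   d;                        /* per-block scale */
    int8_t  qs[QK_K_SKETCH];          /* quantized values */
    int16_t bsums[QK_K_SKETCH / 16];  /* sum of each group of 16 quants */
} block_q8_K_sketch;

/* Quantize k floats (k must be a multiple of 256) into k / 256 blocks. */
static void quantize_row_q8_k_sketch(const float *x, block_q8_K_sketch *y, int k)
{
    assert(x != NULL && y != NULL && k % QK_K_SKETCH == 0);
    const int nb = k / QK_K_SKETCH;
    for (int i = 0; i < nb; ++i) {
        const float *xb = x + (size_t)i * QK_K_SKETCH;

        /* Largest magnitude in the block sets the scale. */
        float amax = 0.0f;
        for (int j = 0; j < QK_K_SKETCH; ++j) {
            const float ax = xb[j] < 0.0f ? -xb[j] : xb[j];
            if (ax > amax) amax = ax;
        }
        const float inv = amax > 0.0f ? 127.0f / amax : 0.0f;

        /* Round to nearest and clamp to the int8 range. */
        for (int j = 0; j < QK_K_SKETCH; ++j) {
            const float v = xb[j] * inv;
            int q = (int)(v + (v >= 0.0f ? 0.5f : -0.5f));
            if (q > 127)  q = 127;
            if (q < -127) q = -127;
            y[i].qs[j] = (int8_t)q;
        }

        /* Per-group sums, precomputed once per block. */
        for (int g = 0; g < QK_K_SKETCH / 16; ++g) {
            int sum = 0;
            for (int j = 0; j < 16; ++j) sum += y[i].qs[g * 16 + j];
            y[i].bsums[g] = (int16_t)sum;
        }
        y[i].d = amax / 127.0f;
    }
}
```

Precomputed group sums are typically useful when the weight format carries a constant offset (such as the -16 in a 5-bit format), since the offset term can then be applied once per group rather than per element; whether dot_q5_0_q8_k_32_sse uses bsums this way is an assumption.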