Fused GEMV kernels with online quantization and bias.
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <math.h>
#include "ckernel_quant.h"
Functions
| static int | ck_round_nearest (float v) |
| | Round to nearest int, half away from zero (matches quantize_row_q8_0). |
| static float | dot_fp32_q5_0_block (const float *x, const block_q5_0 *block) |
| | Compute the dot product of an FP32 input with a Q5_0 weight block, with online Q8 quantization. |
| static float | dot_fp32_q8_0_block (const float *x, const block_q8_0 *block) |
| | Compute the dot product of an FP32 input with a Q8_0 weight block, with online Q8 quantization. |
| void | gemv_fused_q5_0_bias (float *y, const void *W, const float *x, const float *bias, int M, int K) |
| void | gemv_fused_q5_0_bias_dispatch (float *y, const void *W, const float *x, const float *bias, int M, int K) |
| void | gemv_fused_q8_0_bias (float *y, const void *W, const float *x, const float *bias, int M, int K) |
| void | gemv_fused_q8_0_bias_dispatch (float *y, const void *W, const float *x, const float *bias, int M, int K) |
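These kernels fuse online Q8 quantization of the FP32 input, the dot product against Q5_0 or Q8_0 weight blocks, and the bias add.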
Benefits:
Kernel signature: gemv_fused_q5_0_bias(y, W, x, bias, M, K)
Definition in file gemv_fused_quant_bias.c.
static inline int ck_round_nearest (float v)
Round to nearest int, half away from zero (matches quantize_row_q8_0)
Definition at line 40 of file gemv_fused_quant_bias.c.
Referenced by dot_fp32_q5_0_block() and dot_fp32_q8_0_block().
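The rounding rule named above can be sketched as follows. This is a minimal illustration, not the body of ck_round_nearest(): the name `sketch_round_half_away` is hypothetical, and the sketch assumes the input fits in the range of `int`.

```c
/* Hypothetical stand-in for ck_round_nearest(): round to the nearest
 * integer, with exact halves going away from zero (2.5 -> 3, -2.5 -> -3).
 * Assumes v is well inside the range of int. */
static inline int sketch_round_half_away(float v)
{
    /* Shift by 0.5 in the direction of the sign, then let the cast
     * truncate toward zero. */
    return (int)(v >= 0.0f ? v + 0.5f : v - 0.5f);
}
```

Rounding halves away from zero differs from the default round-to-nearest-even mode used by functions such as `nearbyintf`, which is presumably why the file carries its own helper to stay bit-compatible with quantize_row_q8_0.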
static inline float dot_fp32_q5_0_block (const float *x, const block_q5_0 *block)
Compute dot product of FP32 input with Q5_0 weight block, with online Q8 quantization.
Definition at line 375 of file gemv_fused_quant_bias.c.
References CK_FP16_TO_FP32, CK_FP32_TO_FP16, ck_round_nearest(), block_q5_0::d, block_q5_0::qh, and block_q5_0::qs.
Referenced by gemv_fused_q5_0_bias().
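A rough sketch of what a per-block dot product with online Q8 quantization can look like is given below. Several things are assumptions rather than facts from this file: the block holds 32 weights, the bit layout follows the ggml Q5_0 convention (low nibbles in `qs`, fifth bits packed little-endian in `qh`), and the scale `d` is kept as a plain `float` instead of FP16 to keep the example self-contained. All `sketch_*` names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_QK 32  /* assumed block size (ggml's QK5_0 is 32) */

/* Hypothetical simplified block; the real block_q5_0 stores d as FP16. */
typedef struct {
    float   d;                 /* per-block weight scale */
    uint8_t qh[4];             /* fifth (high) bit of each of the 32 weights */
    uint8_t qs[SKETCH_QK / 2]; /* low 4 bits, two weights per byte */
} sketch_q5_0;

static int sketch_round(float v)
{
    return (int)(v >= 0.0f ? v + 0.5f : v - 0.5f);
}

static float sketch_dot_q5_0(const float *x, const sketch_q5_0 *b)
{
    /* Step 1: pick the input scale so that max |x| maps to 127. */
    float amax = 0.0f;
    for (int i = 0; i < SKETCH_QK; i++) {
        float a = x[i] < 0.0f ? -x[i] : x[i];
        if (a > amax) amax = a;
    }
    const float dx  = amax / 127.0f;
    const float inv = dx != 0.0f ? 1.0f / dx : 0.0f;

    /* Assumes a little-endian host when reassembling the high bits. */
    uint32_t qh;
    memcpy(&qh, b->qh, sizeof qh);

    /* Step 2: integer dot product of 5-bit weights and 8-bit inputs. */
    int32_t acc = 0;
    for (int j = 0; j < SKETCH_QK / 2; j++) {
        const int hi0 = (int)((qh >> j) & 1u) << 4;        /* element j      */
        const int hi1 = (int)((qh >> (j + 16)) & 1u) << 4; /* element j + 16 */
        const int w0  = ((b->qs[j] & 0x0F) | hi0) - 16;    /* range -16..15  */
        const int w1  = ((b->qs[j] >> 4)   | hi1) - 16;
        acc += w0 * sketch_round(x[j] * inv);
        acc += w1 * sketch_round(x[j + 16] * inv);
    }

    /* Step 3: apply the weight scale and the input scale once per block. */
    return (float)acc * b->d * dx;
}
```

The references to CK_FP16_TO_FP32 and CK_FP32_TO_FP16 suggest the real function also passes scales through FP16; the sketch skips that step, so its results can differ slightly from the actual kernel.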
static inline float dot_fp32_q8_0_block (const float *x, const block_q8_0 *block)
Compute dot product of FP32 input with Q8_0 weight block, with online Q8 quantization.
Definition at line 418 of file gemv_fused_quant_bias.c.
References CK_FP16_TO_FP32, CK_FP32_TO_FP16, ck_round_nearest(), block_q8_0::d, and block_q8_0::qs.
Referenced by gemv_fused_q8_0_bias().
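The arithmetic behind this block dot product can be written out as below. This assumes the common Q8_0 convention of 32-element blocks whose scale is the maximum absolute value divided by 127, which the reference to quantize_row_q8_0 suggests but this page does not state. Here $d_w$ and $q^w_i$ are the stored weight scale and 8-bit weights, and $d_x$ and $q^x_i$ are the values computed on the fly from the FP32 input.

```latex
d_x = \frac{\max_i |x_i|}{127}, \qquad
q^x_i = \operatorname{round}\!\left(\frac{x_i}{d_x}\right), \qquad
\sum_{i=0}^{31} w_i x_i \;\approx\; d_w \, d_x \sum_{i=0}^{31} q^w_i \, q^x_i ,
\quad \text{where } w_i = d_w \, q^w_i .
```

The inner sum runs entirely in integer arithmetic; the two scales are applied once per block.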
void gemv_fused_q5_0_bias (float *y, const void *W, const float *x, const float *bias, int M, int K)
Definition at line 448 of file gemv_fused_quant_bias.c.
References dot_fp32_q5_0_block() and QK5_0.
Referenced by gemv_fused_q5_0_bias_dispatch().
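The overall shape of a fused GEMV with bias might look like the sketch below. It is not the implementation in this file: it assumes W is stored row-major as M rows of K/32 blocks, that K is a multiple of 32, and that a NULL bias means no bias. To stay short it uses a simplified Q8_0-style block and a stand-in block dot that dequantizes weights on the fly rather than quantizing the input. All `sketch_*` names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_QK 32  /* assumed block size */

/* Hypothetical simplified weight block: float scale plus 32 int8 weights. */
typedef struct {
    float  d;
    int8_t qs[SKETCH_QK];
} sketch_q8_0;

/* Stand-in for the per-block dot product. */
static float sketch_block_dot(const float *x, const sketch_q8_0 *b)
{
    float sum = 0.0f;
    for (int i = 0; i < SKETCH_QK; i++)
        sum += x[i] * (float)b->qs[i];
    return sum * b->d;
}

/* y[m] = bias[m] + sum over the blocks of row m of dot(x_block, W_block). */
static void sketch_gemv_bias(float *y, const void *W, const float *x,
                             const float *bias, int M, int K)
{
    const sketch_q8_0 *blocks = (const sketch_q8_0 *)W;
    const int nb = K / SKETCH_QK;  /* blocks per row; assumes K % 32 == 0 */

    for (int m = 0; m < M; m++) {
        /* Starting from the bias folds the add into the accumulation. */
        float acc = bias ? bias[m] : 0.0f;
        const sketch_q8_0 *row = blocks + (size_t)m * (size_t)nb;
        for (int b = 0; b < nb; b++)
            acc += sketch_block_dot(x + b * SKETCH_QK, &row[b]);
        y[m] = acc;
    }
}
```

Seeding the accumulator with the bias means y is written exactly once per row, which is the point of fusing the bias add into the kernel.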
void gemv_fused_q5_0_bias_dispatch (float *y, const void *W, const float *x, const float *bias, int M, int K)
Definition at line 508 of file gemv_fused_quant_bias.c.
References gemv_fused_q5_0_bias().
void gemv_fused_q8_0_bias (float *y, const void *W, const float *x, const float *bias, int M, int K)
Definition at line 476 of file gemv_fused_quant_bias.c.
References dot_fp32_q8_0_block() and QK8_0.
Referenced by gemv_fused_q8_0_bias_dispatch().
void gemv_fused_q8_0_bias_dispatch (float *y, const void *W, const float *x, const float *bias, int M, int K)
Definition at line 523 of file gemv_fused_quant_bias.c.
References gemv_fused_q8_0_bias().