GEMM kernels with FP16 (half-precision) weights.
Macros

| Macro | Expands to |
|---|---|
| #define fp16_to_fp32(x) | ggml_fp16_to_fp32(x) |
| #define fp32_to_fp16(x) | ggml_fp32_to_fp16(x) |

Functions

| Return type | Function | Description |
|---|---|---|
| void | convert_f16_to_f32(float *dst, const uint16_t *src, size_t count) | Convert FP16 tensor to FP32. |
| void | convert_f32_to_f16(uint16_t *dst, const float *src, size_t count) | Convert FP32 tensor to FP16. |
| float | dot_f16(const uint16_t *w_f16, const float *x, int K) | |
| void | gemm_f16(float *Y, const uint16_t *W, const float *X, int M, int N, int K) | Auto-dispatch GEMM based on available SIMD. |
| void | gemm_f16_backward(float *dX, const uint16_t *W, const float *dY, int M, int N, int K) | Batched backward pass. |
| void | gemm_f16_ref(float *Y, const uint16_t *W, const float *X, int M, int N, int K) | Matrix-matrix multiply with FP16 weights (scalar reference). |
| void | gemv_f16(float *y, const uint16_t *W, const float *x, int M, int K) | Auto-dispatch GEMV based on available SIMD. |
| void | gemv_f16_backward(float *dX, const uint16_t *W, const float *dY, int M, int K) | Auto-dispatch backward. |
| void | gemv_f16_backward_ref(float *dX, const uint16_t *W, const float *dY, int M, int K) | Backward pass: compute input gradient (scalar reference). |
| void | gemv_f16_ref(float *y, const uint16_t *W, const float *x, int M, int K) | Matrix-vector multiply with FP16 weights (scalar reference). |
GEMM kernels with FP16 (half-precision) weights.
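After making changes, run: make test && make llamacpp-parity-full
Implements matrix multiplication where the weight matrix W is stored in FP16 (as raw uint16_t bit patterns), while the inputs and outputs are FP32 (float).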
Used for multimodal projection layers (mmproj-*.gguf files).
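Using the shapes given in the parameter tables below (W is M x K, X is K x N, Y is M x N), the forward and backward products can be summarized as follows, where fp32(·) denotes widening an FP16 weight to FP32. The K x N shape of dX for the batched backward pass is inferred from these shapes rather than stated in the source.

```latex
\begin{aligned}
Y_{mn}  &= \sum_{k=0}^{K-1} \operatorname{fp32}(W_{mk})\, X_{kn}, && Y = W X, \\
dX_{kn} &= \sum_{m=0}^{M-1} \operatorname{fp32}(W_{mk})\, dY_{mn}, && dX = W^{\top} dY.
\end{aligned}
```

The GEMV variants are the special case N = 1.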
Definition in file gemm_kernels_f16.c.
#define fp16_to_fp32(x) ggml_fp16_to_fp32(x)
Definition at line 36 of file gemm_kernels_f16.c.
#define fp32_to_fp16(x) ggml_fp32_to_fp16(x)
Definition at line 37 of file gemm_kernels_f16.c.
void convert_f16_to_f32(float *dst, const uint16_t *src, size_t count)
Convert FP16 tensor to FP32.
Definition at line 226 of file gemm_kernels_f16.c.
References fp16_to_fp32.
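The file delegates the per-element conversion to ggml_fp16_to_fp32() through the fp16_to_fp32 macro. As a minimal sketch of what such a conversion involves, assuming standard IEEE 754 binary16 input, the hypothetical helper `half_to_float` below decodes the sign, 5-bit exponent and 10-bit mantissa by hand, and the hypothetical `convert_f16_to_f32_sketch` applies it element-wise. This is an illustration, not the ggml implementation.

```c
#include <assert.h>
#include <math.h>    /* NAN, INFINITY */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one IEEE 754 binary16 bit pattern to float.
 * Stand-in for ggml_fp16_to_fp32(); not the ggml code. */
static float half_to_float(uint16_t h) {
    uint32_t e = (h >> 10) & 0x1Fu;  /* 5-bit biased exponent */
    uint32_t m = h & 0x3FFu;         /* 10-bit mantissa */
    float v;
    if (e == 0)
        v = (float)m * 0x1p-24f;                               /* zero/subnormal: m * 2^-24 */
    else if (e == 31)
        v = m ? NAN : INFINITY;                                /* Inf or NaN */
    else
        v = (float)(m | 0x400u) * 0x1p-25f * (float)(1u << e); /* normal: (1024+m) * 2^(e-25) */
    return (h & 0x8000u) ? -v : v;                             /* apply the sign bit */
}

/* Hypothetical element-wise FP16 -> FP32 loop mirroring convert_f16_to_f32(). */
static void convert_f16_to_f32_sketch(float *dst, const uint16_t *src, size_t count) {
    assert(count == 0 || (dst != NULL && src != NULL));
    for (size_t i = 0; i < count; i++)
        dst[i] = half_to_float(src[i]);
}
```

Every binary16 value is exactly representable as a float, so this direction needs no rounding.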
void convert_f32_to_f16(uint16_t *dst, const float *src, size_t count)
Convert FP32 tensor to FP16.
Definition at line 250 of file gemm_kernels_f16.c.
References fp32_to_fp16.
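The reverse direction loses precision, so a rounding rule is required. The source does not state which rule ggml_fp32_to_fp16() uses; the sketch below assumes IEEE round-to-nearest-even, with overflow to infinity above 65504 and gradual underflow to FP16 subnormals. `float_to_half` and `convert_f32_to_f16_sketch` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>  /* memcpy */

/* Hypothetical helper: encode a float as IEEE 754 binary16,
 * rounding to nearest even. Stand-in for ggml_fp32_to_fp16(). */
static uint16_t float_to_half(float f) {
    uint32_t x;
    memcpy(&x, &f, sizeof x);                        /* read float bits without UB */
    uint16_t sign = (uint16_t)((x >> 16) & 0x8000u);
    uint32_t a = x & 0x7FFFFFFFu;                    /* magnitude bits */

    if (a >= 0x7F800000u)                            /* Inf, or NaN (returned as quiet NaN) */
        return (uint16_t)(sign | (a > 0x7F800000u ? 0x7E00u : 0x7C00u));
    if (a >= 0x477FF000u)                            /* >= 65520 rounds past 65504: overflow */
        return (uint16_t)(sign | 0x7C00u);
    if (a >= 0x38800000u) {                          /* normal half range (>= 2^-14) */
        uint32_t h = (((a >> 23) - 112u) << 10) | ((a >> 13) & 0x3FFu); /* rebias 127 -> 15 */
        uint32_t rem = a & 0x1FFFu;                  /* the 13 discarded mantissa bits */
        if (rem > 0x1000u || (rem == 0x1000u && (h & 1u)))
            h++;                                     /* round half to even; carry may bump exponent */
        return (uint16_t)(sign | h);
    }
    if (a <= 0x33000000u)                            /* <= 2^-25 rounds to signed zero */
        return sign;
    /* Subnormal half: encode round(value * 2^24) with ties to even. */
    uint32_t m = (a & 0x7FFFFFu) | 0x800000u;        /* restore the implicit leading 1 */
    uint32_t shift = 126u - (a >> 23);               /* 14..24 in this range */
    uint32_t h = m >> shift;
    uint32_t rem = m & ((1u << shift) - 1u);
    uint32_t half = 1u << (shift - 1u);
    if (rem > half || (rem == half && (h & 1u)))
        h++;
    return (uint16_t)(sign | h);
}

/* Hypothetical element-wise FP32 -> FP16 loop mirroring convert_f32_to_f16(). */
static void convert_f32_to_f16_sketch(uint16_t *dst, const float *src, size_t count) {
    assert(count == 0 || (dst != NULL && src != NULL));
    for (size_t i = 0; i < count; i++)
        dst[i] = float_to_half(src[i]);
}
```

For example, 1 + 2^-11 lies exactly halfway between two FP16 values and rounds down to the even one, 1.0 (0x3C00).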
float dot_f16(const uint16_t *w_f16, const float *x, int K)
Definition at line 387 of file gemm_kernels_f16.c.
References gemv_f16().
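dot_f16 has no documented description. Because it references gemv_f16(), one plausible reading (an assumption, not confirmed by the source) is that it returns the dot product of a single FP16 weight row with an FP32 vector, equivalent to a GEMV with M = 1. The sketch below computes that quantity directly; `half_to_float` and `dot_f16_sketch` are hypothetical names.

```c
#include <assert.h>
#include <math.h>    /* NAN, INFINITY */
#include <stdint.h>

/* Hypothetical IEEE 754 binary16 -> float decoder (stand-in for fp16_to_fp32). */
static float half_to_float(uint16_t h) {
    uint32_t e = (h >> 10) & 0x1Fu, m = h & 0x3FFu;
    float v;
    if (e == 0)       v = (float)m * 0x1p-24f;                               /* zero/subnormal */
    else if (e == 31) v = m ? NAN : INFINITY;                                /* Inf/NaN */
    else              v = (float)(m | 0x400u) * 0x1p-25f * (float)(1u << e); /* normal */
    return (h & 0x8000u) ? -v : v;
}

/* Assumed semantics of dot_f16: sum over k of fp32(w_f16[k]) * x[k],
 * i.e. what a GEMV with M = 1 would write to its single output. */
static float dot_f16_sketch(const uint16_t *w_f16, const float *x, int K) {
    assert(K >= 0);
    float acc = 0.0f;
    for (int k = 0; k < K; k++)
        acc += half_to_float(w_f16[k]) * x[k];
    return acc;
}
```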
void gemm_f16(float *Y, const uint16_t *W, const float *X, int M, int N, int K)
Auto-dispatch GEMM based on available SIMD.
Definition at line 207 of file gemm_kernels_f16.c.
References gemm_f16_ref().
void gemm_f16_backward(float *dX, const uint16_t *W, const float *dY, int M, int N, int K)
Batched backward pass: computes the input gradient dX for a batch of N vectors.
Definition at line 373 of file gemm_kernels_f16.c.
References gemv_f16_backward().
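Since gemm_f16_backward references gemv_f16_backward(), the batched pass is presumably a loop of per-vector backward passes. The sketch below assumes (not stated in the source) that dY holds N contiguous M-vectors, that dX receives N contiguous K-vectors, that W is row-major M x K, and that dX is overwritten rather than accumulated into. All function names are hypothetical.

```c
#include <assert.h>
#include <math.h>    /* NAN, INFINITY */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical IEEE 754 binary16 -> float decoder (stand-in for fp16_to_fp32). */
static float half_to_float(uint16_t h) {
    uint32_t e = (h >> 10) & 0x1Fu, m = h & 0x3FFu;
    float v;
    if (e == 0)       v = (float)m * 0x1p-24f;                               /* zero/subnormal */
    else if (e == 31) v = m ? NAN : INFINITY;                                /* Inf/NaN */
    else              v = (float)(m | 0x400u) * 0x1p-25f * (float)(1u << e); /* normal */
    return (h & 0x8000u) ? -v : v;
}

/* Single-vector backward: dX[k] = sum_m fp32(W[m][k]) * dY[m] (overwrites dX). */
static void gemv_backward_sketch(float *dX, const uint16_t *W, const float *dY, int M, int K) {
    for (int k = 0; k < K; k++)
        dX[k] = 0.0f;
    for (int m = 0; m < M; m++) {
        const uint16_t *row = W + (size_t)m * (size_t)K;   /* row m of row-major W */
        for (int k = 0; k < K; k++)
            dX[k] += half_to_float(row[k]) * dY[m];
    }
}

/* Batched backward: one gemv backward per batch item n. */
static void gemm_backward_sketch(float *dX, const uint16_t *W, const float *dY,
                                 int M, int N, int K) {
    assert(M >= 0 && N >= 0 && K >= 0);
    for (int n = 0; n < N; n++)
        gemv_backward_sketch(dX + (size_t)n * (size_t)K, W,
                             dY + (size_t)n * (size_t)M, M, K);
}
```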
void gemm_f16_ref(float *Y, const uint16_t *W, const float *X, int M, int N, int K)
Matrix-matrix multiply with FP16 weights (scalar reference)
| Parameter | Description |
|---|---|
| Y | Output matrix [M x N] |
| W | Weight matrix in FP16 [M x K] |
| X | Input matrix [K x N] |
| M | Number of output rows |
| N | Batch size (number of input columns) |
| K | Inner dimension (columns of W, rows of X) |
Definition at line 154 of file gemm_kernels_f16.c.
References gemv_f16_ref().
Referenced by gemm_f16().
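gemm_f16_ref references gemv_f16_ref(), which suggests the reference GEMM applies a GEMV to each of the N batch items. The sketch below assumes (not stated in the source) that W is row-major M x K and that each of the N input and output vectors is contiguous, i.e. column n of X starts at X + n*K and column n of Y at Y + n*M. Function names are hypothetical.

```c
#include <assert.h>
#include <math.h>    /* NAN, INFINITY */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical IEEE 754 binary16 -> float decoder (stand-in for fp16_to_fp32). */
static float half_to_float(uint16_t h) {
    uint32_t e = (h >> 10) & 0x1Fu, m = h & 0x3FFu;
    float v;
    if (e == 0)       v = (float)m * 0x1p-24f;                               /* zero/subnormal */
    else if (e == 31) v = m ? NAN : INFINITY;                                /* Inf/NaN */
    else              v = (float)(m | 0x400u) * 0x1p-25f * (float)(1u << e); /* normal */
    return (h & 0x8000u) ? -v : v;
}

/* y[m] = sum_k fp32(W[m][k]) * x[k], with W row-major [M x K]. */
static void gemv_ref_sketch(float *y, const uint16_t *W, const float *x, int M, int K) {
    for (int m = 0; m < M; m++) {
        const uint16_t *row = W + (size_t)m * (size_t)K;
        float acc = 0.0f;
        for (int k = 0; k < K; k++)
            acc += half_to_float(row[k]) * x[k];
        y[m] = acc;
    }
}

/* GEMM as N independent GEMVs, one per contiguous batch vector. */
static void gemm_ref_sketch(float *Y, const uint16_t *W, const float *X, int M, int N, int K) {
    assert(M >= 0 && N >= 0 && K >= 0);
    for (int n = 0; n < N; n++)
        gemv_ref_sketch(Y + (size_t)n * (size_t)M, W, X + (size_t)n * (size_t)K, M, K);
}
```

Reusing one weight matrix for every batch vector keeps the reference simple; an optimized kernel would typically decode each FP16 weight once and apply it to several batch vectors.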
void gemv_f16(float *y, const uint16_t *W, const float *x, int M, int K)
Auto-dispatch GEMV based on available SIMD.
Definition at line 128 of file gemm_kernels_f16.c.
References gemv_f16_ref().
Referenced by dot_f16().
void gemv_f16_backward(float *dX, const uint16_t *W, const float *dY, int M, int K)
Auto-dispatch GEMV backward pass based on available SIMD.
Definition at line 358 of file gemm_kernels_f16.c.
References gemv_f16_backward_ref().
Referenced by gemm_f16_backward().
void gemv_f16_backward_ref(float *dX, const uint16_t *W, const float *dY, int M, int K)
Backward pass: compute input gradient (scalar reference)
| Parameter | Description |
|---|---|
| dX | Output: gradient w.r.t. the input [K] |
| W | Weight matrix in FP16 format [M x K] |
| dY | Gradient w.r.t. the output [M] |
| M | Number of output rows |
| K | Number of columns (input dimension) |
Definition at line 289 of file gemm_kernels_f16.c.
References fp16_to_fp32.
Referenced by gemv_f16_backward().
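For y = W x, the input gradient is dX = W^T dY, so dX[k] = sum over m of fp32(W[m][k]) * dY[m]. A minimal scalar sketch, assuming W is row-major M x K and that dX is overwritten; `half_to_float` and `gemv_backward_ref_sketch` are hypothetical names.

```c
#include <assert.h>
#include <math.h>    /* NAN, INFINITY */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical IEEE 754 binary16 -> float decoder (stand-in for fp16_to_fp32). */
static float half_to_float(uint16_t h) {
    uint32_t e = (h >> 10) & 0x1Fu, m = h & 0x3FFu;
    float v;
    if (e == 0)       v = (float)m * 0x1p-24f;                               /* zero/subnormal */
    else if (e == 31) v = m ? NAN : INFINITY;                                /* Inf/NaN */
    else              v = (float)(m | 0x400u) * 0x1p-25f * (float)(1u << e); /* normal */
    return (h & 0x8000u) ? -v : v;
}

/* dX = W^T dY. Walks W row by row (contiguous reads) and scatters
 * each row, scaled by dY[m], into dX instead of reading strided columns. */
static void gemv_backward_ref_sketch(float *dX, const uint16_t *W, const float *dY, int M, int K) {
    assert(M >= 0 && K >= 0);
    for (int k = 0; k < K; k++)
        dX[k] = 0.0f;
    for (int m = 0; m < M; m++) {
        const uint16_t *row = W + (size_t)m * (size_t)K;   /* row m of W */
        const float g = dY[m];
        for (int k = 0; k < K; k++)
            dX[k] += half_to_float(row[k]) * g;
    }
}
```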
void gemv_f16_ref(float *y, const uint16_t *W, const float *x, int M, int K)
Matrix-vector multiply with FP16 weights (scalar reference)
| Parameter | Description |
|---|---|
| y | Output vector [M] |
| W | Weight matrix in FP16 [M x K] |
| x | Input vector [K] |
| M | Number of output rows |
| K | Number of columns |
Definition at line 62 of file gemm_kernels_f16.c.
References fp16_to_fp32.
Referenced by gemm_f16_ref(), and gemv_f16().
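A minimal scalar sketch of the forward GEMV, y[m] = sum over k of fp32(W[m][k]) * x[k], assuming W is row-major M x K and accumulation in FP32. `half_to_float` stands in for the fp16_to_fp32 macro; both function names are hypothetical.

```c
#include <assert.h>
#include <math.h>    /* NAN, INFINITY */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical IEEE 754 binary16 -> float decoder (stand-in for fp16_to_fp32). */
static float half_to_float(uint16_t h) {
    uint32_t e = (h >> 10) & 0x1Fu, m = h & 0x3FFu;
    float v;
    if (e == 0)       v = (float)m * 0x1p-24f;                               /* zero/subnormal */
    else if (e == 31) v = m ? NAN : INFINITY;                                /* Inf/NaN */
    else              v = (float)(m | 0x400u) * 0x1p-25f * (float)(1u << e); /* normal */
    return (h & 0x8000u) ? -v : v;
}

/* y = W x with FP16 weights widened to FP32 on the fly. */
static void gemv_ref_sketch(float *y, const uint16_t *W, const float *x, int M, int K) {
    assert(M >= 0 && K >= 0);
    for (int m = 0; m < M; m++) {
        const uint16_t *row = W + (size_t)m * (size_t)K;   /* row m of W */
        float acc = 0.0f;                                  /* FP32 accumulator */
        for (int k = 0; k < K; k++)
            acc += half_to_float(row[k]) * x[k];
        y[m] = acc;
    }
}
```

Decoding weights on the fly keeps W at half the memory footprint of an FP32 copy, at the cost of a conversion per multiply-add.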