AVX Q4_K x Q8_K matvec kernel for Sandy/Ivy Bridge.
Functions
void gemv_q4_k_q8_k_avx (float *y, const void *W, const void *x_q8, int M, int K)
void gemv_q4_k_q8_k_parallel (float *y, const void *W, const void *x_q8, int M, int K, int ith, int nth)
void gemv_q4_k_q8_k_parallel_simd (float *y, const void *W, const void *x_q8, int M, int K, int ith, int nth)
void gemv_q4_k_q8_k_ref (float *y, const void *W, const void *x_q8, int M, int K)
After changing this file, run: make test && make llamacpp-parity-full
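Uses _mm_maddubs_epi16 (SSSE3) for an efficient unsigned 8-bit by signed 8-bit multiply-add, while keeping the scale format produced by unpack_q4_k_scales.
Key improvement over the SSE path: one _mm_maddubs_epi16 multiplies 16 byte pairs and sums adjacent products into eight 16-bit lanes, whereas the SSE sequence of _mm_cvtepu8_epi16 followed by _mm_madd_epi16 handles only 8 pairs per multiply.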
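The multiply-add pattern described above can be sketched in isolation. This is a minimal illustration, not the kernel in gemm_kernels_q4k_avx.c: the helper names dot16_u8_s8 and dot16_scalar are hypothetical, and the block layout and scale handling of Q4_K and Q8_K are left out entirely. The SIMD path is compiled only when SSSE3 is enabled (for example with -mssse3); otherwise the scalar loop is used. With 4-bit weights in the range 0..15, each pairwise sum is at most 2 * 15 * 128 in magnitude, so the 16-bit saturation of _mm_maddubs_epi16 cannot trigger.

```c
#include <stdint.h>

#if defined(__SSSE3__)
#include <tmmintrin.h> /* SSSE3: _mm_maddubs_epi16 */
#endif

/* Scalar reference: sum of q[i] * x[i] over 16 elements. */
static int32_t dot16_scalar(const uint8_t *q, const int8_t *x)
{
    int32_t sum = 0;
    for (int i = 0; i < 16; ++i) {
        sum += (int32_t)q[i] * (int32_t)x[i];
    }
    return sum;
}

/* Hypothetical helper: dot product of 16 unsigned bytes (unpacked
 * 4-bit weights, 0..15) with 16 signed bytes (8-bit activations). */
static int32_t dot16_u8_s8(const uint8_t *q, const int8_t *x)
{
#if defined(__SSSE3__)
    __m128i vq = _mm_loadu_si128((const __m128i *)q);
    __m128i vx = _mm_loadu_si128((const __m128i *)x);

    /* 16 u8*s8 products, adjacent pairs summed into 8 x int16. */
    __m128i p16 = _mm_maddubs_epi16(vq, vx);

    /* Widen: multiply by 1 and sum adjacent pairs into 4 x int32. */
    __m128i p32 = _mm_madd_epi16(p16, _mm_set1_epi16(1));

    /* Horizontal sum of the four int32 lanes. */
    p32 = _mm_add_epi32(p32, _mm_shuffle_epi32(p32, _MM_SHUFFLE(1, 0, 3, 2)));
    p32 = _mm_add_epi32(p32, _mm_shuffle_epi32(p32, _MM_SHUFFLE(2, 3, 0, 1)));
    return _mm_cvtsi128_si32(p32);
#else
    return dot16_scalar(q, x);
#endif
}
```

The unsigned operand must be the first argument to _mm_maddubs_epi16 and the signed operand the second; swapping them silently changes the result for negative activations.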
Definition in file gemm_kernels_q4k_avx.c.
void gemv_q4_k_q8_k_avx (float *y, const void *W, const void *x_q8, int M, int K)
Definition at line 251 of file gemm_kernels_q4k_avx.c.
References gemv_q4_k_q8_k_ref().
Referenced by gemv_q4_k_q8_k(), and gemv_q4_k_q8_k_amx().
void gemv_q4_k_q8_k_parallel (float *y, const void *W, const void *x_q8, int M, int K, int ith, int nth)
Definition at line 206 of file gemm_kernels_q4k_q8k.c.
Referenced by gemv_q4_k_q8_k_parallel_simd().
void gemv_q4_k_q8_k_parallel_simd (float *y, const void *W, const void *x_q8, int M, int K, int ith, int nth)
Definition at line 263 of file gemm_kernels_q4k_avx.c.
References gemv_q4_k_q8_k_parallel().
Referenced by decode_layer_parallel(), mlp_parallel(), and qkv_projection_parallel().
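This page does not state how ith and nth divide the work. A common convention for such parameters is that thread ith out of nth computes a contiguous chunk of the M output rows, and the sketch below assumes exactly that. Everything in it is hypothetical: row_range, thread_rows, and gemv_f32_rows are illustrative names, and plain float weights stand in for the quantized Q4_K and Q8_K blocks.

```c
#include <stddef.h>

/* Hypothetical half-open row range [begin, end) owned by one thread. */
typedef struct {
    int begin;
    int end;
} row_range;

/* Assumed partitioning: ceil(M / nth) rows per thread, clamped to M. */
static row_range thread_rows(int M, int ith, int nth)
{
    int per_thread = (M + nth - 1) / nth;
    int begin = ith * per_thread;
    int end = begin + per_thread;
    row_range r;
    r.begin = begin < M ? begin : M;
    r.end = end < M ? end : M;
    return r;
}

/* Float stand-in for a per-thread GEMV: y[m] = dot(W[m, :], x),
 * computed only for the rows assigned to thread ith.
 * W is row-major with M rows and K columns. */
static void gemv_f32_rows(float *y, const float *W, const float *x,
                          int M, int K, int ith, int nth)
{
    row_range r = thread_rows(M, ith, nth);
    for (int m = r.begin; m < r.end; ++m) {
        float acc = 0.0f;
        for (int k = 0; k < K; ++k) {
            acc += W[(size_t)m * (size_t)K + (size_t)k] * x[k];
        }
        y[m] = acc;
    }
}
```

Under this assumption each thread writes a disjoint slice of y, so no locking is needed; calling the function once for every ith in 0..nth-1 fills the whole output vector.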
void gemv_q4_k_q8_k_ref (float *y, const void *W, const void *x_q8, int M, int K)