gemm

Here are 26 public repositories matching this topic...

flame / how-to-optimize-gemm

matrix-multiplication gemm code-optimization gotoblas blis

Updated Jun 18, 2018
C

CNugteren / CLBlast

Tuned OpenCL BLAS

gpu opencl matrix-multiplication blas gemm blas-libraries clblas

Updated Oct 10, 2020
C++

numforge / laser

The HPC toolbox: fused matrix multiplication, convolution, data-parallel strided tensor primitives, OpenMP facilities, SIMD, JIT Assembler, CPU detection, state-of-the-art vectorized BLAS for floats and integers

deep-learning assembler parallel openmp jit simd matrix-multiplication high-performance-computing blas convolution tensor compiler-optimization gemm runtime-cpu-detection

Updated Nov 9, 2019
Nim

flame / blislab

BLISlab: A Sandbox for Optimizing GEMM

matrix-multiplication gemm code-optimization blis

Updated Aug 6, 2019
C

OpenNMT / CTranslate2

Open issue: Prefix build options with "CT2_"

guillaumekln commented Jul 29, 2020

We should prefix CMake build options with "CT2_", e.g. CT2_WITH_MKL instead of WITH_MKL. This is a good practice to avoid possible conflicts with other projects.

The usage should then be updated in several places:

README.md
docker/Dockerfile.*
python/tools/build_wheel.sh
.travis.yml
src/cpu/cpu_isa.h
src/profiler.cc
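The rename described in this issue also affects any source file that tests the option as a preprocessor macro. A minimal sketch of what such a check might look like after the change, assuming the option reaches the compiler as a `-DCT2_WITH_MKL` definition (the helper names `built_with_mkl` and `gemm_backend_name` are hypothetical and are not taken from CTranslate2):

```cpp
#include <string>

// Hypothetical helper, not CTranslate2's actual code.
// Returns true only when the translation unit was compiled with the
// prefixed macro defined, e.g. with -DCT2_WITH_MKL.
constexpr bool built_with_mkl() {
#ifdef CT2_WITH_MKL
  return true;
#else
  return false;
#endif
}

// Hypothetical helper: a label for the GEMM backend selected at build time.
// The old unprefixed WITH_MKL macro is deliberately not consulted, so a
// definition leaking in from another project has no effect here.
inline std::string gemm_backend_name() {
  return built_with_mkl() ? "mkl" : "generic";
}
```

Compiled without the definition, `gemm_backend_name()` returns "generic"; compiled with `-DCT2_WITH_MKL`, it returns "mkl". Defining only the old `WITH_MKL` name changes nothing, which is the kind of conflict the prefix is meant to prevent.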

Labels: build, enhancement, good first issue

ROCmSoftwarePlatform / Tensile

Stretching GPU performance for GEMMs and tensor contractions.

python machine-learning amd gpu assembly opencl dnn matrix-multiplication neural-networks gpu-acceleration blas hip gpu-computing tensors tensor-contraction gemm radeon auto-tuning radeon-open-compute

Updated Dec 21, 2020
Python

yui0 / slibs

Single file libraries for C/C++

audio c mp4 opencl mp3 glsl aac mpeg gpgpu flac blas m4a gemm single-header-lib

Updated Dec 21, 2020
C

cp2k / dbcsr

DBCSR: Distributed Block Compressed Sparse Row matrix library

hpc mpi cuda matrix-multiplication blas sparse-matrix cp2k gemm mkl openmp-parallelization

Updated Dec 21, 2020
Fortran

hma02 / cublasHgemm-P100

Code for testing native float16 matrix multiplication performance on Tesla P100 and V100 GPUs, based on cublasHgemm

gpu cublas precision gemm half-precision float16 p100 v100

Updated Aug 20, 2019
Cuda

szagoruyko / openai-gemm.pytorch

PyTorch bindings for openai-gemm

pytorch gemm

Updated Feb 6, 2017
Python

hma02 / cublasgemm-benchmark

Code for benchmarking GPU performance based on cublasSgemm and cublasHgemm

benchmarking gpu cuda cublas gemm gpu-performance

Updated Jul 7, 2017
Cuda

mz24cn / gemm_optimization

This repository targets performance optimization of the OpenCL GEMM function. It compares the sgemm performance of several BLAS libraries (clBLAS, CLBlast, MIOpenGemm, Intel MKL on CPU, and cuBLAS on CUDA) across different matrix sizes, hardware vendors, and operating systems. Prebuilt x86_64 binaries for MSVC, MinGW, and Linux (CentOS) are provided, so it works out of the box.