Accelerate SHA256 computations in pure Go using AVX512, SHA Extensions for x86 and ARM64 for ARM. On AVX512 it provides an up to 8x improvement (over 3 GB/s per core). SHA Extensions give a performance boost of close to 4x over native.
Implementation of 2D Convolution operation for Neural Networks using Intel x86(i368)/x86-6(amd64) AVX-256 instructions. All data flow methods, i.e input stationary, weight stationary and output stationary are implemented. The forward pass of Alexnet architecture is constructed using it.