nerfstudio-project / gsplat
CUDA accelerated rasterization of gaussian splatting
See what the GitHub community is most excited about today.
Causal depthwise conv1d in CUDA, with a PyTorch interface
NCCL Tests
FlashInfer: Kernel Library for LLM Serving
LLM training in simple, raw C/CUDA
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
DeepEP: an efficient expert-parallel communication library
Quantized Attention achieves speedups of 2-5x and 3-11x compared to FlashAttention and xformers, without losing end-to-end metrics across language, image, and video models.
Tile primitives for speedy kernels
CUDA Library Samples
CUDA Kernel Benchmarking Library
GPU accelerated decision optimization
RCCL Performance Benchmark Tests
Sample codes for my CUDA programming book
Fast CUDA matrix multiplication from scratch
Instant neural graphics primitives: lightning fast NeRF and more