bssrdf's repositories
UnderstandingUnixLinuxProgramming
Source code for the book
avx2-examples
Short examples illustrating AVX2 intrinsics for simple tasks.
clip.cpp
CLIP inference in plain C/C++ with no extra dependencies
Cpp-Concurrency-in-Action-2ed
C++11/14/17/20 multithreading, involving operating system principles and concurrent programming technology.
cuda-1brc
My CUDA solution to the 1BRC
CUDA-Based-Image-Convolution
Developed and optimized a CUDA kernel for 2D convolution, accommodating a 2D input tensor and a 2D filter tensor, with transposed filter application.
CUDA_gemm
A simple, high-performance CUDA GEMM implementation.
CUDALibrarySamples
CUDA Library Samples
cutlass
CUDA Templates for Linear Algebra Subroutines
diffusers
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
generative-ai-for-beginners
12 Lessons, Get Started Building with Generative AI 🔗 https://microsoft.github.io/generative-ai-for-beginners/
kgraph
A library for k-nearest neighbor search
LeNet-5_Speed_Up
Uses OpenMP and CUDA to speed up the LeNet-5 digit-recognition CNN. With OpenMP, training and testing are each 11x faster. With CUDA, training is 3x faster and testing is 57x faster.
moderngpu
Design patterns for GPU computing
PMPP
Solutions for Programming Massively Parallel Processors
PowerInfer
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
screenshot-to-code
Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue)
SGEMM_CUDA
Fast CUDA matrix multiplication from scratch
sshfs
A network filesystem client to connect to SSH servers
stable-diffusion.cpp
Stable Diffusion in pure C/C++
stable-fast
Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
udlbook
Understanding Deep Learning - Simon J.D. Prince
x86-simd-sort
C++ template library for high performance SIMD based sorting algorithms