pengwubj's repositories
.tmux
🇫🇷 Oh My Tmux! My pretty + versatile tmux configuration that just works (imho the best tmux configuration)
CuAssembler
An unofficial CUDA assembler, for all generations of SASS, hopefully :)
CUDA-Learn-Notes
🎉CUDA notes / hand-written CUDA kernels for large models / C++ notes, updated sporadically: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
CUDAsmith
A CUDA compiler fuzzer
DeepLearningSystem
An introduction to the core principles of deep learning systems.
e200_opensource
The Ultra-Low Power RISC Core
flash-attention
Fast and memory-efficient exact attention
Fractional-GPUs
Splits a single NVIDIA GPU into multiple partitions with complete compute and memory isolation (with respect to performance) between the partitions
gpu-benches
Collection of benchmarks to measure basic GPU capabilities
how-to-optim-algorithm-in-cuda
How to optimize some algorithms in CUDA.
leetcode
LeetCode Problems' Solutions
models
Pre-trained and Reproduced Deep Learning Models (reproductions of classic models)
NBAssembler
Assembler and Decompiler for NVIDIA (Maxwell, Pascal, Volta, Turing, Ampere) GPUs.
netron
Visualizer for deep learning and machine learning models
nsight-training
Training material for Nsight developer tools
one-key-hidpi
Enable macOS HiDPI and have a native setting.
open-gpu-kernel-modules
NVIDIA Linux open GPU kernel module source
Project-Zipline
Defines a lossless compressed data format that is independent of CPU type, operating system, file system, and character set, and is suitable for compression using the XP10 algorithm.
riscv-profiles
RISC-V Architecture Profiles
riscv-soc-book
Everything you need to know about RISC-V
spf13-vim
The ultimate vim distribution
tensor-cores-numerical-behavior
Test suite for probing the numerical behavior of NVIDIA tensor cores
transformers-benchmarks
Real Transformer TeraFLOPS on various GPUs