xzy's starred repositories
Neural-Networks-on-Silicon
This started as a collection of papers on neural network accelerators; it has since grown into my selection of research on deep learning and computer architecture.
cuda-samples
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
tiny-flash-attention
Flash attention tutorial written in Python, Triton, CUDA, and CUTLASS
-deprecated-NVIDIA-GPU-Tensor-Core-Accelerator-PyTorch-OpenCV
Computer vision container with Jupyter notebooks (built-in code hinting), Anaconda, CUDA 11.8, TensorRT for Tensor Core inference, CuPy (a GPU drop-in replacement for NumPy), PyTorch, PyTorch Geometric for graph neural networks, TF2, TensorBoard, and OpenCV, for accelerated workloads on NVIDIA Tensor Cores and GPUs.
TC-GNN_ATC23
Artifact for USENIX ATC'23: TC-GNN: Bridging Sparse GNN Computation and Dense Tensor Cores on GPUs.
SparseAttention
PyTorch implementation of the sparse attention from the paper "Generating Long Sequences with Sparse Transformers"
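The strided sparsity pattern that paper factorizes attention into can be sketched in a few lines of NumPy; this is an illustrative mask builder (the function name and `stride` parameter are my own, not the repo's API), not the repo's implementation:

```python
import numpy as np

def strided_sparse_mask(seq_len, stride):
    # Causal strided pattern from "Generating Long Sequences with
    # Sparse Transformers": position i attends to the previous
    # `stride` positions (local component) and to every stride-th
    # earlier position (strided component), never to the future.
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    causal = j <= i
    local = (i - j) < stride
    strided = ((i - j) % stride) == 0
    return causal & (local | strided)

mask = strided_sparse_mask(8, 3)  # True where attention is allowed
```

Applying this mask before the softmax reduces the cost from O(n^2) toward O(n * sqrt(n)) when `stride` is chosen near sqrt(n).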
sparse-structured-attention
Sparse and structured neural attention mechanisms
sea-attention
Official Implementation of SEA: Sparse Linear Attention with Estimated Attention Mask (ICLR 2024)
flash-attention-minimal
Flash Attention in ~100 lines of CUDA (forward pass only)
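The core trick that repo implements in CUDA is online-softmax tiling; a minimal NumPy sketch of the same idea (my own illustration, with an assumed `block` tile-size parameter, not the repo's kernel) looks like:

```python
import numpy as np

def flash_attention_forward(Q, K, V, block=16):
    # Stream over K/V tiles, keeping a running row max (m), running
    # softmax normalizer (l), and an unnormalized output accumulator,
    # so the full N x N score matrix is never materialized.
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.empty_like(Q)
    for qs in range(0, n, block):
        q = Q[qs:qs + block]
        m = np.full(q.shape[0], -np.inf)   # running max per query row
        l = np.zeros(q.shape[0])           # running normalizer
        acc = np.zeros_like(q)             # unnormalized output
        for ks in range(0, n, block):
            s = (q @ K[ks:ks + block].T) * scale
            m_new = np.maximum(m, s.max(axis=1))
            p = np.exp(s - m_new[:, None])
            corr = np.exp(m - m_new)       # rescale previous partials
            l = l * corr + p.sum(axis=1)
            acc = acc * corr[:, None] + p @ V[ks:ks + block]
            m = m_new
        out[qs:qs + block] = acc / l[:, None]
    return out
```

The result matches ordinary softmax attention exactly; the savings come from never holding more than one `block x block` score tile at a time.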
RTL_library_of_basic_hardware_units
Implementations of basic hardware units in RTL (Verilog for now), usable for area/power evaluation and for exploring hardware design trade-offs.
longformer
Longformer: The Long-Document Transformer
flash-attention
Fast and memory-efficient exact attention
mistral-inference
Official inference library for Mistral models
TurboTransformers
A fast and user-friendly runtime for transformer inference (BERT, ALBERT, GPT-2, decoders, etc.) on CPU and GPU.
Butterfly_Acc
The codes and artifacts associated with our MICRO'22 paper titled: "Adaptable Butterfly Accelerator for Attention-based NNs via Hardware and Algorithm Co-design"
cuda-tutorial
A set of hands-on tutorials for CUDA programming
AlphaSparse
An intelligent matrix format designer for SpMV
dgSPARSE-Lib
PyTorch-based fast and efficient processing for various machine learning applications with diverse sparsity