xzy's starred repositories

Neural-Networks-on-Silicon

Originally a collection of papers on neural network accelerators; now it is more a personal selection of research on deep learning and computer architecture.

Stargazers: 1797

cuda-samples

Samples for CUDA developers demonstrating features in the CUDA Toolkit.

Language: C · License: NOASSERTION · Stargazers: 5720
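
The samples themselves are C/C++, but the launch model they demonstrate is easy to show from Python. Below is a minimal vector-add sketch using Numba's CUDA bindings; Numba is not part of this repo, the block/thread sizes are illustrative, and a CUDA-capable GPU is assumed.

```python
import numpy as np
from numba import cuda

@cuda.jit
def vector_add(a, b, out):
    i = cuda.grid(1)          # global thread index = blockIdx.x*blockDim.x + threadIdx.x
    if i < out.size:          # guard: the grid may overshoot the array length
        out[i] = a[i] + b[i]

n = 1 << 20
a = np.random.rand(n).astype(np.float32)
b = np.random.rand(n).astype(np.float32)
out = np.zeros_like(a)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
vector_add[blocks, threads_per_block](a, b, out)   # Numba copies host arrays to/from the device

assert np.allclose(out, a + b)
```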

tiny-flash-attention

Flash attention tutorial written in Python, Triton, CUDA, and CUTLASS.

Language: Cuda · Stargazers: 81

hw

RTL, Cmodel, and testbench for NVDLA

Language: Verilog · License: NOASSERTION · Stargazers: 1668

-deprecated-NVIDIA-GPU-Tensor-Core-Accelerator-PyTorch-OpenCV

Computer vision container with Jupyter notebooks (built-in code hinting), Anaconda, CUDA 11.8, the TensorRT inference accelerator for Tensor Cores, CuPy (a GPU drop-in replacement for NumPy), PyTorch, PyTorch Geometric for graph neural networks, TF2, TensorBoard, and OpenCV, for accelerated workloads on NVIDIA Tensor Cores and GPUs.

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 158

TC-GNN_ATC23

Artifact for USENIX ATC'23: TC-GNN: Bridging Sparse GNN Computation and Dense Tensor Cores on GPUs.

Language: Python · Stargazers: 42

SwinBERT

Research code for CVPR 2022 paper "SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning"

Language: Python · License: MIT · Stargazers: 235

MAN-QSM

Code release for "A Masked Attention Network with Query Sparsity Measurement for Time Series Anomaly Detection" (ICME 2023).

Language: Python · Stargazers: 1

SALO

An efficient spatial accelerator enabling hybrid sparse attention mechanisms for long sequences

Language: Python · Stargazers: 16

SparseAttention

PyTorch implementation of the sparse attention from the paper "Generating Long Sequences with Sparse Transformers" (the mask pattern is sketched below).

Language: Python · License: MIT · Stargazers: 41
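
For context, a minimal sketch of the causal "strided" pattern described in that paper: each query attends to a recent local window plus every stride-th earlier position. `strided_mask` is a hypothetical helper for illustration, not this repo's API.

```python
import torch

def strided_mask(seq_len: int, stride: int) -> torch.Tensor:
    i = torch.arange(seq_len).unsqueeze(1)    # query positions
    j = torch.arange(seq_len).unsqueeze(0)    # key positions
    causal = j <= i                           # no attending to the future
    local = (i - j) < stride                  # recent window of `stride` tokens
    summary = ((i - j) % stride) == 0         # strided "summary" positions
    return causal & (local | summary)         # True = may attend

mask = strided_mask(seq_len=16, stride=4)
# Typical use: scores.masked_fill(~mask, float("-inf")) before softmax.
```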

sparse-structured-attention

Sparse and structured neural attention mechanisms

Language: Python · License: BSD-3-Clause · Stargazers: 221
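
For orientation, a sketch of sparsemax (Martins & Astudillo, 2016), the basic building block behind this family of mechanisms: a projection onto the simplex that, unlike softmax, produces exact zeros. This is the bare algorithm only, not the library's API.

```python
import torch

def sparsemax(z: torch.Tensor) -> torch.Tensor:
    """Map logits (last dim) to a sparse probability distribution."""
    z_sorted, _ = torch.sort(z, dim=-1, descending=True)
    k = torch.arange(1, z.size(-1) + 1, device=z.device, dtype=z.dtype)
    cumsum = z_sorted.cumsum(dim=-1)
    support = 1 + k * z_sorted > cumsum            # which sorted entries stay nonzero
    k_z = support.sum(dim=-1, keepdim=True)        # support size
    tau = (cumsum.gather(-1, k_z - 1) - 1) / k_z   # threshold
    return torch.clamp(z - tau, min=0.0)

p = sparsemax(torch.tensor([2.0, 1.5, 0.1, -1.0]))
# p = [0.75, 0.25, 0.0, 0.0]: sums to 1 with exact zeros.
```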

sea-attention

Official Implementation of SEA: Sparse Linear Attention with Estimated Attention Mask (ICLR 2024)

Language: Python · Stargazers: 5

flash-attention-minimal

Flash Attention in ~100 lines of CUDA (forward pass only)

Language: Cuda · License: Apache-2.0 · Stargazers: 476
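
The algorithm this repo implements in CUDA is easy to state in NumPy: tile over K/V while carrying a running row max and normalizer, so the full N×N score matrix is never materialized. A sketch under that reading; the block size and names are illustrative, not the repo's.

```python
import numpy as np

def flash_attn_forward(Q, K, V, block=64):
    scale = 1.0 / np.sqrt(Q.shape[-1])
    out = np.zeros_like(Q)
    m = np.full(Q.shape[0], -np.inf)          # running row max
    l = np.zeros(Q.shape[0])                  # running softmax denominator
    for s in range(0, K.shape[0], block):
        Kb, Vb = K[s:s+block], V[s:s+block]
        S = (Q @ Kb.T) * scale                # scores for this tile only
        m_new = np.maximum(m, S.max(axis=-1))
        alpha = np.exp(m - m_new)             # rescales the old accumulators
        P = np.exp(S - m_new[:, None])
        l = l * alpha + P.sum(axis=-1)
        out = out * alpha[:, None] + P @ Vb
        m = m_new
    return out / l[:, None]

Q, K, V = (np.random.randn(128, 32) for _ in range(3))
S = (Q @ K.T) / np.sqrt(32)
P = np.exp(S - S.max(-1, keepdims=True))
ref = (P / P.sum(-1, keepdims=True)) @ V      # ordinary attention as reference
assert np.allclose(flash_attn_forward(Q, K, V), ref)
```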

RTL_library_of_basic_hardware_units

Implementations of basic hardware units in RTL (Verilog for now), usable for area/power evaluation and to support hardware design trade-offs.

Language: Verilog · License: MIT · Stargazers: 9

longformer

Longformer: The Long-Document Transformer

Language: Python · License: Apache-2.0 · Stargazers: 1994
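
For context, a sketch of the attention pattern Longformer is named for: a sliding local window plus a few designated "global" tokens that attend to, and are attended by, every position. This builds only the boolean mask; the repo itself realizes the pattern with custom banded kernels, and `longformer_mask` is a hypothetical helper.

```python
import torch

def longformer_mask(seq_len: int, window: int, global_idx) -> torch.Tensor:
    i = torch.arange(seq_len).unsqueeze(1)
    j = torch.arange(seq_len).unsqueeze(0)
    mask = (i - j).abs() <= window // 2   # banded sliding-window attention
    mask[global_idx, :] = True            # global tokens attend everywhere
    mask[:, global_idx] = True            # everyone attends to global tokens
    return mask                           # True = may attend

mask = longformer_mask(seq_len=512, window=64, global_idx=[0])  # e.g. a [CLS] token
```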

flash-attention

Fast and memory-efficient exact attention

Language: Python · License: BSD-3-Clause · Stargazers: 11779

Sanger

A co-design architecture for sparse attention

Stargazers: 35

EdgeBERT

HW/SW co-design of sentence-level energy optimizations for latency-aware multi-task NLP inference

Language: Python · License: NOASSERTION · Stargazers: 45

mistral-inference

Official inference library for Mistral models

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 9121

TurboTransformers

A fast and user-friendly runtime for transformer inference (BERT, ALBERT, GPT-2, decoders, etc.) on CPU and GPU.

Language: C++ · License: NOASSERTION · Stargazers: 1451

Butterfly_Acc

Code and artifacts for our MICRO'22 paper "Adaptable Butterfly Accelerator for Attention-based NNs via Hardware and Algorithm Co-design".

Language: Verilog · Stargazers: 95

flash-llm

Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity

Language: Cuda · License: Apache-2.0 · Stargazers: 155

cuda-tutorial

A set of hands-on tutorials for CUDA programming

Language: Cuda · Stargazers: 166

ChatPaper

Use ChatGPT to summarize arXiv papers. Accelerates the whole research workflow with ChatGPT: full-paper summarization, professional translation, polishing, paper review, and review responses.

Language: Python · License: NOASSERTION · Stargazers: 17903

AlphaSparse

An intelligent matrix format designer for SpMV (sparse matrix-vector multiplication); a plain CSR baseline is sketched below.

Language: C++ · Stargazers: 8
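
For context on the design space such a tool searches over, here is the textbook baseline it starts from: SpMV over a CSR matrix, in plain Python for illustration only (not the repo's code or default format).

```python
import numpy as np

def spmv_csr(values, col_idx, row_ptr, x):
    """y = A @ x with A stored in CSR form (values, col_idx, row_ptr)."""
    n_rows = len(row_ptr) - 1
    y = np.zeros(n_rows)
    for r in range(n_rows):                        # one sparse dot product per row
        for k in range(row_ptr[r], row_ptr[r + 1]):
            y[r] += values[k] * x[col_idx[k]]
    return y

# A = [[4, 0, 1],
#      [0, 0, 2],
#      [3, 5, 0]]
values  = np.array([4.0, 1.0, 2.0, 3.0, 5.0])
col_idx = np.array([0, 2, 2, 0, 1])
row_ptr = np.array([0, 2, 3, 5])
print(spmv_csr(values, col_idx, row_ptr, np.ones(3)))  # -> [5. 2. 8.]
```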

dgSPARSE-Lib

PyTorch-based fast and efficient processing for various machine learning applications with diverse sparsity.

Language: Cuda · License: MIT · Stargazers: 94
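
A minimal sparse-times-dense matmul (SpMM) in stock PyTorch, the kind of operation dgSPARSE-Lib supplies faster CUDA kernels for; this sketch uses only torch.sparse, not the library's own API.

```python
import torch

# 3x3 sparse matrix in COO form: nonzeros at (0,0)=4, (0,2)=1, (1,2)=2, (2,1)=5
indices = torch.tensor([[0, 0, 1, 2],
                        [0, 2, 2, 1]])
values = torch.tensor([4.0, 1.0, 2.0, 5.0])
A = torch.sparse_coo_tensor(indices, values, size=(3, 3))

B = torch.randn(3, 8)          # dense right-hand side
C = torch.sparse.mm(A, B)      # sparse x dense matrix multiply
assert torch.allclose(C, A.to_dense() @ B)
```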