xzy's starred repositories

Neural-Networks-on-Silicon

Originally a collection of papers on neural network accelerators; now it is more a personal selection of research on deep learning and computer architecture.

Stargazers: 1797

cuda-samples

Samples for CUDA developers demonstrating features in the CUDA Toolkit.

Language: C · License: NOASSERTION · Stargazers: 5720
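
The samples themselves are C/C++, but the launch model they demonstrate is easy to show from Python. Below is a minimal vector-add sketch using Numba's CUDA bindings; Numba is not part of this repo, the block/thread sizes are illustrative, and a CUDA-capable GPU is assumed.

```python
import numpy as np
from numba import cuda

@cuda.jit
def vector_add(a, b, out):
    i = cuda.grid(1)          # global thread index = blockIdx.x*blockDim.x + threadIdx.x
    if i < out.size:          # guard: the grid may overshoot the array length
        out[i] = a[i] + b[i]

n = 1 << 20
a = np.random.rand(n).astype(np.float32)
b = np.random.rand(n).astype(np.float32)
out = np.zeros_like(a)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
vector_add[blocks, threads_per_block](a, b, out)   # Numba copies host arrays to/from the device

assert np.allclose(out, a + b)
```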

tiny-flash-attention

Flash attention tutorial written in Python, Triton, CUDA, and CUTLASS.

Language: Cuda · Stargazers: 81

hw

RTL, Cmodel, and testbench for NVDLA

Language: Verilog · License: NOASSERTION · Stargazers: 1668

-deprecated-NVIDIA-GPU-Tensor-Core-Accelerator-PyTorch-OpenCV

Computer vision container with Jupyter notebooks (built-in code hinting), Anaconda, CUDA 11.8, the TensorRT inference accelerator for Tensor Cores, CuPy (a GPU drop-in replacement for NumPy), PyTorch, PyTorch Geometric for graph neural networks, TF2, TensorBoard, and OpenCV, for accelerated workloads on NVIDIA Tensor Cores and GPUs.

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 158

TC-GNN_ATC23

Artifact for USENIX ATC'23: TC-GNN: Bridging Sparse GNN Computation and Dense Tensor Cores on GPUs.

Language: Python · Stargazers: 42

SwinBERT

Research code for CVPR 2022 paper "SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning"

Language: Python · License: MIT · Stargazers: 235

MAN-QSM

Code release for "A Masked Attention Network with Query Sparsity Measurement for Time Series Anomaly Detection" (ICME 2023).

Language: Python · Stargazers: 1

SALO

An efficient spatial accelerator enabling hybrid sparse attention mechanisms for long sequences

Language: Python · Stargazers: 16

SparseAttention

PyTorch implementation of the sparse attention from the paper "Generating Long Sequences with Sparse Transformers" (the mask pattern is sketched below).

Language: Python · License: MIT · Stargazers: 41
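
For context, a minimal sketch of the causal "strided" pattern described in that paper: each query attends to a recent local window plus every stride-th earlier position. `strided_mask` is a hypothetical helper for illustration, not this repo's API.

```python
import torch

def strided_mask(seq_len: int, stride: int) -> torch.Tensor:
    i = torch.arange(seq_len).unsqueeze(1)    # query positions
    j = torch.arange(seq_len).unsqueeze(0)    # key positions
    causal = j <= i                           # no attending to the future
    local = (i - j) < stride                  # recent window of `stride` tokens
    summary = ((i - j) % stride) == 0         # strided "summary" positions
    return causal & (local | summary)         # True = may attend

mask = strided_mask(seq_len=16, stride=4)
# Typical use: scores.masked_fill(~mask, float("-inf")) before softmax.
```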

sparse-structured-attention

Sparse and structured neural attention mechanisms

Language: Python · License: BSD-3-Clause · Stargazers: 221
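
For orientation, a sketch of sparsemax (Martins & Astudillo, 2016), the basic building block behind this family of mechanisms: a projection onto the simplex that, unlike softmax, produces exact zeros. This is the bare algorithm only, not the library's API.

```python
import torch

def sparsemax(z: torch.Tensor) -> torch.Tensor:
    """Map logits (last dim) to a sparse probability distribution."""
    z_sorted, _ = torch.sort(z, dim=-1, descending=True)
    k = torch.arange(1, z.size(-1) + 1, device=z.device, dtype=z.dtype)
    cumsum = z_sorted.cumsum(dim=-1)
    support = 1 + k * z_sorted > cumsum            # which sorted entries stay nonzero
    k_z = support.sum(dim=-1, keepdim=True)        # support size
    tau = (cumsum.gather(-1, k_z - 1) - 1) / k_z   # threshold
    return torch.clamp(z - tau, min=0.0)

p = sparsemax(torch.tensor([2.0, 1.5, 0.1, -1.0]))
# p = [0.75, 0.25, 0.0, 0.0]: sums to 1 with exact zeros.
```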

sea-attention

Official Implementation of SEA: Sparse Linear Attention with Estimated Attention Mask (ICLR 2024)

Language: Python · Stargazers: 5

flash-attention-minimal

Flash Attention in ~100 lines of CUDA (forward pass only)

Language: Cuda · License: Apache-2.0 · Stargazers: 476
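
The algorithm this repo implements in CUDA is easy to state in NumPy: tile over K/V while carrying a running row max and normalizer, so the full N×N score matrix is never materialized. A sketch under that reading; the block size and names are illustrative, not the repo's.

```python
import numpy as np

def flash_attn_forward(Q, K, V, block=64):
    scale = 1.0 / np.sqrt(Q.shape[-1])
    out = np.zeros_like(Q)
    m = np.full(Q.shape[0], -np.inf)          # running row max
    l = np.zeros(Q.shape[0])                  # running softmax denominator
    for s in range(0, K.shape[0], block):
        Kb, Vb = K[s:s+block], V[s:s+block]
        S = (Q @ Kb.T) * scale                # scores for this tile only
        m_new = np.maximum(m, S.max(axis=-1))
        alpha = np.exp(m - m_new)             # rescales the old accumulators
        P = np.exp(S - m_new[:, None])
        l = l * alpha + P.sum(axis=-1)
        out = out * alpha[:, None] + P @ Vb
        m = m_new
    return out / l[:, None]

Q, K, V = (np.random.randn(128, 32) for _ in range(3))
S = (Q @ K.T) / np.sqrt(32)
P = np.exp(S - S.max(-1, keepdims=True))
ref = (P / P.sum(-1, keepdims=True)) @ V      # ordinary attention as reference
assert np.allclose(flash_attn_forward(Q, K, V), ref)
```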

RTL_library_of_basic_hardware_units

Implementations of basic hardware units in RTL (Verilog for now), usable for area/power evaluation and to support hardware design trade-offs.

Language: Verilog · License: MIT · Stargazers: 9

longformer

Longformer: The Long-Document Transformer

Language: Python · License: Apache-2.0 · Stargazers: 1994
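
For context, a sketch of the attention pattern Longformer is named for: a sliding local window plus a few designated "global" tokens that attend to, and are attended by, every position. This builds only the boolean mask; the repo itself realizes the pattern with custom banded kernels, and `longformer_mask` is a hypothetical helper.

```python
import torch

def longformer_mask(seq_len: int, window: int, global_idx) -> torch.Tensor:
    i = torch.arange(seq_len).unsqueeze(1)
    j = torch.arange(seq_len).unsqueeze(0)
    mask = (i - j).abs() <= window // 2   # banded sliding-window attention
    mask[global_idx, :] = True            # global tokens attend everywhere
    mask[:, global_idx] = True            # everyone attends to global tokens
    return mask                           # True = may attend

mask = longformer_mask(seq_len=512, window=64, global_idx=[0])  # e.g. a [CLS] token
```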

flash-attention

Fast and memory-efficient exact attention

Language: Python · License: BSD-3-Clause · Stargazers: 11779

Sanger

A co-design architecture for sparse attention

Stargazers: 35

EdgeBERT

HW/SW co-design of sentence-level energy optimizations for latency-aware multi-task NLP inference

Language: Python · License: NOASSERTION · Stargazers: 45

mistral-inference

Official inference library for Mistral models

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 9121

TurboTransformers

A fast and user-friendly runtime for transformer inference (BERT, ALBERT, GPT-2, decoders, etc.) on CPU and GPU.

Language: C++ · License: NOASSERTION · Stargazers: 1451

Butterfly_Acc

Code and artifacts for our MICRO'22 paper "Adaptable Butterfly Accelerator for Attention-based NNs via Hardware and Algorithm Co-design".

Language: Verilog · Stargazers: 95

flash-llm

Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity

Language: Cuda · License: Apache-2.0 · Stargazers: 155

cuda-tutorial

A set of hands-on tutorials for CUDA programming

Language: Cuda · Stargazers: 166

ChatPaper

Use ChatGPT to summarize arXiv papers. Accelerates the whole research workflow with ChatGPT: full-paper summarization, professional translation, polishing, paper review, and review responses.

Language: Python · License: NOASSERTION · Stargazers: 17903

AlphaSparse

An intelligent matrix format designer for SpMV (sparse matrix-vector multiplication); a plain CSR baseline is sketched below.

Language: C++ · Stargazers: 8
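
For context on the design space such a tool searches over, here is the textbook baseline it starts from: SpMV over a CSR matrix, in plain Python for illustration only (not the repo's code or default format).

```python
import numpy as np

def spmv_csr(values, col_idx, row_ptr, x):
    """y = A @ x with A stored in CSR form (values, col_idx, row_ptr)."""
    n_rows = len(row_ptr) - 1
    y = np.zeros(n_rows)
    for r in range(n_rows):                        # one sparse dot product per row
        for k in range(row_ptr[r], row_ptr[r + 1]):
            y[r] += values[k] * x[col_idx[k]]
    return y

# A = [[4, 0, 1],
#      [0, 0, 2],
#      [3, 5, 0]]
values  = np.array([4.0, 1.0, 2.0, 3.0, 5.0])
col_idx = np.array([0, 2, 2, 0, 1])
row_ptr = np.array([0, 2, 3, 5])
print(spmv_csr(values, col_idx, row_ptr, np.ones(3)))  # -> [5. 2. 8.]
```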

dgSPARSE-Lib

PyTorch-based fast and efficient processing for various machine learning applications with diverse sparsity.

Language: Cuda · License: MIT · Stargazers: 94
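
A minimal sparse-times-dense matmul (SpMM) in stock PyTorch, the kind of operation dgSPARSE-Lib supplies faster CUDA kernels for; this sketch uses only torch.sparse, not the library's own API.

```python
import torch

# 3x3 sparse matrix in COO form: nonzeros at (0,0)=4, (0,2)=1, (1,2)=2, (2,1)=5
indices = torch.tensor([[0, 0, 1, 2],
                        [0, 2, 2, 1]])
values = torch.tensor([4.0, 1.0, 2.0, 5.0])
A = torch.sparse_coo_tensor(indices, values, size=(3, 3))

B = torch.randn(3, 8)          # dense right-hand side
C = torch.sparse.mm(A, B)      # sparse x dense matrix multiply
assert torch.allclose(C, A.to_dense() @ B)
```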