DrXuQian's starred repositories
CUDA-Learn-Notes
🎉CUDA/C++ notes / hand-written CUDA kernels for large models / tech blog, updated occasionally: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
buddy-mlir
An MLIR-based compiler framework that bridges DSLs (domain-specific languages) to DSAs (domain-specific architectures).
tvm_mlir_learn
A collection of compiler learning resources.
Awesome-LLM-Inference
📖A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.
Awesome-Efficient-LLM
A curated list of resources on efficient large language models.
nas-landmarkreg
[CVPR2021] Code for Landmark Regularization: Ranking Guided Super-Net Training in Neural Architecture Search
blocksparse
Efficient GPU kernels for block-sparse matrix multiplication and convolution
generate-music
Companion repository for the YouTube video "Can AI make music?" (https://www.youtube.com/watch?v=aOsET8KapQQ). If you haven't seen it, consider watching the video for a better understanding of the code.
Deep-Learning-for-Tracking-and-Detection
Collection of papers, datasets, code and other resources for object tracking and detection using deep learning
Single-Image-Super-Resolution
A collection of high-impact and state-of-the-art SR methods
modern-cpp-tutorial
📚 Modern C++ Tutorial: C++11/14/17/20 On the Fly | https://changkun.de/modern-cpp/
deepsparse
Sparsity-aware deep learning inference runtime for CPUs
netflix-verify
Netflix unlock-detection script / A script used to determine whether your network can watch native Netflix movies or not.
sparse-winograd-cnn
Efficient Sparse-Winograd Convolutional Neural Networks (ICLR 2018)
ngraph-python
Original Python version of Intel® Nervana™ Graph
inter-operator-scheduler
[MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration
tensorflow-internals
An open-source ebook about the TensorFlow kernel and its implementation mechanisms.
transformers
🤗 Transformers: State-of-the-art machine learning for PyTorch, TensorFlow, and JAX.
once-for-all
[ICLR 2020] Once for All: Train One Network and Specialize it for Efficient Deployment