Beast code in Giters

DrXuQian's starred repositories

CUDA-Learn-Notes

🎉CUDA/C++ notes / hand-written CUDA kernels for large models / tech blog, updated occasionally: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.

Language: Cuda | License: GPL-3.0 | 928 | 0 | 0

buddy-mlir

An MLIR-based compiler framework that bridges DSLs (domain-specific languages) to DSAs (domain-specific architectures).

Language: C++ | License: Apache-2.0 | 451 | 0 | 0

tvm_mlir_learn

A collection of compiler learning resources.

Language: Python | 1970 | 0 | 0

Awesome-LLM-Inference

📖 A curated list of Awesome LLM Inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.

License: GPL-3.0 | 2084 | 0 | 0

Awesome-Efficient-LLM

A curated list for Efficient Large Language Models

Language: Python | 999 | 0 | 0

onnx-mlir

Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure

Language: C++ | License: Apache-2.0 | 720 | 0 | 0

nas-landmarkreg

[CVPR2021] Code for Landmark Regularization: Ranking Guided Super-Net Training in Neural Architecture Search

Language: Python | License: MIT | 9 | 0 | 0

DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Language: Python | License: Apache-2.0 | 34057 | 0 | 0

blocksparse

Efficient GPU kernels for block-sparse matrix multiplication and convolution

Language: Cuda | License: MIT | 1015 | 0 | 0

This repository accompanies the YouTube video "Can AI make music?" (https://www.youtube.com/watch?v=aOsET8KapQQ). If you haven't seen it, consider watching the video for a better understanding of the code.

Language: Python | License: MIT | 153 | 0 | 0

cplusplus-learn

Language: C++ | 3 | 0 | 0

Deep-Learning-for-Tracking-and-Detection

Collection of papers, datasets, code and other resources for object tracking and detection using deep learning

Language: HTML | 2402 | 0 | 0

Single-Image-Super-Resolution

A collection of high-impact and state-of-the-art SR methods

1833 | 0 | 0

xbyak

A JIT assembler for x86 (IA-32) / x64 (AMD64, x86-64) MMX/SSE/SSE2/SSE3/SSSE3/SSE4/FPU/AVX/AVX2/AVX-512, provided as a C++ header

Language: C++ | License: BSD-3-Clause | 2001 | 0 | 0

modern-cpp-tutorial

📚 Modern C++ Tutorial: C++11/14/17/20 On the Fly | https://changkun.de/modern-cpp/

Language: C++ | License: MIT | 23570 | 0 | 0

deepsparse

Sparsity-aware deep learning inference runtime for CPUs

Language: Python | License: NOASSERTION | 2947 | 0 | 0

netflix-verify

Netflix streaming unlock detection script: a script used to determine whether your network can watch native Netflix movies or not

Language: Go | License: GPL-3.0 | 2442 | 0 | 0

sparse-winograd-cnn

Efficient Sparse-Winograd Convolutional Neural Networks (ICLR 2018)

Language: Python | License: MIT | 188 | 0 | 0

ngraph-python

Original Python version of Intel® Nervana™ Graph

Language: Python | License: Apache-2.0 | 215 | 0 | 0

how-to-optimize-gemm

1 | 0 | 0

inter-operator-scheduler

[MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration

Language: C++ | License: MIT | 189 | 0 | 0

sgemm_hsw

This is an implementation of sgemm_kernel on L1d cache.

Language: Assembly | License: GPL-3.0 | 213 | 0 | 0

qnnpack

Explained QNNPACK Implementation

Language: C | License: NOASSERTION | 20 | 0 | 0

oneDNN

oneAPI Deep Neural Network Library (oneDNN)

Language: C++ | License: Apache-2.0 | 3541 | 0 | 0

tensorflow-internals

An open-source ebook about the TensorFlow kernel and its implementation mechanisms.

Language: TeX | 2889 | 0 | 0

transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Language: Python | License: Apache-2.0 | 129907 | 0 | 0

once-for-all

[ICLR 2020] Once for All: Train One Network and Specialize it for Efficient Deployment

Language: Python | License: MIT | 1855 | 0 | 0

BRECQ

PyTorch implementation of BRECQ, ICLR 2021

Language: Python | License: MIT | 242 | 0 | 0

PAMS

PArameterized Max Scale

Language: Python | 57 | 0 | 0

nni

An open-source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression, and hyper-parameter tuning.

Language: Python | License: MIT | 13912 | 0 | 0