DrXuQian's starred repositories

CUDA-Learn-Notes

🎉 CUDA/C++ notes / hand-written CUDA kernels for large models / technical blog, updated irregularly: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.

Language: Cuda | License: GPL-3.0 | Stargazers: 928 | Issues: 0

buddy-mlir

An MLIR-based compiler framework that bridges DSLs (domain-specific languages) to DSAs (domain-specific architectures).

Language: C++ | License: Apache-2.0 | Stargazers: 451 | Issues: 0

tvm_mlir_learn

A collection of compiler learning resources.

Language: Python | Stargazers: 1970 | Issues: 0

Awesome-LLM-Inference

📖 A curated list of Awesome LLM Inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.

License: GPL-3.0 | Stargazers: 2084 | Issues: 0

Awesome-Efficient-LLM

A curated list for Efficient Large Language Models

Language: Python | Stargazers: 999 | Issues: 0

onnx-mlir

Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure

Language: C++ | License: Apache-2.0 | Stargazers: 720 | Issues: 0

nas-landmarkreg

[CVPR2021] Code for Landmark Regularization: Ranking Guided Super-Net Training in Neural Architecture Search

Language: Python | License: MIT | Stargazers: 9 | Issues: 0

DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Language: Python | License: Apache-2.0 | Stargazers: 34057 | Issues: 0

blocksparse

Efficient GPU kernels for block-sparse matrix multiplication and convolution

Language: Cuda | License: MIT | Stargazers: 1015 | Issues: 0

generate-music

This repository accompanies the YouTube video "Can AI make music?" (https://www.youtube.com/watch?v=aOsET8KapQQ). If you haven't seen it, consider watching the video for a better understanding of the code.

Language: Python | License: MIT | Stargazers: 153 | Issues: 0

Language: C++ | Stargazers: 3 | Issues: 0

Deep-Learning-for-Tracking-and-Detection

Collection of papers, datasets, code and other resources for object tracking and detection using deep learning

Language: HTML | Stargazers: 2402 | Issues: 0

Single-Image-Super-Resolution

A collection of high-impact and state-of-the-art SR methods

Stargazers: 1833 | Issues: 0

xbyak

A header-only C++ JIT assembler for x86 (IA-32) / x64 (AMD64, x86-64), supporting MMX/SSE/SSE2/SSE3/SSSE3/SSE4/FPU/AVX/AVX2/AVX-512

Language: C++ | License: BSD-3-Clause | Stargazers: 2001 | Issues: 0

modern-cpp-tutorial

📚 Modern C++ Tutorial: C++11/14/17/20 On the Fly | https://changkun.de/modern-cpp/

Language: C++ | License: MIT | Stargazers: 23570 | Issues: 0

deepsparse

Sparsity-aware deep learning inference runtime for CPUs

Language: Python | License: NOASSERTION | Stargazers: 2947 | Issues: 0

netflix-verify

Netflix streaming unlock detection script: determines whether your network can watch native Netflix movies

Language: Go | License: GPL-3.0 | Stargazers: 2442 | Issues: 0

sparse-winograd-cnn

Efficient Sparse-Winograd Convolutional Neural Networks (ICLR 2018)

Language: Python | License: MIT | Stargazers: 188 | Issues: 0

ngraph-python

Original Python version of Intel® Nervana™ Graph

Language: Python | License: Apache-2.0 | Stargazers: 215 | Issues: 0

inter-operator-scheduler

[MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration

Language: C++ | License: MIT | Stargazers: 189 | Issues: 0

sgemm_hsw

This is an implementation of sgemm_kernel on L1d cache.

Language: Assembly | License: GPL-3.0 | Stargazers: 213 | Issues: 0

qnnpack

Explained QNNPACK Implementation

Language: C | License: NOASSERTION | Stargazers: 20 | Issues: 0

oneDNN

oneAPI Deep Neural Network Library (oneDNN)

Language: C++ | License: Apache-2.0 | Stargazers: 3541 | Issues: 0

tensorflow-internals

An open source ebook about the TensorFlow kernel and its implementation mechanisms.

Language: TeX | Stargazers: 2889 | Issues: 0

transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Language: Python | License: Apache-2.0 | Stargazers: 129907 | Issues: 0

once-for-all

[ICLR 2020] Once for All: Train One Network and Specialize it for Efficient Deployment

Language: Python | License: MIT | Stargazers: 1855 | Issues: 0

BRECQ

PyTorch implementation of BRECQ, ICLR 2021

Language: Python | License: MIT | Stargazers: 242 | Issues: 0

PAMS

PArameterized Max Scale

Language: Python | Stargazers: 57 | Issues: 0

nni

An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression, and hyper-parameter tuning.

Language: Python | License: MIT | Stargazers: 13912 | Issues: 0