coder(anonymous)'s repositories
awesome-tensor-compilers
A list of awesome compiler projects and papers for tensor computation and deep learning.
FeatherCNN
FeatherCNN is a high performance inference engine for convolutional neural networks.
gluon-cv
Gluon CV Toolkit
hipBLAS
ROCm BLAS marshalling library
HowToCook
A programmer's guide to cooking at home (Chinese only).
incubator-mxnet
Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
LLM-Viewer
Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline model in a user-friendly interface.
models
A collection of pre-trained, state-of-the-art models in the ONNX format
oneDNN
oneAPI Deep Neural Network Library (oneDNN)
Optimizing-DGEMM-on-Intel-CPUs-with-AVX512F
Stepwise optimizations of DGEMM on CPU, eventually reaching performance faster than Intel MKL.
rankfm
Factorization Machines for Recommendation and Ranking Problems with Implicit Feedback Data
TileSpGEMM
Source code of the PPoPP '22 paper: "TileSpGEMM: A Tiled Algorithm for Parallel Sparse General Matrix-Matrix Multiplication on GPUs" by Yuyao Niu, Zhengyang Lu, Haonan Ji, Shuhui Song, Zhou Jin, and Weifeng Liu.
tsm2x-imp
Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA