Xumi's repositories
AITemplate
AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code, specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
CuAssembler
An unofficial CUDA assembler, for all generations of SASS, hopefully :)
DIS
The repository for our new project, Highly Accurate Dichotomous Image Segmentation.
Eureka
Official Repository for "Eureka: Human-Level Reward Design via Coding Large Language Models"
FasterTransformer
Transformer-related optimizations, including BERT and GPT
FGVC-PIM
PyTorch implementation of "A Novel Plug-in Module for Fine-Grained Visual Classification".
gpt4all
gpt4all: an ecosystem of open-source chatbots trained on a massive collection of clean assistant data, including code, stories, and dialogue
how-to-optimize-gemm
Row-major SGEMM optimization
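The entry above refers to a step-by-step tutorial on optimizing single-precision matrix multiply. As a hedged baseline sketch (pure Python standing in for the tutorial's C, with the function name `sgemm_naive` and the flat-list layout being illustrative assumptions, not the repo's actual code), the row-major starting point looks like:

```python
def sgemm_naive(A, B, M, N, K):
    """Compute C = A @ B for row-major matrices.

    A is M x K and B is K x N, each stored as a flat list in
    row-major order: element (i, j) of A lives at A[i*K + j].
    """
    C = [0.0] * (M * N)
    for i in range(M):
        for k in range(K):
            a_ik = A[i * K + k]  # hoist A(i, k) out of the inner loop
            for j in range(N):
                # i-k-j loop order walks B and C along contiguous rows,
                # a cache-friendly reordering such tutorials typically
                # apply early before moving on to blocking/tiling.
                C[i * N + j] += a_ik * B[k * N + j]
    return C
```

For example, multiplying the 2x2 matrices [[1, 2], [3, 4]] and [[5, 6], [7, 8]] yields [[19, 22], [43, 50]], i.e. the flat result [19.0, 22.0, 43.0, 50.0].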
MNN
MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba
ncnn
ncnn is a high-performance neural network inference framework optimized for the mobile platform
stanford_alpaca
Code and documentation to train Stanford's Alpaca models and generate the data.
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
tensorrtx
Implementation of popular deep learning networks with TensorRT network definition API
yolov7
Implementation of the paper "YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors"