haichaozhang's repositories
alpa_zhc
Automatic parallelization for large-scale neural networks
apex
A PyTorch extension: tools for easy mixed-precision and distributed training in PyTorch
Cream
This is a collection of our NAS and Vision Transformer work.
cub
Cooperative primitives for CUDA C++.
FasterTransformer
Transformer-related optimizations, including BERT and GPT
glslang
Khronos-reference front end for GLSL/ESSL, partial front end for HLSL, and a SPIR-V generator.
Hetu
A high-performance distributed deep learning system targeting large-scale and automated distributed training.
mmdeploy
OpenMMLab Model Deployment Framework
multibuild
Machinery for building and testing Python wheels for Linux, macOS, and (less flexibly) Windows.
nccl
Optimized primitives for collective multi-GPU communication
ncnn
ncnn is a high-performance neural network inference framework optimized for mobile platforms
onnxruntime
ONNX Runtime: cross-platform, high-performance ML inference and training accelerator
opencv
Open Source Computer Vision Library
opencv-python
Automated CI toolchain to produce precompiled opencv-python, opencv-python-headless, opencv-contrib-python and opencv-contrib-python-headless packages.
opencv_contrib
Repository for OpenCV's extra modules
opencv_extra
OpenCV extra data
optimum-quanto
A PyTorch quantization backend for optimum
ppl.cv
ppl.cv is a high-performance image-processing library from OpenPPL that supports various platforms.
ppl.nn
A primitive library for neural networks
pybind11
Seamless operability between C++11 and Python
qt5
Qt5 super module
qtbase
Qt Base (Core, Gui, Widgets, Network, ...)
spdlog
Fast C++ logging library.
Swin-Transformer
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".
TensorRT
TensorRT is a C++ library for high-performance inference on NVIDIA GPUs and deep learning accelerators.
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
traitlets
A lightweight Traits-like module
tvm
Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators