sqhao's repositories
onnx-tensorrt
ONNX-TensorRT: TensorRT backend for ONNX
openvino
OpenVINO™ Toolkit repository
openvino2tensorflow
Converts OpenVINO IR models to TensorFlow saved_model, TFLite, h5, TFJS, TF-TRT (TensorRT), CoreML, EdgeTPU, ONNX, and pb formats. Typical pipeline: PyTorch (NCHW) -> ONNX (NCHW) -> OpenVINO (NCHW) -> openvino2tensorflow -> TensorFlow/Keras (NHWC) -> TFLite (NHWC). Also supports conversion between .pb, saved_model, .tflite, and ONNX. Includes Docker build environments with direct access to the host PC's GUI and camera for verifying operation, plus NVIDIA GPU (dGPU) and Intel iHD GPU (iGPU) support.
3d-photo-inpainting
[CVPR 2020] 3D Photography using Context-aware Layered Depth Inpainting
backend
Common source, scripts and utilities for creating Triton backends.
cnpy
library to read/write .npy and .npz files in C/C++
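The .npy/.npz files cnpy reads and writes are the same ones NumPy produces, so a quick Python round-trip (a sketch for context, not part of cnpy itself) shows the data layout a C++ consumer would see:

```python
import numpy as np

# Save a small float32 array in .npy format (shape and dtype are
# stored in the file header, followed by the raw buffer).
data = np.arange(6, dtype=np.float32).reshape(2, 3)
np.save("example.npy", data)

# Load it back; cnpy's npy_load exposes the same shape/word_size
# metadata plus a pointer to the raw data on the C++ side.
loaded = np.load("example.npy")
assert loaded.shape == (2, 3)
assert np.array_equal(loaded, data)
```

On the C++ side, cnpy returns the equivalent shape and word-size metadata along with the raw buffer, which makes it convenient for feeding test tensors into inference code.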
flashinfer
FlashInfer: Kernel Library for LLM Serving
gemma.cpp
lightweight, standalone C++ inference engine for Google's Gemma models.
highway
Performance-portable, length-agnostic SIMD with runtime dispatch
onnx
Open standard for machine learning interoperability
ONNX-Python-Examples
ONNX Python Examples
onnx-simplifier
Simplify your onnx model
optimizer
Actively maintained ONNX Optimizer
perf-ninja
This is an online course where you can learn and master the skill of low-level performance analysis and tuning.
pillow-resize
Port of Pillow's resize method to C++ and OpenCV.
prajna
A programming language for AI infrastructure.
pybind11
Seamless operability between C++11 and Python
rocketmq-client-cpp
Apache RocketMQ C++ client
stable-diffusion.cpp
Stable Diffusion in pure C/C++
stable-fast
An inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
tensorRT_Pro
C++ library for TensorRT integration.
treelite
model compiler for decision tree ensembles
trt-samples-for-hackathon-cn
Simple samples for TensorRT programming
TRTorch
PyTorch/TorchScript compiler for NVIDIA GPUs using TensorRT
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
whisper
Robust Speech Recognition via Large-Scale Weak Supervision