A series of GPU optimization topics that explains in detail how to optimize programs on the GPU. The reduce optimization is complete. For GEMM, the CUDA C code is complete; an assembler is currently being used to tune the code further, and that code will be released later.
A PyTorch Library for Multi-Task Learning
pycorrector is a toolkit for text error correction. It was developed to facilitate the design, comparison, and sharing of deep text error correction models.
Implementations of popular deep learning networks in PyTorch, used by tensorrtx.
An open source framework that provides a simple, universal API for building distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library.
Fast and accurate object detection with end-to-end GPU optimization
TensorRT-7 Network Lib, covering common object detection, keypoint detection, face detection, OCR, and more; supports training on your own data.
Implementation of popular deep learning networks with TensorRT network definition API
Generate text images for training deep learning OCR models
An easy to use PyTorch to TensorRT converter
📚 single header utf8 string functions for C and C++
The first competitive instance segmentation approach that runs on small edge devices at real-time speeds.
A C++ implementation of YOLOv5 and DeepSORT