yuanjiechen / trt2023


Summary

Final result: 237 ms

Convert to TensorRT FP16: ~600 ms

Many small torch micro-optimizations: ~470 ms

Reduce the number of steps: ~290 ms

GroupNorm plugin with FP16, shared memory, internal batch=2: ~237 ms
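For reference, the computation that any GroupNorm plugin has to reproduce can be sketched in plain Python. This is a minimal illustration of the standard GroupNorm definition, not the repository's CUDA kernel: the function name `group_norm`, the list-of-lists layout (channels by flattened spatial values), and the `eps` default are assumptions made for this example, and it runs in double precision rather than FP16.

```python
import math


def group_norm(x, num_groups, gamma, beta, eps=1e-5):
    """Reference GroupNorm for a single sample (hypothetical helper).

    x:      list of C channels, each a flat list of spatial values.
    gamma:  per-channel scale, length C.
    beta:   per-channel shift, length C.
    """
    num_channels = len(x)
    if num_channels % num_groups != 0:
        raise ValueError("number of channels must be divisible by num_groups")
    per_group = num_channels // num_groups

    out = []
    for g in range(num_groups):
        channels = range(g * per_group, (g + 1) * per_group)
        # Statistics are shared by every value in the group's channels.
        values = [v for c in channels for v in x[c]]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)  # biased variance
        inv_std = 1.0 / math.sqrt(var + eps)
        # Normalize, then apply the per-channel affine transform.
        for c in channels:
            out.append([(v - mean) * inv_std * gamma[c] + beta[c] for v in x[c]])
    return out


sample = [[1.0, 2.0], [3.0, 4.0], [10.0, 10.0], [20.0, 20.0]]
result = group_norm(sample, num_groups=2, gamma=[1.0] * 4, beta=[0.0] * 4)
```

Each group needs a mean and variance reduction over all of its values before any output can be written, which is the kind of reduction a GPU kernel would typically stage through shared memory.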

Unfinished items

Using nsys

CUDA Graphs

multi stream execution

Tried but failed

PTQ, QAT

flash attention plugin

merge two loops in one trt engine

About

License:Apache License 2.0


Languages

Python 57.7%, C++ 33.1%, CUDA 7.7%, C 0.8%, CMake 0.6%, Shell 0.1%