engineer1109's repositories
LearnVulkan
Learn Vulkan. Advanced examples of Vulkan, Qt, CUDA, and OpenCV for Linux, Windows, and Android.
LearnOpenGLES
C++ tutorials and code samples for OpenGL ES. Supports Qt bindings. Supports Linux, Android, and Windows.
trt-llm-rag-linux-v0.9
A developer reference project for creating Retrieval Augmented Generation (RAG) chatbots on Linux using TensorRT-LLM
GCCVersionScript
Add version symbols to a shared library, such as funcA@ funcA@@CLASSA_1.0
trt-llm-rag-linux
Linux version of trt-llm-rag-windows for TensorRT-LLM V0.9 pre
ComputeLibrary
Arm CNN ComputeLibrary, with compatibility for NVIDIA x86.
Paddle-Lite
PaddlePaddle's multi-device, multi-platform, high-performance deep learning inference engine
CLBlast
Tuned OpenCL BLAS
CUDA_gemm
A simple high performance CUDA GEMM implementation.
cutlass_flash_atten_fp8
FP8 flash attention implemented on the Ada architecture using the cutlass repository
Demo_MessageChat_Qt
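Qt5 bubble-style chat box: QListWidget + QPainter. The bubble-style chat view uses a QListWidget as its control, and each bubble is implemented by promoting a QListWidgetItem to a QWidget. Each bubble can be thought of as a QWidget whose contents can be laid out freely. Each item stores the chat message, send status, time, type, and so on. The QWidget mainly displays an avatar plus a bubble, with the chat content inside the bubble. The bubble is drawn with QPainter in the paintEvent handler.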
FBGEMM
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
ggml
Tensor library for machine learning
how-to-optim-algorithm-in-cuda
How to optimize some algorithms in CUDA.
How_to_optimize_in_GPU
This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, sgemv, sgemm, etc. The performance of these kernels is basically at or near the theoretical limit.
LeetCUDA
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA/Tensor Cores Kernels, HGEMM, FA-2 MMA etc.🔥
LeetGPU
My solutions to leetGPU: https://leetgpu.com/challenges AND BEYOND
onnxruntime-extensions
The pre- and post- processing library for ONNX Runtime
OpenCLDNN
OpenCL DNN Library, such as conv2d, gemm, flash attention.
PaddleCustomDevice
PaddlePaddle custom device implementation (custom hardware integration for PaddlePaddle).
PaddleDetection
Object Detection toolkit based on PaddlePaddle. It supports object detection, instance segmentation, multiple object tracking and real-time multi-person keypoint detection.