Col_In_Coding's repositories
text-generation-webui
A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, llama.cpp (GGUF), and Llama models.
diffusers
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch
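A minimal text-to-image sketch using the Diffusers pipeline API; the checkpoint id is only an example and assumes the weights are available on the Hub or locally.

```python
import torch
from diffusers import DiffusionPipeline

# Load an example Stable Diffusion checkpoint in half precision (assumed model id).
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# Generate one image from a text prompt and save it.
image = pipe("an astronaut riding a horse on the moon").images[0]
image.save("astronaut.png")
```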
CUDALibrarySamples
CUDA Library Samples
CUDA-Learn-Notes
🎉CUDA notes / hand-written CUDA kernels for large models / C++ notes, updated irregularly: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
TensorRT-Model-Optimizer
TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization and sparsity. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed on NVIDIA GPUs.
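A rough post-training quantization sketch in the spirit of the description above; the `modelopt.torch.quantization` module, the config name, and the calibration loader are assumptions based on the project's documented workflow and may differ between releases.

```python
import modelopt.torch.quantization as mtq  # assumed module path

def forward_loop(model):
    # Run a few calibration batches through the model; calib_loader is a placeholder.
    for batch in calib_loader:
        model(batch)

# Quantize the model in place using an assumed INT8 default config.
model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)
```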
CV-CUDA
CV-CUDA™ is an open-source, GPU-accelerated library for cloud-scale image processing and computer vision.
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
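A minimal sketch of the high-level Python LLM API the description refers to; the model name is illustrative and the exact import paths and result fields are assumptions that may vary across TensorRT-LLM releases.

```python
from tensorrt_llm import LLM, SamplingParams  # assumed high-level API

# Building/loading the TensorRT engine happens behind this call (example model id).
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(max_tokens=64, temperature=0.8)

# Run batched generation and print the decoded text.
for output in llm.generate(["What is TensorRT?"], params):
    print(output.outputs[0].text)
```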
transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
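A quick-start sketch with the Transformers pipeline API; the model id is just an example.

```python
from transformers import pipeline

# Build a text-generation pipeline around an example checkpoint.
generator = pipeline("text-generation", model="gpt2")

# Generate a short continuation for a prompt.
print(generator("CUDA kernels are", max_new_tokens=20)[0]["generated_text"])
```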
Tensorrt-CV
Using TensorRT for Inference Model Deployment.
cccl
CUDA C++ Core Libraries
Wav2Lip
This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020.
trt-samples-for-hackathon-cn
Simple samples for TensorRT programming
bitsandbytes
8-bit CUDA functions for PyTorch
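A sketch of swapping a standard optimizer for an 8-bit one from bitsandbytes; the model here is a placeholder.

```python
import torch.nn as nn
import bitsandbytes as bnb

# Placeholder model on the GPU.
model = nn.Linear(1024, 1024).cuda()

# 8-bit optimizer state instead of FP32, reducing memory for the optimizer.
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-3)
```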
triton
Development repository for the Triton language and compiler
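A minimal Triton kernel sketch (element-wise add) in the style of the official tutorials.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one block of BLOCK_SIZE elements.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.rand(4096, device="cuda")
y = torch.rand(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
```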
TensorRT
TensorRT is a C++ library for high-performance inference on NVIDIA GPUs and deep learning accelerators.
llama
Inference code for LLaMA models
AITemplate
AITemplate is a Python framework which renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
stable-diffusion-tritonserver
Deploy the Stable Diffusion model with ONNX/TensorRT and Triton Inference Server.
cub
Cooperative primitives for CUDA C++.
apollo
An open autonomous driving platform
tensorrt_plugin_generator
A simple tool that can generate TensorRT plugin code quickly.
tensorRT_Pro
A C++ library built on TensorRT integration.
taming-transformers
Taming Transformers for High-Resolution Image Synthesis
latent-diffusion
High-Resolution Image Synthesis with Latent Diffusion Models
learning-cuda-trt
A large collection of CUDA/TensorRT example cases for learning.
CPP-Training
Deep dive into C++ and Bazel.