Col_In_Coding's repositories
text-generation-webui
A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, llama.cpp (GGUF), and Llama models.
diffusers
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch
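A minimal text-to-image sketch using the Diffusers pipeline API; the checkpoint id is only an example and assumes the weights are available on the Hub or locally.

```python
import torch
from diffusers import DiffusionPipeline

# Load an example Stable Diffusion checkpoint in half precision (assumed model id).
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# Generate one image from a text prompt and save it.
image = pipe("an astronaut riding a horse on the moon").images[0]
image.save("astronaut.png")
```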
CUDALibrarySamples
CUDA Library Samples
CUDA-Learn-Notes
🎉CUDA notes / hand-written CUDA kernels for large models / C++ notes, updated irregularly: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
TensorRT-Model-Optimizer
TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization and sparsity. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed on NVIDIA GPUs.
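A rough post-training quantization sketch in the spirit of the description above; the `modelopt.torch.quantization` module, the config name, and the calibration loader are assumptions based on the project's documented workflow and may differ between releases.

```python
import modelopt.torch.quantization as mtq  # assumed module path

def forward_loop(model):
    # Run a few calibration batches through the model; calib_loader is a placeholder.
    for batch in calib_loader:
        model(batch)

# Quantize the model in place using an assumed INT8 default config.
model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)
```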
CV-CUDA
CV-CUDA™ is an open-source, GPU-accelerated library for cloud-scale image processing and computer vision.
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
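A minimal sketch of the high-level Python LLM API the description refers to; the model name is illustrative and the exact import paths and result fields are assumptions that may vary across TensorRT-LLM releases.

```python
from tensorrt_llm import LLM, SamplingParams  # assumed high-level API

# Building/loading the TensorRT engine happens behind this call (example model id).
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(max_tokens=64, temperature=0.8)

# Run batched generation and print the decoded text.
for output in llm.generate(["What is TensorRT?"], params):
    print(output.outputs[0].text)
```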
transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
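A quick-start sketch with the Transformers pipeline API; the model id is just an example.

```python
from transformers import pipeline

# Build a text-generation pipeline around an example checkpoint.
generator = pipeline("text-generation", model="gpt2")

# Generate a short continuation for a prompt.
print(generator("CUDA kernels are", max_new_tokens=20)[0]["generated_text"])
```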
Tensorrt-CV
Using TensorRT for Inference Model Deployment.
cccl
CUDA C++ Core Libraries
Wav2Lip
This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020.
trt-samples-for-hackathon-cn
Simple samples for TensorRT programming
bitsandbytes
8-bit CUDA functions for PyTorch
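A sketch of swapping a standard optimizer for an 8-bit one from bitsandbytes; the model here is a placeholder.

```python
import torch.nn as nn
import bitsandbytes as bnb

# Placeholder model on the GPU.
model = nn.Linear(1024, 1024).cuda()

# 8-bit optimizer state instead of FP32, reducing memory for the optimizer.
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-3)
```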
triton
Development repository for the Triton language and compiler
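A minimal Triton kernel sketch (element-wise add) in the style of the official tutorials.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one block of BLOCK_SIZE elements.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.rand(4096, device="cuda")
y = torch.rand(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
```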
TensorRT
TensorRT is a C++ library for high-performance inference on NVIDIA GPUs and deep learning accelerators.
llama
Inference code for LLaMA models
AITemplate
AITemplate is a Python framework which renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
stable-diffusion-tritonserver
Deploy the Stable Diffusion model with ONNX/TensorRT and Triton Inference Server.
cub
Cooperative primitives for CUDA C++.
apollo
An open autonomous driving platform
tensorrt_plugin_generator
A simple tool that can generate TensorRT plugin code quickly.
tensorRT_Pro
A C++ library built on TensorRT integration.
taming-transformers
Taming Transformers for High-Resolution Image Synthesis
latent-diffusion
High-Resolution Image Synthesis with Latent Diffusion Models
learning-cuda-trt
A large collection of CUDA/TensorRT example cases for learning.
CPP-Training
Deep dive into C++ and Bazel.