Col_In_Coding's starred repositories
text-generation-webui
A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.
chinese-independent-developer
👩🏿💻👨🏾💻👩🏼💻👨🏽💻👩🏻💻 A list of projects by Chinese independent developers: sharing what everyone is working on
HowToLiveLonger
A programmer's guide to living longer
Kalman-and-Bayesian-Filters-in-Python
Kalman Filter book using Jupyter Notebook. Focuses on building intuition and experience, not formal proofs. Includes Kalman filters, extended Kalman filters, unscented Kalman filters, particle filters, and more. All exercises include solutions.
flash-attention
Fast and memory-efficient exact attention
llama-recipes
Scripts for fine-tuning Meta Llama 3 with composable FSDP and PEFT methods, covering single- and multi-node GPU setups. Supports default and custom datasets for applications such as summarization and Q&A. Supports a number of inference solutions, such as HF TGI and vLLM, for local or cloud deployment. Includes demo apps that showcase Meta Llama 3 for WhatsApp and Messenger.
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
streaming-llm
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
bitsandbytes
Accessible large language models via k-bit quantization for PyTorch.
statistical-learning-method-solutions-manual
Solutions to the exercises in 统计学习方法 (Statistical Learning Methods). Read online: https://datawhalechina.github.io/statistical-learning-method-solutions-manual
onnx-modifier
A tool for modifying ONNX models visually, based on Netron and Flask.
CUDA-Learn-Notes
🎉 CUDA notes / hand-written CUDA kernels for large models / C++ notes, updated irregularly: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
photometric_optimization
Photometric optimization code for creating the FLAME texture space and other applications
llama.onnx
LLaMA/RWKV ONNX models, quantization, and test cases
aisys-building-blocks
Building blocks for foundation models.
TensorRT-Model-Optimizer
TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization and sparsity. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed on NVIDIA GPUs.
Facial-Landmarks-Annotation-Tool
A visual editor for manually annotating facial landmarks in images of human faces.
facial-landmark-dataset
A collection of facial landmark datasets and Python code to make use of them.