col-in-coding

Col_In_Coding's starred repositories

llama

Inference code for Llama models

Language: Python · License: NOASSERTION · 53856 511 924

text-generation-webui

A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.

Language: Python · License: AGPL-3.0 · 37645 325 3463

chinese-independent-developer

👩🏿‍💻👨🏾‍💻👩🏼‍💻👨🏽‍💻👩🏻‍💻 Chinese independent developers' project list -- sharing what everyone is working on

35671 1219 118

HowToLiveLonger

A programmer's guide to living longer

License: Unlicense · 29339 230 128

Open-Sora

Open-Sora: Democratizing Efficient Video Production for All

Language: Python · License: Apache-2.0 · 17389 159 277

Kalman-and-Bayesian-Filters-in-Python

Kalman Filter book using Jupyter Notebook. Focuses on building intuition and experience, not formal proofs. Includes Kalman filters, extended Kalman filters, unscented Kalman filters, particle filters, and more. All exercises include solutions.

Language: Jupyter Notebook · License: NOASSERTION · 16026 473 314

triton

Development repository for the Triton language and compiler

Language: C++ · License: MIT · 11635 182 1240

flash-attention

Fast and memory-efficient exact attention

Language: Python · License: BSD-3-Clause · 11523 106 827

llama-recipes

Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supporting a number of candid inference solutions such as HF TGI, VLLM for local or cloud deployment. Demo apps to showcase Meta Llama3 for WhatsApp & Messenger.

Language: Jupyter Notebook · License: NOASSERTION · 10095 82 283

TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

Language: C++ · License: Apache-2.0 · 7096 82 1441

streaming-llm

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks

Language: Python · License: MIT · 6305 61 76

bitsandbytes

Accessible large language models via k-bit quantization for PyTorch.

Language: Python · License: MIT · 5641 47 954

cutlass

CUDA Templates for Linear Algebra Subroutines

Language: C++ · License: NOASSERTION · 4739 107 898

CUDA_Freshman

Language: Cuda · 1946 10 13

statistical-learning-method-solutions-manual

Solutions to the exercises in Statistical Learning Methods; read online at: https://datawhalechina.github.io/statistical-learning-method-solutions-manual

Language: Jupyter Notebook · License: NOASSERTION · 1631 24 21

tomesd

Speed up Stable Diffusion with this one simple trick!

Language: Python · License: MIT · 1227 19 47

onnx-modifier

A tool to modify ONNX models in a visualization fashion, based on Netron and Flask.

Language: JavaScript · License: MIT · 1165 10 94

cccl

CUDA C++ Core Libraries

Language: C++ · License: NOASSERTION · 884 30 1039

deq

[NeurIPS'19] Deep Equilibrium Models

Language: Python · License: MIT · 707 21 29

CUDA-Learn-Notes

🎉 CUDA notes / hand-written CUDA kernels for large models / C++ notes, updated irregularly: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.

Language: Cuda · License: GPL-3.0 · 665 8 5

photometric_optimization

Photometric optimization code for creating the FLAME texture space and other applications

Language: Python · License: MIT · 496 9 21

llama.onnx

LLaMa/RWKV onnx models, quantization and testcase

Language: Python · License: GPL-3.0 · 331 13 18

aisys-building-blocks

Building blocks for foundation models.

259 270

TensorRT-Model-Optimizer

TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization and sparsity. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed on NVIDIA GPUs.

Language: Python · License: NOASSERTION · 250 8 24