喵哩个咪's starred repositories

LowMemoryBP

The official implementation of the paper "Reducing Fine-Tuning Memory Overhead by Approximate and Memory-Sharing Backpropagation"

Language: Python · License: MIT · Stars: 14 · Issues: 0

cron

A cron library for Go

Language: Go · License: MIT · Stars: 12905 · Issues: 0

Autofocus

Implementation of different autofocus functions in Python. The main goal is to efficiently obtain the maximal contrast between pixels

Stars: 15 · Issues: 0

CDAF

Contrast Detection Auto Focus

Language: Python · Stars: 2 · Issues: 0
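The two autofocus repos above rely on the same principle: a well-focused image has stronger local contrast than a defocused one, so a lens can be driven to the position that maximizes a contrast score. A minimal illustrative sketch of such a score (a simple squared-gradient measure; not code from either repository):

```python
def contrast_score(image):
    """Sum of squared differences between horizontally adjacent pixels.

    A sharper (better-focused) image has stronger local contrast, so in
    contrast-detection autofocus the lens position that maximizes this
    score is taken as the in-focus position.
    """
    score = 0
    for row in image:
        for a, b in zip(row, row[1:]):
            score += (b - a) ** 2
    return score

# The same edge, blurred vs. sharp: the sharp version scores higher.
blurred = [[0, 1, 2, 3, 4, 5]]
sharp   = [[0, 0, 0, 5, 5, 5]]
assert contrast_score(sharp) > contrast_score(blurred)
```

Real implementations typically use richer metrics (Laplacian variance, Tenengrad) over 2D neighborhoods, but the search loop is the same: sweep the focus motor, keep the position with the highest score.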

TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.

Language: Python · License: Apache-2.0 · Stars: 1761 · Issues: 0

ao

PyTorch native quantization and sparsity for training and inference

Language: Python · License: BSD-3-Clause · Stars: 564 · Issues: 0

AutoGGUF

Automatically quantize GGUF models

Language: Python · License: Apache-2.0 · Stars: 109 · Issues: 0

ComfyUI-GGUF

GGUF Quantization support for native ComfyUI models

Language: Python · License: Apache-2.0 · Stars: 540 · Issues: 0

jsonformer

A Bulletproof Way to Generate Structured JSON from Language Models

Language: Jupyter Notebook · License: MIT · Stars: 4323 · Issues: 0

outlines

Structured Text Generation

Language: Python · License: Apache-2.0 · Stars: 8046 · Issues: 0
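jsonformer and outlines both implement structured generation by the same core trick: at every decoding step, tokens that would violate the target structure are masked out before sampling, so the output is valid by construction. A toy sketch of that idea (the grammar and "model scores" here are invented for illustration, not either library's real API):

```python
# Constrained decoding sketch: a tiny state machine plays the role of a
# JSON grammar, and only tokens it permits are eligible at each step.

def constrained_decode(score_fn, allowed_fn, start_state, max_steps=10):
    state, out = start_state, []
    for _ in range(max_steps):
        allowed = allowed_fn(state)        # tokens the grammar permits here
        if not allowed:
            break
        # pick the highest-scoring token among the allowed ones only
        token = max(allowed, key=lambda t: score_fn(out, t))
        out.append(token)
        state = token                      # toy grammar: state = last token
    return out

# Toy grammar generating the string {"a":<digit>}
grammar = {
    "START": ['{'], '{': ['"a"'], '"a"': [':'], ':': ['7', '8'],
    '7': ['}'], '8': ['}'],
}
# Toy "model": blindly prefers '8', yet the grammar still forces valid JSON.
scores = lambda out, tok: 2.0 if tok == '8' else 1.0

result = "".join(constrained_decode(scores, grammar.get, "START"))
assert result == '{"a":8}'
```

The real libraries do the masking over an LLM's full token vocabulary (outlines compiles regexes/JSON schemas into such state machines), but the guarantee is the same: every emitted sequence parses.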

ollama-copilot

Proxy that allows you to use Ollama as a coding copilot, like GitHub Copilot

Language: Go · Stars: 259 · Issues: 0

flash-attention

Fast and memory-efficient exact attention

Language: Python · License: BSD-3-Clause · Stars: 201 · Issues: 0
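The memory efficiency of flash-attention comes from computing softmax incrementally with a running maximum, so attention can be evaluated tile by tile without ever materializing the full score matrix. A minimal one-row sketch of that "online softmax" trick (illustration only, not the CUDA kernel):

```python
import math

def online_softmax(scores):
    """Single left-to-right pass: maintain a running max m and a running
    normalizer d, rescaling d whenever a new maximum appears."""
    m = float("-inf")   # running max (numerical stability)
    d = 0.0             # running normalizer
    for x in scores:
        m_new = max(m, x)
        d = d * math.exp(m - m_new) + math.exp(x - m_new)  # rescale old sum
        m = m_new
    return [math.exp(x - m) / d for x in scores]

def naive_softmax(scores):
    e = [math.exp(x) for x in scores]
    s = sum(e)
    return [v / s for v in e]

row = [1.0, 3.0, 2.0, 5.0]
assert all(abs(a - b) < 1e-12
           for a, b in zip(online_softmax(row), naive_softmax(row)))
```

Because the normalizer can be updated as new score tiles arrive, the kernel streams keys/values through fast on-chip memory instead of writing the O(n²) attention matrix to HBM — which is what makes the exact attention both fast and memory-efficient.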

flash-attention-2-builds

(Unofficial) Manual builds of wheels for https://github.com/Dao-AILab/flash-attention for Windows x64

License: BSD-3-Clause · Stars: 11 · Issues: 0

SimpleTuner

A general fine-tuning kit geared toward diffusion models.

Language: Python · License: AGPL-3.0 · Stars: 1394 · Issues: 0

SqueezeLLM

[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization

Language: Python · License: MIT · Stars: 624 · Issues: 0
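The "dense-and-sparse" idea in SqueezeLLM's title can be sketched simply: a few large-magnitude outlier weights are kept exact in a sparse structure, while the remaining dense part — whose range is now narrow — is quantized to a small integer grid. The threshold, bit-width, and uniform grid below are illustrative assumptions, not the paper's actual algorithm (which uses non-uniform, sensitivity-aware levels):

```python
def split_and_quantize(weights, outlier_threshold=1.0, bits=3):
    """Store outliers exactly; uniformly quantize the dense remainder."""
    levels = 2 ** bits - 1
    dense = [0.0 if abs(w) > outlier_threshold else w for w in weights]
    sparse = {i: w for i, w in enumerate(weights)
              if abs(w) > outlier_threshold}

    # Uniform quantization over the dense part's (now small) range.
    lo, hi = min(dense), max(dense)
    scale = (hi - lo) / levels or 1.0
    q = [round((w - lo) / scale) for w in dense]

    out = [lo + qi * scale for qi in q]
    for i, w in sparse.items():          # splice the exact outliers back in
        out[i] = w
    return out

w = [0.1, -0.2, 4.0, 0.05, -3.5, 0.3]    # two outliers: 4.0 and -3.5
restored = split_and_quantize(w)
assert restored[2] == 4.0 and restored[4] == -3.5   # outliers exact
assert max(abs(a - b) for a, b in zip(w, restored)) < 0.1
```

The payoff is that without the outliers the quantization range shrinks dramatically, so the same bit budget yields much smaller error on the bulk of the weights — the same motivation behind the KVQuant entry below, applied there to KV-cache activations.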

KVQuant

KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

Language: Python · Stars: 273 · Issues: 0

BlueLM

BlueLM (蓝心大模型): Open large language models developed by vivo AI Lab

Language: Python · License: NOASSERTION · Stars: 824 · Issues: 0

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python · License: Apache-2.0 · Stars: 25515 · Issues: 0

torch-bnb-fp4

Faster PyTorch bitsandbytes 4-bit FP4 nn.Linear ops

Language: Python · License: MIT · Stars: 22 · Issues: 0

ppq

PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.

Language: Python · License: Apache-2.0 · Stars: 1495 · Issues: 0

BitBLAS

BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.

Language: Python · License: MIT · Stars: 285 · Issues: 0

optimum-quanto

A PyTorch quantization backend for Optimum

Language: Python · License: Apache-2.0 · Stars: 735 · Issues: 0

ComfyUI-AutomaticCFG

If your image were a pizza and the CFG the temperature of your oven, this is a thermostat that ensures it is always cooked the way you want. Also adds a 30% speed increase. For ComfyUI / Stable Diffusion

Language: Python · Stars: 323 · Issues: 0

litgpt

20+ high-performance LLMs with recipes to pretrain, fine-tune, and deploy at scale.

Language: Python · License: Apache-2.0 · Stars: 9485 · Issues: 0