shamio's starred repositories

llama.cpp

LLM inference in C/C++

text-generation-webui

A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.

Language: Python | License: AGPL-3.0 | Stargazers: 37620 | Issues: 325 | Issues: 3463

unsloth

Finetune Llama 3, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory

Language: Python | License: Apache-2.0 | Stargazers: 11321 | Issues: 80 | Issues: 470

MemGPT

Create LLM agents with long-term memory and custom tools 📚🦙

Language: Python | License: Apache-2.0 | Stargazers: 10497 | Issues: 109 | Issues: 605

PowerInfer

High-speed Large Language Model Serving on PCs with Consumer-grade GPUs

Language: C++ | License: MIT | Stargazers: 7076 | Issues: 75 | Issues: 135

streaming-llm

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks

Language: Python | License: MIT | Stargazers: 6303 | Issues: 61 | Issues: 76

koboldcpp

A simple one-file way to run various GGML and GGUF models with KoboldAI's UI

Language: C++ | License: AGPL-3.0 | Stargazers: 4183 | Issues: 61 | Issues: 609

OLMo

Modeling, training, eval, and inference code for OLMo

Language: Python | License: Apache-2.0 | Stargazers: 4117 | Issues: 41 | Issues: 157

lollms-webui

Lord of Large Language Models Web User Interface

Language: Vue | License: Apache-2.0 | Stargazers: 3990 | Issues: 59 | Issues: 270

mergekit

Tools for merging pretrained large language models.

Language: Python | License: LGPL-3.0 | Stargazers: 3827 | Issues: 47 | Issues: 237

intel-extension-for-transformers

⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡

Language: Python | License: Apache-2.0 | Stargazers: 2001 | Issues: 27 | Issues: 143

languagemodels

Explore large language models in 512MB of RAM

Language: Python | License: MIT | Stargazers: 1160 | Issues: 10 | Issues: 17

Long-Context

Code and tooling for the Abacus.AI LLM Context Expansion project. Also included are evaluation scripts and benchmark tasks that measure a model’s information retrieval capabilities under context expansion, along with key experimental results and instructions for reproducing and building on them.

Language: Python | License: Apache-2.0 | Stargazers: 564 | Issues: 12 | Issues: 6

exui

Web UI for ExLlamaV2

Language: JavaScript | License: MIT | Stargazers: 374 | Issues: 8 | Issues: 41

neural-speed

An innovative library for efficient LLM inference via low-bit quantization

Language: C++ | License: Apache-2.0 | Stargazers: 287 | Issues: 8 | Issues: 41

qmoe

Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".

Language: Python | License: Apache-2.0 | Stargazers: 253 | Issues: 6 | Issues: 4

LongAlign

LongAlign: A Recipe for Long Context Alignment Encompassing Data, Training, and Evaluation

Language: Python | License: Apache-2.0 | Stargazers: 121 | Issues: 8 | Issues: 8

Entropy-ABF

Official implementation for 'Extending LLMs’ Context Window with 100 Samples'