Antti Puurula's starred repositories
LocalAI
🤖 The free, open-source OpenAI alternative. Self-hosted, community-driven, and local-first. Drop-in replacement for OpenAI that runs on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers, and many other model architectures. Generates text, audio, video, and images, and supports voice cloning.
text-generation-inference
Large Language Model Text Generation Inference
CTranslate2
Fast inference engine for Transformer models
DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
attention_sinks
Extend existing LLMs way beyond the original training length with constant memory usage, without retraining
langstream
Build robust LLM applications with true composability 🔗
speculative-decoding
Explorations into some recent techniques surrounding speculative decoding
text-generation-inference
IBM development fork of https://github.com/huggingface/text-generation-inference
hf-hub-ctranslate2
Connecting Transformers on HuggingFace Hub with CTranslate2
Pytorch_Merge
Merge LLMs that are split into parts
text-generation-inference
Large Language Model Text Generation Inference
open-text-generation-inference
Open Large Language Model Text Generation Inference - will remain Apache-2.0