Björn Plüster's repositories
llama_gradio_interface
Inference code for LLaMA models with Gradio Interface and rolling generation like ChatGPT
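"Rolling generation" here refers to streaming the response into the UI as it is produced, the way ChatGPT renders replies. A minimal sketch of the idea in pure Python (illustrative only, not the repo's actual code; Gradio re-renders an output each time a generator yields):

```python
def rolling_generate(tokens):
    """Yield the growing partial response after each token, so a UI such as
    Gradio can re-render the message as it streams in (hypothetical sketch)."""
    text = ""
    for tok in tokens:
        text += tok
        yield text  # each yielded value is the full message so far

# Usage: a streaming UI would display each chunk in turn.
chunks = list(rolling_generate(["Hel", "lo", ", wor", "ld!"]))
```

In Gradio, returning a generator like this from the inference function is enough to get token-by-token updates in the interface.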
GermanBenchmark
A repository containing the code for translating popular LLM benchmarks to German.
lm-evaluation-harness-de
A framework for few-shot evaluation of autoregressive language models.
cerebras-lora
Instruct-tune Cerebras-GPT on consumer hardware
prismer_gradio_demo
The implementation of "Prismer: A Vision-Language Model with An Ensemble of Experts".
axolotl
Go ahead and axolotl questions
DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
distilabel
Distilabel is a framework for synthetic data generation and AI feedback, built for AI engineers who require high-quality outputs, full data ownership, and overall efficiency.
epfl-megatron
Distributed trainer for LLMs
exllama
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
github-deploy-notifications-hider
Chrome extension to hide deployment notifications in GitHub pull requests
inspect_ai
Inspect: A framework for large language model evaluations
llama-pipeline-parallel
A prototype repo for hybrid training with pipeline parallelism and distributed data parallelism, with comments on the core code snippets. Feel free to copy code and open discussions about any problems you encounter.
llama_index
LlamaIndex is a data framework for your LLM applications
LMFlow
An extensible toolkit for finetuning and inference of large foundation models. Large models for all.
NeedleInAHaystack_DE
Simple retrieval from LLMs at various context lengths to measure accuracy.
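The needle-in-a-haystack test hides a known fact (the "needle") at a chosen depth inside filler text of a given length, then asks the model to retrieve it. A sketch of the prompt construction, assuming character-based lengths rather than tokens (the real harness works in tokens):

```python
def build_haystack(needle, filler_sentence, context_chars, depth):
    """Place `needle` at a fractional `depth` (0.0 = start, 1.0 = end)
    inside filler text padded out to `context_chars` characters."""
    filler = (filler_sentence + " ") * (context_chars // (len(filler_sentence) + 1) + 1)
    filler = filler[:context_chars]          # trim to the target context length
    pos = int(len(filler) * depth)           # insertion point for the needle
    return filler[:pos] + needle + filler[pos:]

# Usage: sweep context lengths and depths, then check whether the model's
# answer contains the needle.
prompt = build_haystack("The secret code is 42.", "Grass is green.", 2000, 0.5)
```

Accuracy is then measured over the grid of (context length, depth) pairs.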
Open-Assistant
OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
qlora_oasst
QLoRA: Efficient Finetuning of Quantized LLMs
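QLoRA fine-tunes low-rank adapters on top of a base model whose weights are frozen in 4-bit form. The core idea of weight quantization can be sketched with simple absmax quantization in pure Python (illustrative only; QLoRA itself uses a 4-bit NormalFloat data type with blockwise scales, via bitsandbytes):

```python
def quantize_absmax(weights, bits=4):
    """Absmax quantization sketch: scale weights into the signed integer
    range for `bits` bits, round, and keep the scale for dequantization."""
    qmax = 2 ** (bits - 1) - 1              # e.g. 7 for signed 4-bit
    scale = max(abs(w) for w in weights) / qmax
    q = [round(w / scale) for w in weights]  # small integers, 4 bits each
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from integers plus one scale."""
    return [v * scale for v in q]

q, s = quantize_absmax([0.1, -0.7, 0.35, 0.0])
approx = dequantize(q, s)
```

Storage drops from 16 or 32 bits per weight to 4 bits plus a shared scale, at the cost of a bounded rounding error; the trainable LoRA adapters stay in higher precision.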
text-dedup-oscar2023
All-in-one text de-duplication
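The simplest de-duplication strategy is exact matching: normalize each document, hash it, and keep only the first occurrence of each hash. A minimal sketch (the repo also covers fuzzy methods such as MinHash, which this does not show):

```python
import hashlib

def exact_dedup(docs):
    """Keep the first occurrence of each document, treating texts that
    differ only in case or whitespace as duplicates."""
    seen, unique = set(), []
    for doc in docs:
        normalized = " ".join(doc.lower().split())   # collapse whitespace, lowercase
        key = hashlib.sha256(normalized.encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique

# "Hello  World" and "hello world" normalize to the same key.
corpus = exact_dedup(["Hello  World", "hello world", "Goodbye"])
```

Hashing keeps memory proportional to the number of unique documents rather than their total size, which matters at the scale of corpora like OSCAR.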
transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs