iiLaurens's starred repositories
LLaMA-Factory
Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
haystack
AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
PowerInfer
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
Liger-Kernel
Efficient Triton Kernels for LLM Training
transformer-explainer
Transformer Explained Visually: Learn How LLM Transformer Models Work with Interactive Visualization
promptbench
A unified evaluation framework for large language models
Efficient-LLMs-Survey
[TMLR 2024] Efficient Large Language Models: A Survey
MInference
[NeurIPS'24 Spotlight] Speeds up long-context LLM inference by computing attention with approximate, dynamic sparse methods, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy.
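The core idea behind dynamic sparse attention can be illustrated with a toy sketch: score every key cheaply, then attend only over the top-k keys instead of all of them. This is a stdlib-only illustration of the general technique, not MInference's actual kernels or API.

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of floats
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def sparse_attention(q, keys, values, k=2):
    """Toy dynamic sparse attention: attend only over the k
    highest-scoring keys (illustrative sketch, not MInference)."""
    scale = math.sqrt(len(q))
    scores = [sum(qi * ki for qi, ki in zip(q, key)) / scale for key in keys]
    # indices of the k largest scores, chosen per-query ("dynamic")
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    weights = softmax([scores[i] for i in top])
    dim = len(values[0])
    out = [0.0] * dim
    for w, i in zip(weights, top):
        for d in range(dim):
            out[d] += w * values[i][d]
    return out
```

With k much smaller than the sequence length, the weighted sum touches only k value vectors instead of all of them, which is where the pre-filling speedup comes from.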
dom-to-semantic-markdown
DOM to Semantic-Markdown for use with LLMs
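The DOM-to-Markdown transformation can be sketched with Python's stdlib `html.parser`: walk the element tree and emit Markdown equivalents for a few common tags. This is a minimal illustration of the idea, not dom-to-semantic-markdown's actual API or tag coverage.

```python
from html.parser import HTMLParser

class MarkdownConverter(HTMLParser):
    """Minimal HTML-to-Markdown sketch (stdlib only)."""
    def __init__(self):
        super().__init__()
        self.out = []
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.out.append("# ")
        elif tag == "h2":
            self.out.append("## ")
        elif tag in ("strong", "b"):
            self.out.append("**")
        elif tag == "a":
            self.href = dict(attrs).get("href", "")
            self.out.append("[")
        elif tag == "li":
            self.out.append("- ")

    def handle_endtag(self, tag):
        if tag in ("strong", "b"):
            self.out.append("**")
        elif tag == "a":
            self.out.append(f"]({self.href})")
            self.href = None
        elif tag in ("h1", "h2", "p", "li"):
            self.out.append("\n")

    def handle_data(self, data):
        self.out.append(data)

def to_markdown(html):
    conv = MarkdownConverter()
    conv.feed(html)
    return "".join(conv.out)
```

The resulting Markdown is far more token-efficient for an LLM than raw HTML, since tags, attributes, and scripts are stripped while headings, links, and emphasis survive.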
formatspread
Code accompanying "How I learned to start worrying about prompt formatting".
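The kind of format spread the paper studies can be sketched by enumerating semantically equivalent prompt templates that differ only in separators and casing; the function and parameter names below are illustrative, not formatspread's actual API.

```python
import itertools

def format_variants(field_seps=(": ", " - "), item_seps=("\n", "; "),
                    cases=(str.title, str.upper)):
    """Yield renderers for every combination of formatting choices;
    each renders the same fields with different surface formatting."""
    for fs, isep, case in itertools.product(field_seps, item_seps, cases):
        def render(fields, fs=fs, isep=isep, case=case):
            return isep.join(f"{case(k)}{fs}{v}" for k, v in fields)
        yield render

# The same question-answer pair rendered in 2 * 2 * 2 = 8 formats
prompts = [render([("question", "2+2?"), ("answer", "")])
           for render in format_variants()]
```

Measuring a model's accuracy across such a spread (rather than on one arbitrary format) is the paper's point: performance can vary substantially between formats that a human would consider interchangeable.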