Tat Trong Vu's repositories
alignment-handbook
Robust recipes to align language models with human and AI preferences
alpaca-lora
Instruct-tune LLaMA on consumer hardware
awesome-instruction-dataset
A collection of open-source datasets for training instruction-following LLMs (ChatGPT, LLaMA, Alpaca)
Awesome-Multimodal-Large-Language-Models
✨✨ Latest Papers and Datasets on Multimodal Large Language Models, and Their Evaluation.
FlagEmbedding
Dense Retrieval and Retrieval-augmented LLMs
infinity
Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting a wide range of sentence-transformer models and frameworks.
llama-cpp-python
Python bindings for llama.cpp
LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning: LLaVA (Large Language-and-Vision Assistant) built towards GPT-4V level capabilities.
LLM_Helper_Scripts
Some simple scripts that I use day-to-day when working with LLMs and the Hugging Face Hub
LLMTest_NeedleInAHaystack
Simple retrieval from LLMs at various context lengths to measure accuracy
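The needle-in-a-haystack test is simple enough to sketch in plain Python: plant a "needle" fact at a chosen depth inside filler context, then check whether the model can retrieve it. The placement logic below is an illustrative assumption, not this repo's actual code, and the LLM call is left as a hypothetical stub:

```python
def build_haystack(filler: str, needle: str, context_len: int, depth_pct: float) -> str:
    """Build ~context_len characters of filler text with `needle` inserted
    depth_pct percent of the way through (0 = start, 100 = end)."""
    haystack = (filler * (context_len // len(filler) + 1))[:context_len]
    pos = int(len(haystack) * depth_pct / 100)
    return haystack[:pos] + needle + haystack[pos:]

def score(model_answer: str, expected: str) -> bool:
    """Pass/fail check: did the model reproduce the needle?"""
    return expected.lower() in model_answer.lower()

# Sweep context lengths and needle depths; a real run would call an LLM
# on each prompt and record a pass/fail grid over (ctx, depth).
needle = "The secret code is 7421."
for ctx in (1_000, 4_000):
    for depth in (0, 50, 100):
        prompt = build_haystack("Lorem ipsum dolor sit amet. ", needle, ctx, depth)
        # answer = call_llm(prompt + "\nWhat is the secret code?")  # hypothetical
```

The result is usually visualized as a heatmap of retrieval accuracy over context length and needle depth.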
lm-evaluation-harness
A framework for few-shot evaluation of language models.
lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
LongLoRA
Code and documents of LongLoRA and LongAlpaca
mergekit
Tools for merging pretrained large language models.
openai-token-counter
Accurately count tokens for OpenAI requests, with support for all parameters such as name and functions.
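Chat token counting follows the scheme documented in the OpenAI cookbook for gpt-3.5-turbo/gpt-4: each message carries a fixed overhead, the `name` field costs extra, and the reply is primed with three tokens. A minimal sketch, with a toy whitespace "tokenizer" standing in for tiktoken (an assumption; the real library tokenizes properly and also handles function definitions):

```python
def count_chat_tokens(messages, encode, tokens_per_message=3, tokens_per_name=1):
    """Approximate OpenAI chat token counting (cookbook scheme for
    gpt-3.5-turbo/gpt-4): fixed overhead per message, extra cost for
    `name`, plus 3 tokens priming the assistant's reply."""
    total = 0
    for msg in messages:
        total += tokens_per_message
        for key, value in msg.items():
            total += len(encode(value))
            if key == "name":
                total += tokens_per_name
    return total + 3  # every reply is primed with <|start|>assistant<|message|>

# Toy whitespace tokenizer in place of tiktoken (assumption for illustration).
encode = lambda s: s.split()
n = count_chat_tokens([{"role": "user", "content": "hello there"}], encode)
```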
outlines-regex
Guided Text Generation
petals
🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
pytorch_mppi
Model Predictive Path Integral (MPPI) with approximate dynamics, implemented in PyTorch
pytransform3d
3D transformations for Python.
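pytransform3d provides conversions between rotation representations on numpy arrays; the core math behind its axis-angle-to-matrix conversion is Rodrigues' formula, shown here in plain Python (the function name and signature are my own for illustration):

```python
import math

def matrix_from_axis_angle(axis, angle):
    """Rotation matrix from a unit axis and an angle, via Rodrigues' formula:
    R = I + sin(a) K + (1 - cos(a)) K^2, where K is the cross-product
    matrix of the axis."""
    x, y, z = axis
    c, s, t = math.cos(angle), math.sin(angle), 1 - math.cos(angle)
    return [
        [t*x*x + c,   t*x*y - s*z, t*x*z + s*y],
        [t*x*y + s*z, t*y*y + c,   t*y*z - s*x],
        [t*x*z - s*y, t*y*z + s*x, t*z*z + c],
    ]

# 90 degrees about the z-axis maps the x-axis onto the y-axis.
R = matrix_from_axis_angle((0.0, 0.0, 1.0), math.pi / 2)
```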
ray-llm
RayLLM - LLMs on Ray
S-LoRA
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
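The key to serving many adapters is that LoRA keeps one shared base weight W and only swaps the small low-rank pair (A, B) per request: y = x(W + (alpha/r)·BA). A pure-Python sketch of that forward pass (list-of-lists matrices for illustration; real serving engines batch this on GPU):

```python
def matmul(X, Y):
    """Plain list-of-lists matrix multiply, for illustration only."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_forward(x, W, A, B, alpha, r):
    """LoRA forward pass y = x @ (W + (alpha/r) * B @ A), computed as
    x @ W + (alpha/r) * ((x @ B) @ A) so the merged weight is never
    materialized -- the property that lets a server share W across
    thousands of adapters and swap only the tiny A (r x k) / B (d x r)."""
    scale = alpha / r
    base = matmul(x, W)
    low = matmul(matmul(x, B), A)
    return [[b + scale * l for b, l in zip(brow, lrow)]
            for brow, lrow in zip(base, low)]
```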
sentence-transformers
Multilingual Sentence & Image Embeddings with BERT
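Embeddings from sentence-transformers are typically compared with cosine similarity. A plain-Python version of that comparison (the vectors below are made up; in real use they would come from the library's `encode` method):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors: the dot product
    divided by the product of their norms, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up vectors standing in for model outputs (assumption for illustration).
emb_a = [0.1, 0.9, 0.2]
emb_b = [0.1, 0.8, 0.3]
sim = cosine_similarity(emb_a, emb_b)
```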
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
text-generation-inference
Large Language Model Text Generation Inference
VALL-E-X
An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io
vllm-gptq
A high-throughput and memory-efficient inference and serving engine for LLMs
yarn
YaRN: Efficient Context Window Extension of Large Language Models
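Context-window extension methods work by rescaling RoPE's per-position rotation angles. The sketch below shows plain RoPE angles plus uniform linear position interpolation, the simple baseline; YaRN's contribution is to interpolate non-uniformly across frequency bands (plus an attention temperature adjustment), which the uniform version here does not capture:

```python
import math

def rope_angles(pos, dim, base=10000.0, scale=1.0):
    """Rotation angles RoPE assigns to a token at position `pos`:
    theta_i = pos / base**(2i/dim), one angle per pair of dimensions.
    Dividing positions by `scale` (linear position interpolation) squeezes
    a longer sequence into the position range seen during training."""
    return [(pos / scale) / base ** (2 * i / dim) for i in range(dim // 2)]

# With scale=4, position 8192 gets the angles position 2048 had at train time.
orig = rope_angles(2048, dim=8)
stretched = rope_angles(8192, dim=8, scale=4.0)
```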