Tomas Lyu's starred repositories
LLMs-from-scratch
Implementing a ChatGPT-like LLM in PyTorch from scratch, step by step
llms-from-scratch-cn
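Requires only basic Python: build a large language model from scratch, then build GLM4, Llama3, and RWKV6 step by step from scratch to understand in depth how large models work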
buffer-of-thought-llm
Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
awesome-LLM-resourses
🧑‍🚀 The world's best summary of Chinese-language LLM resources
llama-models
Utilities intended for use with Llama models.
llama-recipes
Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supports a number of candidate inference solutions, such as HF TGI and vLLM, for local or cloud deployment. Includes demo apps that showcase Meta Llama3 for WhatsApp & Messenger.
FasterTransformer
Transformer-related optimization, including BERT and GPT
text-generation-inference
Large Language Model Text Generation Inference
tinysearch
🔍 Tiny, full-text search engine for static websites built with Rust and Wasm
fastembed-rs
Rust library for generating vector embeddings and reranking
rag-api-server
A RAG API server written in Rust following OpenAI specs
Streamer-Sales
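Streamer-Sales (销冠, "top seller"): a sales-streamer LLM 🛒🎁 that presents products based on their given features, with the aim of stimulating users' desire to buy. 🚀⭐ Includes a detailed data-generation pipeline ❗ 📦 Also integrates LMDeploy accelerated inference 🚀, retrieval-augmented generation (RAG) 📚, text-to-speech (TTS) 🔊, digital human generation 🦸, an Agent that queries the web for real-time information 🌐, and automatic speech recognition (ASR) 🎙️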
text-embeddings-inference
A blazing fast inference solution for text embeddings models
mistral.rs
Blazingly fast LLM inference.
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.