Uranus's starred repositories
ChatGPT-Next-Web
A cross-platform ChatGPT/Gemini UI (Web / PWA / Linux / Win / macOS). Get your own cross-platform ChatGPT/Gemini app with one click.
lobe-chat
🤯 Lobe Chat: an open-source, modern-design LLM/AI chat framework. Supports multiple AI providers (OpenAI / Claude 3 / Gemini / Ollama / Bedrock / Azure / Mistral / Perplexity), multimodality (Vision/TTS), and a plugin system. One-click FREE deployment of your private ChatGPT chat application.
chatbot-ui
AI chat for every model.
LibreChat
Enhanced ChatGPT clone featuring OpenAI, Assistants API, Azure, Groq, GPT-4 Vision, Mistral, Bing, Anthropic, OpenRouter, Vertex AI, Gemini, AI model switching, message search, LangChain, DALL-E 3, ChatGPT Plugins, OpenAI Functions, a secure multi-user system, and presets. Completely open source for self-hosting. More features are in development.
Awesome-LLM-Inference
📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.
whisper_streaming
Whisper realtime streaming for long speech-to-text transcription and translation
Awesome-Efficient-LLM
A curated list of resources on efficient large language models
ring-flash-attention
Ring attention implementation with flash attention
ring-attention-pytorch
Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in PyTorch
Consistency_LLM
[ICML 2024] CLLMs: Consistency Large Language Models
MatmulTutorial
An easy-to-understand TensorOp matmul tutorial
long-context-attention
Sequence-parallel attention for long-context LLM training and inference
scattermoe
Triton-based implementation of Sparse Mixture of Experts.
libflash_attn
Standalone Flash Attention v2 kernel without libtorch dependency