liupeng's starred repositories
lobe-chat
🤯 Lobe Chat - an open-source, modern-design AI chat framework. Supports multiple AI providers (OpenAI / Claude 3 / Gemini / Ollama / Azure / DeepSeek), Knowledge Base (file upload / knowledge management / RAG), Multi-Modal (Vision/TTS), and a plugin system. One-click FREE deployment of your private ChatGPT/Claude application.
audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
elasticsearch-analysis-ik
The IK Analysis plugin integrates the Lucene IK analyzer into Elasticsearch and supports customized dictionaries.
seamless_communication
Foundational Models for State-of-the-Art Speech and Text Translation
lm-evaluation-harness
A framework for few-shot evaluation of language models.
pycorrector
pycorrector is a toolkit for text error correction. It applies models such as Kenlm, T5, MacBERT, ChatGLM3, and LLaMA to error-correction scenarios and works out of the box.
chatgpt-web-share
ChatGPT Plus / OpenAI API sharing solution.
img2dataset
Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20h on one machine.
trafilatura
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words
List of Dirty, Naughty, Obscene, and Otherwise Bad Words
LLMDataHub
A quick guide (especially) for trending instruction finetuning datasets
Chinese-Llama-2-7b
The open-source community's first downloadable and runnable Chinese LLaMA2 model!
LLMTest_NeedleInAHaystack
Doing simple retrieval from LLMs at various context lengths to measure accuracy
LeanCopilot
LLMs as Copilots for Theorem Proving in Lean
Llama2-Code-Interpreter
Make Llama2 execute code, debug it, save and reuse it, and access the internet
LifeReloaded
A life simulation game powered by GPT-4's “Advanced Data Analysis” function, offering you a second chance at life.
Megatron-LLaMA
Best practice for training LLaMA models in Megatron-LM
data_management_LLM
Collection of training data management explorations for large language models
Open-Instruction-Generalist
Open Instruction Generalist is an assistant trained on massive synthetic instructions to perform many millions of tasks
wikitextprocessor
Python package for Wikimedia dump processing (Wiktionary, Wikipedia, etc.). Wikitext parsing, template expansion, Lua module execution. For data extraction, bulk syntax checking, error detection, and offline formatting.