Xiaowei Mao's starred repositories
Stirling-PDF
#1 Locally hosted web application that allows you to perform various operations on PDF files
focalboard
Focalboard is an open source, self-hosted alternative to Trello, Notion, and Asana.
etherpad-lite
Etherpad: A modern really-real-time collaborative document editor.
Scrapegraph-ai
Python scraper based on AI
DeepSpeedExamples
Example models using DeepSpeed
alignment-handbook
Robust recipes to align language models with human and AI preferences
llama3-Chinese-chat
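Llama3 and Llama3.1 Chinese repository (aggregated resources: interesting weights from community and vendor fine-tunes and heavily modified versions, plus tutorial videos & docs on training, inference, evaluation, and deployment)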
Ask-Anything
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
mimic-code
MIMIC Code Repository: Code shared by the research community for the MIMIC family of databases
InternLM-XComposer
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
data-juicer
A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
data-centric-AI
A curated, but incomplete, list of data-centric AI resources.
data_management_LLM
Collection of training data management explorations for large language models
NeuScraper
[ACL 2024] This is the code repo for our ACL’24 paper "Cleaner Pretraining Corpus Curation with Neural Web Scraping".
PMC-Patients
PMC-Patients
RoBERTa_Encoder_Decoder_Product_Names
Defines Transformer, T5, and RoBERTa encoder-decoder models for product name generation
BERT-from-Scratch-with-PyTorch
Implementation of BERT-based Language Models