Xiaowei Mao's starred repositories

Stirling-PDF

#1 Locally hosted web application that allows you to perform various operations on PDF files

Language: Java | License: GPL-3.0 | Stargazers: 35084 | Issues: 130 | Issues: 683

outline

The fastest knowledge base for growing teams. Beautiful, realtime collaborative, feature packed, and markdown compatible.

Language: TypeScript | License: NOASSERTION | Stargazers: 26505 | Issues: 160 | Issues: 2737

Umi-OCR

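Free, open-source, offline OCR software. Supports screenshot capture and batch image import, PDF document recognition, exclusion of watermarks and headers/footers, and scanning and generating QR codes. Includes built-in multilingual libraries.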

Language: Python | License: MIT | Stargazers: 23655 | Issues: 136 | Issues: 508

focalboard

Focalboard is an open source, self-hosted alternative to Trello, Notion, and Asana.

Language: TypeScript | License: NOASSERTION | Stargazers: 20909 | Issues: 146 | Issues: 2390

etherpad-lite

Etherpad: A modern really-real-time collaborative document editor.

Language: JavaScript | License: Apache-2.0 | Stargazers: 16307 | Issues: 354 | Issues: 3117

marker

Convert PDF to markdown quickly with high accuracy

Language: Python | License: GPL-3.0 | Stargazers: 14914 | Issues: 60 | Issues: 179

Scrapegraph-ai

Python scraper based on AI

Language: Python | License: MIT | Stargazers: 12888 | Issues: 86 | Issues: 164

DeepSpeedExamples

Example models using DeepSpeed

Language: Python | License: Apache-2.0 | Stargazers: 5892 | Issues: 76 | Issues: 529

alignment-handbook

Robust recipes to align language models with human and AI preferences

Language: Python | License: Apache-2.0 | Stargazers: 4284 | Issues: 109 | Issues: 125

llama3-Chinese-chat

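Chinese-language repository for Llama3 and Llama3.1 (aggregated resources: interesting weights from fine-tuned and heavily modified versions by community members and vendors, plus tutorial videos and documentation on training, inference, evaluation, and deployment)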

openemr

The most popular open source electronic health records and medical practice management solution.

Language: PHP | License: GPL-3.0 | Stargazers: 2997 | Issues: 137 | Issues: 2509

Ask-Anything

[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.

Language: Python | License: MIT | Stargazers: 2900 | Issues: 37 | Issues: 198

MrDoc

MrDoc (觅思文档), an online document system developed in Python. It is suitable for individuals and small to medium-sized teams managing documents, wikis, knowledge bases, and notes.

Language: JavaScript | License: GPL-3.0 | Stargazers: 2897 | Issues: 48 | Issues: 158

swift

ms-swift: Use PEFT or Full-parameter to finetune 300+ LLMs or 50+ MLLMs. (Qwen2, GLM4v, Internlm2.5, Yi, Llama3.1, Llava-Video, Internvl2, MiniCPM-V, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)

Language: Python | License: Apache-2.0 | Stargazers: 2606 | Issues: 20 | Issues: 726

mimic-code

MIMIC Code Repository: Code shared by the research community for the MIMIC family of databases

Language: Jupyter Notebook | License: MIT | Stargazers: 2439 | Issues: 137 | Issues: 1179

InternLM-XComposer

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Language: Python | License: Apache-2.0 | Stargazers: 2303 | Issues: 41 | Issues: 352

data-juicer

A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷

Language: Python | License: Apache-2.0 | Stargazers: 1826 | Issues: 16 | Issues: 149

Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Language: Python | License: NOASSERTION | Stargazers: 1767 | Issues: 24 | Issues: 172

data-centric-AI

A curated, but incomplete, list of data-centric AI resources.

Language: Python | License: Apache-2.0 | Stargazers: 251 | Issues: 11 | Issues: 9

data_management_LLM

Collection of training data management explorations for large language models

OpenAOE

LLM Group Chat Framework: chat with multiple LLMs at the same time.

Language: TypeScript | License: Apache-2.0 | Stargazers: 218 | Issues: 6 | Issues: 8

NeuScraper

[ACL 2024] This is the code repo for our ACL’24 paper "Cleaner Pretraining Corpus Curation with Neural Web Scraping".

Language: Python | License: MIT | Stargazers: 199 | Issues: 10 | Issues: 5

Apollo

Multilingual Medicine: Model, Dataset, Benchmark, Code

Language: Python | License: Apache-2.0 | Stargazers: 146 | Issues: 10 | Issues: 8

IEPile

[OneKE] [ACL 2024] IEPile: A Large-Scale Information Extraction Corpus

Language: Python | License: NOASSERTION | Stargazers: 137 | Issues: 6 | Issues: 20

PMC-Patients

PMC-Patients

Language: Python | License: NOASSERTION | Stargazers: 77 | Issues: 3 | Issues: 1

RoBERTa_Encoder_Decoder_Product_Names

Defines Transformers, a T5 model, and a RoBERTa encoder-decoder model for product name generation

Language: Jupyter Notebook | License: GPL-3.0 | Stargazers: 45 | Issues: 2 | Issues: 3

HammerLLM

1.4B sLLM for Chinese and English - HammerLLM🔨

Language: Python | License: MIT | Stargazers: 43 | Issues: 4 | Issues: 1

BERT-from-Scratch-with-PyTorch

Implementation of BERT-based Language Models

Language: Python | License: MIT | Stargazers: 10 | Issues: 0 | Issues: 0