dongwandou's starred repositories

MinerU

A one-stop, open-source, high-quality data extraction tool that supports PDF, webpage, and e-book extraction.

Language: Python · License: AGPL-3.0 · Stargazers: 1319 · Issues: 0

PDF-Extract-Kit

A Comprehensive Toolkit for High-Quality PDF Content Extraction

Language: Python · License: Apache-2.0 · Stargazers: 3095 · Issues: 0

JioNLP

A Chinese NLP preprocessing and parsing package: accurate, efficient, and easy to use. www.jionlp.com

Language: Python · License: Apache-2.0 · Stargazers: 3159 · Issues: 0

PyMuPDF-Utilities

Demos, examples and utilities using PyMuPDF

Language: Jupyter Notebook · License: AGPL-3.0 · Stargazers: 515 · Issues: 0

PyMuPDF

PyMuPDF is a high-performance Python library for data extraction, analysis, conversion, and manipulation of PDF (and other) documents.

Language: Python · License: AGPL-3.0 · Stargazers: 4682 · Issues: 0

llama3

The official Meta Llama 3 GitHub site

Language: Python · License: NOASSERTION · Stargazers: 23982 · Issues: 0

camelot

A Python library to extract tabular data from PDFs

Language: Python · License: MIT · Stargazers: 2839 · Issues: 0

mupdf

mupdf mirror

Language: C · License: AGPL-3.0 · Stargazers: 1292 · Issues: 0

pdf2docx

Open-source Python library for converting PDF to DOCX.

Language: Python · License: AGPL-3.0 · Stargazers: 2348 · Issues: 0

camelot

Camelot: PDF Table Extraction for Humans

Language: Python · License: NOASSERTION · Stargazers: 3617 · Issues: 0

llama_index

LlamaIndex is a data framework for your LLM applications

Language: Python · License: MIT · Stargazers: 33883 · Issues: 0

Awesome-LLM-RAG-Application

Resources on applications built on LLMs with the RAG pattern

Stargazers: 604 · Issues: 0

MyScaleDB

An open-source, high-performance SQL vector database built on ClickHouse.

Language: C++ · License: Apache-2.0 · Stargazers: 776 · Issues: 0

QAnything

Question and Answer based on Anything.

Language: Python · License: Apache-2.0 · Stargazers: 10801 · Issues: 0

kimi-free-api

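🚀 Reverse-engineered API for the KIMI AI long-text large model, for free-of-charge testing (specialty: long-text interpretation and summarization). Supports high-speed streaming output, agent conversations, web search, long-document interpretation, image OCR, and multi-turn dialogue, with zero-configuration deployment, multi-token support, and automatic cleanup of session traces.

Language: TypeScript · License: GPL-3.0 · Stargazers: 3457 · Issues: 0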

Langchain-Chatchat

Langchain-Chatchat (formerly langchain-ChatGLM): a local knowledge-based RAG and Agent app for LLMs such as ChatGLM, Qwen, and Llama, built with langchain

Language: TypeScript · License: Apache-2.0 · Stargazers: 30126 · Issues: 0

DISC-FinLLM

DISC-FinLLM, a Chinese financial large language model (LLM) designed to provide users with professional, intelligent, and comprehensive financial consulting services in financial scenarios.

Language: Python · License: Apache-2.0 · Stargazers: 522 · Issues: 0

Awesome-Chinese-LLM

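A curated collection of open-source Chinese large language models, focusing on models that are relatively small, can be privately deployed, and are inexpensive to train, including base models, vertical-domain fine-tuning and applications, datasets, and tutorials.

Stargazers: 13650 · Issues: 0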

Semantic-Retrieval-Models

A curated list of awesome papers for Semantic Retrieval (TOIS Accepted: Semantic Models for the First-stage Retrieval: A Comprehensive Review).

Stargazers: 312 · Issues: 0

DCN

Dynamic Connected Networks for Chinese Spelling Check

Language: Python · License: Apache-2.0 · Stargazers: 48 · Issues: 0

MiniCPM-V

MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone

Language: Python · License: Apache-2.0 · Stargazers: 8056 · Issues: 0

MiniCPM

MiniCPM-2B: An end-side LLM outperforming Llama2-13B.

Language: Python · License: Apache-2.0 · Stargazers: 4458 · Issues: 0

PERT

PERT: Pre-training BERT with Permuted Language Model

License: Apache-2.0 · Stargazers: 346 · Issues: 0

HFL-Anthology

Collection of resources from the Joint Laboratory of HIT and iFLYTEK Research (HFL)

Language: Markdown · License: CC-BY-SA-4.0 · Stargazers: 354 · Issues: 0

MacBERT

Revisiting Pre-trained Models for Chinese Natural Language Processing (MacBERT)

License: Apache-2.0 · Stargazers: 624 · Issues: 0

Chinese-LLaMA-Alpaca-2

Phase 2 of the Chinese LLaMA-2 & Alpaca-2 LLM project, plus 64K long-context models

Language: Python · License: Apache-2.0 · Stargazers: 7012 · Issues: 0

Chinese-LLaMA-Alpaca

Chinese LLaMA & Alpaca LLMs, with local CPU/GPU training and deployment

Language: Python · License: Apache-2.0 · Stargazers: 17965 · Issues: 0

ConfusionCluster

An analysis tool for Chinese spell-checking models, released at ACL 2023.

Language: Python · Stargazers: 5 · Issues: 0

Administrative-divisions-of-China

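Administrative divisions of the People's Republic of China: province level (provinces), prefecture level (cities), county level (districts and counties), township level (townships and subdistricts), and village level (village and residents' committees). Linked address data for provinces, cities, districts, towns, and villages at two, three, four, and five levels.

Language: JavaScript · License: WTFPL · Stargazers: 18170 · Issues: 0

Language: Python · Stargazers: 6 · Issues: 0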