xcxhy's starred repositories

mPLUG-Owl

mPLUG-Owl: The Powerful Multi-modal Large Language Model Family

Language: Python | License: MIT | Stargazers: 2090 | Issues: 0

MindSearch

🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT)

Language: Python | License: Apache-2.0 | Stargazers: 3703 | Issues: 0

Diffree

Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model

Language: Python | License: Apache-2.0 | Stargazers: 175 | Issues: 0

PDF-Guru

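PDF Guru Anki is a multi-purpose, PDF-centered toolbox for office work and study. It has four main modules: a PDF utility toolbox, an Anki card-making tool, an Anki assistant, and a video note-taking tool. The software offers many powerful features, and using it well can greatly improve your efficiency at work and in study, making it a rare productivity tool. Life is short, I use Guru!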

Language: Vue | License: AGPL-3.0 | Stargazers: 2281 | Issues: 0

yolov10

YOLOv10: Real-Time End-to-End Object Detection

Language: Python | License: AGPL-3.0 | Stargazers: 8862 | Issues: 0

YOLOv10-Document-Layout-Analysis

YOLOv10 trained on DocLayNet dataset.

Language: Jupyter Notebook | License: AGPL-3.0 | Stargazers: 21 | Issues: 0

PaddleDetection

Object Detection toolkit based on PaddlePaddle. It supports object detection, instance segmentation, multiple object tracking and real-time multi-person keypoint detection.

Language: Python | License: Apache-2.0 | Stargazers: 12494 | Issues: 0

tesseract

Tesseract Open Source OCR Engine (main repository)

Language: C++ | License: Apache-2.0 | Stargazers: 60332 | Issues: 0

AdvancedLiterateMachinery

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.

Language: C++ | License: Apache-2.0 | Stargazers: 1266 | Issues: 0

DUP-ocropy

Python-based tools for document analysis and OCR

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 3409 | Issues: 0

doctr

docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.

Language: Python | License: Apache-2.0 | Stargazers: 3464 | Issues: 0

ragflow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.

Language: Python | License: Apache-2.0 | Stargazers: 14155 | Issues: 0

websockets

Library for building WebSocket servers and clients in Python

Language: Python | License: BSD-3-Clause | Stargazers: 5105 | Issues: 0

detectron2

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.

Language: Python | License: Apache-2.0 | Stargazers: 29737 | Issues: 0

deepdoctection

A Repo For Document AI

Language: Python | License: Apache-2.0 | Stargazers: 2429 | Issues: 0

MiniCPM-V

MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone

Language: Python | License: Apache-2.0 | Stargazers: 10549 | Issues: 0

unsloth

Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory

Language: Python | License: Apache-2.0 | Stargazers: 14310 | Issues: 0

xtuner

An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)

Language: Python | License: Apache-2.0 | Stargazers: 3575 | Issues: 0

Language: Python | License: Apache-2.0 | Stargazers: 29 | Issues: 0

reader

Convert any URL to an LLM-friendly input with a simple prefix https://r.jina.ai/

Language: TypeScript | License: Apache-2.0 | Stargazers: 6052 | Issues: 0

llama_parse

Parse files for optimal RAG

Language: Python | License: MIT | Stargazers: 2250 | Issues: 0

llmsherpa

Developer APIs to Accelerate LLM Projects

Language: Jupyter Notebook | License: MIT | Stargazers: 1262 | Issues: 0

Lumix

Preprocessing data before pre-training and SFT

Language: Python | Stargazers: 2 | Issues: 0

corenet

CoreNet: A library for training deep neural networks

Language: Python | License: NOASSERTION | Stargazers: 6879 | Issues: 0

pdfcrowd-python

A Python client library for the Pdfcrowd HTML to PDF API

Language: Python | License: MIT | Stargazers: 17 | Issues: 0

YOLOS

[NeurIPS 2021] You Only Look at One Sequence

Language: Jupyter Notebook | License: MIT | Stargazers: 824 | Issues: 0

donut

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022

Language: Python | License: MIT | Stargazers: 5627 | Issues: 0

moondream

tiny vision language model

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 4750 | Issues: 0

OmniFusion

OmniFusion — a multimodal model to communicate using text and images

Language: Python | License: Apache-2.0 | Stargazers: 224 | Issues: 0

InternLM-XComposer

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Language: Python | License: Apache-2.0 | Stargazers: 2383 | Issues: 0