xcxhy

xcxhy's starred repositories

mPLUG-Owl

mPLUG-Owl: The Powerful Multi-modal Large Language Model Family

Language: Python | License: MIT | 209000

MindSearch

🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT)

Language: Python | License: Apache-2.0 | 370300

Diffree

Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model

Language: Python | License: Apache-2.0 | 17500

PDF-Guru

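PDF Guru Anki is a PDF-centered, multi-purpose toolbox for office work and study. It has four main modules: a PDF utility toolbox, an Anki card-making tool, an Anki helper, and a video note-taking tool. The software offers many powerful features, and using it well can greatly improve office and study efficiency. Life is short, I use Guru!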

Language: Vue | License: AGPL-3.0 | 228100

yolov10

YOLOv10: Real-Time End-to-End Object Detection

Language: Python | License: AGPL-3.0 | 886200

YOLOv10-Document-Layout-Analysis

YOLOv10 trained on DocLayNet dataset.

Language: Jupyter Notebook | License: AGPL-3.0 | 2100

PaddleDetection

Object Detection toolkit based on PaddlePaddle. It supports object detection, instance segmentation, multiple object tracking and real-time multi-person keypoint detection.

Language: Python | License: Apache-2.0 | 1249400

tesseract

Tesseract Open Source OCR Engine (main repository)

Language: C++ | License: Apache-2.0 | 6033200

AdvancedLiterateMachinery

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.

Language: C++ | License: Apache-2.0 | 126600

DUP-ocropy

Python-based tools for document analysis and OCR

Language: Jupyter Notebook | License: Apache-2.0 | 340900

doctr

docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.

Language: Python | License: Apache-2.0 | 346400

ragflow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.

Language: Python | License: Apache-2.0 | 1415500

websockets

Library for building WebSocket servers and clients in Python

Language: Python | License: BSD-3-Clause | 510500

detectron2

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.

Language: Python | License: Apache-2.0 | 2973700

deepdoctection

A Repo For Document AI

Language: Python | License: Apache-2.0 | 242900

MiniCPM-V

MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone

Language: Python | License: Apache-2.0 | 1054900

unsloth

Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory

Language: Python | License: Apache-2.0 | 1431000

xtuner

An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)

Language: Python | License: Apache-2.0 | 357500

PosterLLaVA

Language: Python | License: Apache-2.0 | 2900

reader

Convert any URL to an LLM-friendly input with a simple prefix https://r.jina.ai/

Language: TypeScript | License: Apache-2.0 | 605200

llama_parse

Parse files for optimal RAG

Language: Python | License: MIT | 225000

llmsherpa

Developer APIs to Accelerate LLM Projects

Language: Jupyter Notebook | License: MIT | 126200

Lumix

Pre-Processing data before pre-train and sft

Language: Python | 200

corenet

CoreNet: A library for training deep neural networks

Language: Python | License: NOASSERTION | 687900

pdfcrowd-python

A Python client library for the Pdfcrowd HTML to PDF API

Language: Python | License: MIT | 1700

YOLOS

[NeurIPS 2021] You Only Look at One Sequence

Language: Jupyter Notebook | License: MIT | 82400

donut

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022

Language: Python | License: MIT | 562700

moondream

tiny vision language model

Language: Jupyter Notebook | License: Apache-2.0 | 475000

OmniFusion

OmniFusion — a multimodal model to communicate using text and images

Language: Python | License: Apache-2.0 | 22400

InternLM-XComposer

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Language: Python | License: Apache-2.0 | 238300