berooo's repositories

AutoGPTQ

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

License: MIT | Stargazers: 0 | Issues: 0

awesome-document-understanding

A curated list of resources on the Document Understanding (DU) topic

Stargazers: 0 | Issues: 0

baichuan-7B

A large-scale 7B pretrained language model developed by BaiChuan-Inc.

License: Apache-2.0 | Stargazers: 0 | Issues: 0

CAN

When Counting Meets HMER: Counting-Aware Network for Handwritten Mathematical Expression Recognition (ECCV’2022 Poster).

License: MIT | Stargazers: 0 | Issues: 0

ChineseNLPCorpus

Chinese natural language processing datasets: material for everyday experiments. Additions and merge requests are welcome.

Stargazers: 0 | Issues: 0

cord

CORD: A Consolidated Receipt Dataset for Post-OCR Parsing

License: CC-BY-4.0 | Stargazers: 0 | Issues: 0

DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

License: Apache-2.0 | Stargazers: 0 | Issues: 0

DocBank

DocBank: A Benchmark Dataset for Document Layout Analysis

License: Apache-2.0 | Stargazers: 0 | Issues: 0

EasyOCR

Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.

License: Apache-2.0 | Stargazers: 0 | Issues: 0

ERNIE-Layout-Pytorch

An unofficial PyTorch implementation of ERNIE-Layout, which was originally released through PaddleNLP.

License: MIT | Stargazers: 0 | Issues: 0

GitHub520

:kissing_heart: Makes you "love" GitHub: fixes broken images and slow loading when accessing it. (No installation required.)

Stargazers: 0 | Issues: 0

GPT2-Chinese

Chinese version of GPT2 training code, using the BERT tokenizer.

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

insightface

State-of-the-art 2D and 3D Face Analysis Project

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

LaTeX-OCR

pix2tex: Using a ViT to convert images of equations into LaTeX code.

License: MIT | Stargazers: 0 | Issues: 0

layout-parser

A Unified Toolkit for Deep Learning Based Document Image Analysis

License: Apache-2.0 | Stargazers: 0 | Issues: 0

LiLT

Official PyTorch implementation of LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding (ACL 2022)

Language: Python | License: MIT | Stargazers: 0 | Issues: 0

LLM-Agent-Paper-List

The paper list of the 86-page paper "The Rise and Potential of Large Language Model Based Agents: A Survey" by Zhiheng Xi et al.

Stargazers: 0 | Issues: 0

ml-cvnets

CVNets: A library for training computer vision networks

Language: Python | License: NOASSERTION | Stargazers: 0 | Issues: 0

MNBVC

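MNBVC (Massive Never-ending BT Vast Chinese corpus): an ultra-large-scale Chinese corpus, targeting the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also various niche subcultures, even "Martian script" (火星文) text. It includes news, essays, novels, books, magazines, papers, scripts and dialogue lines, forum posts, wikis, classical poetry, lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and every other form of plain-text Chinese data.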

License: MIT | Stargazers: 0 | Issues: 0

nougat

Implementation of Nougat: Neural Optical Understanding for Academic Documents

License: MIT | Stargazers: 0 | Issues: 0

open-llms

📋 A list of open LLMs available for commercial use.

License: Apache-2.0 | Stargazers: 0 | Issues: 0

open-mllms

Open LLMs for multimodal tasks

License: Apache-2.0 | Stargazers: 0 | Issues: 0

Qwen

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

License: Apache-2.0 | Stargazers: 0 | Issues: 0

source-han-sans

Source Han Sans | 思源黑体 | 思源黑體 | 思源黑體 香港 | 源ノ角ゴシック | 본고딕

License: NOASSERTION | Stargazers: 0 | Issues: 0

TabRecSet

A large-scale dataset of camera-captured tables for table detection and recognition.

Stargazers: 0 | Issues: 0

tabula

Tabula is a tool for liberating data tables trapped inside PDF files

License: MIT | Stargazers: 0 | Issues: 0

UIE

Unified Structure Generation for Universal Information Extraction

Language: Python | Stargazers: 0 | Issues: 0

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

License: Apache-2.0 | Stargazers: 0 | Issues: 0

WanJuan1.0

WanJuan 1.0 multimodal corpus

License: CC-BY-4.0 | Stargazers: 0 | Issues: 0