avr248 / awesome-key-information-extraction

A curated list of papers about key information extraction.



Awesome Key Information Extraction



Papers with Code links are preferred.

Contributions are welcome!

Table of Contents


Datasets

Name Title Links
DUE DUE: End-to-End Document Understanding Benchmark [link]
RVL-CDIP Evaluation of Deep Convolutional Nets for Document Image Classification and Retrieval [link][download]
SROIE ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction [link][download]
FUNSD FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents [link][download]
XFUND XFUND: A Multilingual Form Understanding Benchmark [link]
CORD CORD: A Consolidated Receipt Dataset for Post-OCR Parsing [link]
EPHOIE Towards Robust Visual Information Extraction in Real World: New Dataset and Novel Solution [link]
EATEN EATEN: Entity-aware Attention for Single Shot Visual Text Extraction [link]
Train Ticket PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks [link][download]
POIE Visual Information Extraction in the Wild: Practical Dataset and End-to-end Solution [link][download]


Surveys

Year Title Links
2023 On the Hidden Mystery of OCR in Large Multimodal Models [link]
2021 Document AI: Benchmarks, Models and Applications [link]


Toolboxes

Year Title Links
2022 DavarOCR: A Toolbox for OCR and Multi-Modal Document Understanding [paper][code]
2021 MMOCR: A Comprehensive Toolbox for Text Detection, Recognition and Understanding [paper][code]
2020 PP-OCR: A Practical Ultra Lightweight OCR System [paper][code]



Large Multimodal Models

Pub. Year Title Links
ICML 2023 BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models [link]
Arxiv 2023 InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning [link]
Arxiv 2023 MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models [link]
Arxiv 2023 Visual Instruction Tuning [link]
Arxiv 2023 Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond [link]
Arxiv 2023 mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality [link]
Arxiv 2023 mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding [link]
Arxiv 2023 mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration [link]
Arxiv 2023 Otter: A Multi-Modal Model with In-Context Instruction Tuning [link]
Arxiv 2023 UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model [link]
Blog 2023 Fuyu-8B: A Multimodal Architecture for AI Agents [blog][model]


Graph-based Methods

Pub. Year Title Links
ICDAR 2023 LayoutGCN: A Lightweight Architecture for Visually Rich Document Understanding [paper]
ACL-Findings 2021 Spatial Dependency Parsing for Semi-Structured Document Information Extraction [link]
Arxiv 2021 Spatial Dual-Modality Graph Reasoning for Key Information Extraction [link]
ICPR 2020 PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks [link]


Pre-trained Transformer Models

Pub. Year Title Links
ACL 2022 LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding [link]
ACL 2022 FormNet: Structural Encoding beyond Sequential Modeling in Form Document Information Extraction [link]
CVPR 2022 XYLayoutLM: Towards Layout-Aware Multimodal Networks For Visually-Rich Document Understanding [link]
Arxiv 2022 LoPE: Learnable Sinusoidal Positional Encoding for Improving Document Transformer Model [link]
Arxiv 2022 LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking [link]
Arxiv 2022 ERNIE-Layout: Layout-Knowledge Enhanced Multi-modal Pre-training for Document Understanding [link]
AAAI 2022 BROS: A Pre-trained Language Model Focusing on Text and Layout for Better Key Information Extraction from Documents [link]
ICDAR 2021 ViBERTgrid: A Jointly Trained Multi-Modal 2D Document Representation for Key Information Extraction from Documents [link][code]
Arxiv 2021 TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models [link]
ACM-MM 2021 StrucTexT: Structured Text Understanding with Multi-Modal Transformers [link]
ACL 2021 LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding [link]
KDD 2020 LayoutLM: Pre-training of Text and Layout for Document Image Understanding [link]


Grid-based Methods

Pub. Year Title Links
ICDAR 2021 ViBERTgrid: A Jointly Trained Multi-Modal 2D Document Representation for Key Information Extraction from Documents [link]
ICDAR 2021 VisualWordGrid: Information Extraction From Scanned Documents Using A Multimodal Approach [link]
NeurIPS 2019 BERTgrid: Contextualized Embedding for 2D Document Representation and Understanding [link]
EMNLP 2018 Chargrid: Towards Understanding 2D Documents [link]


End-to-End Methods

Pub. Year Title Links
ICDAR 2023 Visual Information Extraction in the Wild: Practical Dataset and End-to-end Solution [link]
ICML 2023 Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding [link]
ECCV 2022 OCR-free Document Understanding Transformer [link]
Arxiv 2022 TRIE++: Towards End-to-End Information Extraction from Visually Rich Documents [link]
ICCV 2021 DocFormer: End-to-End Transformer for Document Understanding [link]
ACM-MM 2020 TRIE: End-to-End Text Reading and Information Extraction for Document Understanding [link]
ICDAR 2019 EATEN: Entity-aware Attention for Single Shot Visual Text Extraction [link]


Other Methods

Pub. Year Title Links
ICDAR 2023 Information Extraction from Documents: Question Answering vs Token Classification in real-world setups [link]


