MTL-TabNet: Multi-task Learning based Model for Image-based Table Recognition

News

2023/05: Released the pretrained models for PubTabNet and FinTabNet.
2023/06: Sped up inference (by improving the decoding process) and reduced the memory consumption of the model.

About The Project
- Method Description
- Dependency
Getting Started
- Prerequisites
- Installation
Usage
Result
Pretrained model
License
Acknowledgements

About The Project

This project is an implementation of MTL-TabNet (Multi-task Learning based Model for Image-based Table Recognition), built on the TableMASTER-mmocr repository. Many thanks to its authors for their excellent work.

Method Description

The proposed model consists of one shared encoder, one shared decoder, and three separate decoders for the three sub-tasks of the table recognition problem, as shown in Fig. 1. The shared encoder encodes the input table image as a sequence of features. This sequence is passed to the shared decoder and then to the structure decoder, which predicts a sequence of HTML tags representing the structure of the table. Whenever the structure decoder produces an HTML tag that opens a new cell (‘<td>’ or ‘<td ...’), the shared decoder output for that cell and the shared encoder output are passed to the cell-bbox decoder and the cell-content decoder, which predict the bounding box coordinates and the text content of the cell. Finally, the text content of each cell is inserted into the corresponding HTML structure tags to produce the final HTML code of the input table image.
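
The final merging step described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in this repository: the function name `insert_cell_contents` is hypothetical, and the token layout (‘<td>’ as one token, or ‘<td’, an attribute token and ‘>’ for spanning cells) is an assumption modeled on PubTabNet-style structure annotations.

```python
from typing import List


def insert_cell_contents(structure_tokens: List[str], cell_texts: List[str]) -> str:
    """Merge predicted cell texts into a predicted HTML structure sequence.

    Hypothetical helper: cell_texts is assumed to be in the same order as the
    cells appear in structure_tokens.
    """
    html_parts = []
    cell_iter = iter(cell_texts)
    for token in structure_tokens:
        if token == "</td>":
            # A closing cell tag ends one cell, so its text goes right before it.
            # Cells without a predicted text are left empty.
            html_parts.append(next(cell_iter, ""))
        html_parts.append(token)
    return "".join(html_parts)


tokens = ["<tr>", "<td>", "</td>", "<td", ' colspan="2"', ">", "</td>", "</tr>"]
texts = ["Year", "Revenue"]
html = insert_cell_contents(tokens, texts)
print(html)  # <tr><td>Year</td><td colspan="2">Revenue</td></tr>
```

Inserting the text before each closing ‘</td>’ handles both the plain ‘<td>’ form and the ‘<td ...’ form with attributes without parsing the attributes.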

Dependency

Getting Started

Prerequisites

PubTabNet: download the dataset, and see its GitHub repository and paper for details.
FinTabNet: download the dataset, then use extract_table_images_FinTabNet.py to extract the table images and the annotation file (in the same format as PubTabNet).
TEDS metric: see its GitHub repository.

Installation

Build a conda environment in Anaconda for MTL-TabNet (Optional).

# Create an environment with a Python version of 3.8.
conda create -n myenv python=3.8
conda activate myenv
# Install pytorch 1.9.0 with CUDA 11.1.
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
# Install cudnn if necessary.
conda install cudnn -c conda-forge

Install mmdetection. See the mmdetection project for details.

# The mmdetection-2.11.0 source code is embedded in this project.
# You can cd into it and install it from there (recommended).
cd ./mmdetection-2.11.0
pip install -v -e .

Install mmocr. See the mmocr project for details.

# install mmocr
cd {Path to MTL-TabNet}
pip install -v -e .

Install mmcv-full-1.3.4. See the mmcv project for details.

pip install mmcv-full=={mmcv_version} -f https://download.openmmlab.com/mmcv/dist/{cu_version}/{torch_version}/index.html

# Install mmcv-full 1.3.4 for torch 1.9.0 and CUDA 11.1.
pip install mmcv-full==1.3.4 -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.9.0/index.html
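
After installation, it can help to confirm that the pinned versions were actually picked up. The following is a minimal sketch using only the Python standard library; `check_versions` and `EXPECTED` are hypothetical names, and the distribution names are assumed to match the ones used on the pip command lines above.

```python
from importlib import metadata

# Versions taken from the installation commands in this section.
EXPECTED = {"torch": "1.9.0", "torchvision": "0.10.0", "mmcv-full": "1.3.4"}


def check_versions(expected, get_version=metadata.version):
    """Return {name: (installed_version_or_None, matches_expected)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Ignore local build tags such as "+cu111" when comparing.
        matches = found is not None and found.split("+")[0] == wanted
        report[name] = (found, matches)
    return report


if __name__ == "__main__":
    for name, (found, matches) in check_versions(EXPECTED).items():
        print(f"{name}: installed={found} expected={EXPECTED[name]} ok={matches}")
```

A package reported as `installed=None` is not visible in the active environment, which usually means the conda environment was not activated before installing.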

Usage

Data preprocessing

Run data_preprocess.py to get valid training data. Remember to change the 'raw_img_root' and 'save_root' properties of PubtabnetParser to your own paths.

python ./table_recognition/data_preprocess.py

It takes about 8 hours to parse the 500,777 training files. After the training set has been parsed, change the 'split' property of PubtabnetParser to 'val' and run the script again to get the formatted validation data.

Directory structure of the parsed training data:

.
├── StructureLabelAddEmptyBbox_train
│   ├── PMC1064074_007_00.txt
│   ├── PMC1064076_003_00.txt
│   ├── PMC1064076_004_00.txt
│   └── ...
├── recognition_train_img
│   ├── 0
│   │   ├── PMC1064100_007_00_0.png
│   │   ├── PMC1064100_007_00_10.png
│   │   ├── ...
│   │   └── PMC1064100_007_00_108.png
│   ├── 1
│   ├── ...
│   └── 15
├── recognition_train_txt
│   ├── 0.txt
│   ├── 1.txt
│   ├── ...
│   └── 15.txt
├── structure_alphabet.txt
└── textline_recognition_alphabet.txt
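
Because parsing takes hours, a quick check that the output directory has the layout shown above can save a failed training run. This is a minimal sketch: `missing_entries` is a hypothetical helper, and the entry names are copied from the tree above for the train split.

```python
from pathlib import Path

# Top-level entries expected under save_root after parsing the train split.
EXPECTED_ENTRIES = [
    "StructureLabelAddEmptyBbox_train",
    "recognition_train_img",
    "recognition_train_txt",
    "structure_alphabet.txt",
    "textline_recognition_alphabet.txt",
]


def missing_entries(save_root):
    """Return the expected top-level entries that do not exist under save_root."""
    root = Path(save_root)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


if __name__ == "__main__":
    # Replace "." with the save_root path configured in PubtabnetParser.
    print("missing:", missing_entries("."))
```

An empty list means all five top-level entries are present; it does not verify the contents of the individual files.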

Train

Train the multi-task learning based table recognition model (MTL-TabNet).

sh ./table_recognition/expr/table_recognition_dist_train.sh

Inference

To get the final results:

python ./table_recognition/run_table_inference.py

run_table_inference.py calls table_inference.py and runs model inference on multiple GPU devices. Before running this script, change the value of cfg in table_inference.py.

Directory structure of the table recognition results:

# If you use 8 GPU devices for inference, you will get 8 detection result pickle files, one end2end_result pickle file and 8 structure recognition result pickle files.
.
├── structure_master_caches
│   ├── structure_master_results_0.pkl
│   ├── structure_master_results_1.pkl
│   ├── ...
│   └── structure_master_results_7.pkl
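
Since inference writes one result file per GPU, the files usually need to be combined before evaluation. The sketch below shows one way to do that with the standard library. It rests on an assumption that is not confirmed by this README: each `structure_master_results_*.pkl` file is taken to hold a dict keyed by image file name. The function name `load_structure_results` is hypothetical.

```python
import glob
import os
import pickle


def load_structure_results(cache_dir):
    """Merge all per-GPU structure result pickles found in cache_dir."""
    merged = {}
    pattern = os.path.join(cache_dir, "structure_master_results_*.pkl")
    for path in sorted(glob.glob(pattern)):
        with open(path, "rb") as f:
            # Assumed layout: {image_file_name: prediction}. Adjust if the
            # pickles in your run hold a different structure.
            part = pickle.load(f)
        merged.update(part)
    return merged


if __name__ == "__main__":
    results = load_structure_results("./structure_master_caches")
    print(f"loaded {len(results)} structure predictions")
```

Only load pickle files that you produced yourself, since unpickling untrusted files can execute arbitrary code.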

Get TEDS score

Installation.

pip install -r ./table_recognition/PubTabNet-master/src/requirements.txt

Get gtVal.json.

python ./table_recognition/get_val_gt.py

Calculate the TEDS score. Before running this script, modify the pred file path and the gt file path in mmocr_teds_acc_mp.py.
```
python ./table_recognition/PubTabNet-master/src/mmocr_teds_acc_mp.py
```
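
For orientation, TEDS (as defined in the PubTabNet paper) normalizes the tree edit distance between the predicted and ground-truth HTML trees by the size of the larger tree: TEDS = 1 - EditDist(T_pred, T_true) / max(|T_pred|, |T_true|). The sketch below shows only this normalization and the averaging into a percentage; the tree edit distance itself is computed by the evaluation script, and the function names here are hypothetical.

```python
def teds(edit_distance, n_nodes_pred, n_nodes_true):
    """TEDS = 1 - EditDist(T_pred, T_true) / max(|T_pred|, |T_true|)."""
    return 1.0 - edit_distance / max(n_nodes_pred, n_nodes_true)


def mean_teds_percent(scores):
    """Average per-table TEDS scores and express the result in percent."""
    return 100.0 * sum(scores) / len(scores)


if __name__ == "__main__":
    # Illustrative numbers only, not results from this project.
    per_table = [teds(0, 12, 12), teds(3, 12, 10)]
    print(f"mean TEDS: {mean_teds_percent(per_table):.2f}%")
```

Identical trees have an edit distance of 0 and therefore a score of 1; the score decreases as more node insertions, deletions or substitutions are needed.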

Result

TEDS score

Datasets	TEDS (%)	TEDS-struct. (%)
FinTabNet	-	98.79
PubTabNet	96.67	97.88

Pretrained Model

Pretrained models are available for download for PubTabNet and FinTabNet. (Please use master_decoder_old20220923.py instead of master_decoder.py when using a pretrained model.)

Demo

To run the demo on a single table image (you can change the input file and the checkpoint file in demo.py):

python ./table_recognition/demo/demo.py

License

This project is licensed under the MIT License. See LICENSE for more details.

Citations

@inproceedings{visapp23namly,
   title={An End-to-End Multi-Task Learning Model for Image-based Table Recognition},
   author={Nam Tuan Ly and Atsuhiro Takasu},
   booktitle={Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 5: VISAPP},
   year={2023},
   pages={626-634},
   publisher={SciTePress},
   doi={10.5220/0011685000003417},
}

Contact

Nam Ly (namly@nii.ac.jp, namlytuan@gmail.com)
Atsuhiro Takasu (takasu@nii.ac.jp)
