Multimodal Tree Decoder for Table of Contents Extraction in Document Images

This repository contains the source code of: Multimodal Tree Decoder for Table of Contents Extraction in Document Images.

Requirements

To execute this code, it is mandatory to prepare the following:

Bert Model
Pretrained ResNet-34 weights
The proposed dataset HierDoc

The Bert Model is available here. We recommend pretraining the ResNet-34 on scientific papers with a text detection task.

Training

python runner/train_valid.py --cfg default --visual_pretrain_weights path_to_pretrained_renet34_weights

Testing

python runner/infer.py --cfg default

Citation

If you find our paper useful in your research, please consider citing:

@INPROCEEDINGS{9956301,
  author={Hu, Pengfei and Zhang, Zhenrong and Zhang, Jianshu and Du, Jun and Wu, Jiajia},
  booktitle={2022 26th International Conference on Pattern Recognition (ICPR)}, 
  title={Multimodal Tree Decoder for Table of Contents Extraction in Document Images}, 
  year={2022},
  volume={},
  number={},
  pages={1756-1762},
  doi={10.1109/ICPR56361.2022.9956301}}

About

Official PyTorch implementation of our paper "Multimodal Tree Decoder for Table of Contents Extraction in Document Images"

Languages

Language:Python 100.0%