JingyeChen / MultiplexedOCR

Code for CVPR21 paper A Multiplexed Network for End-to-End, Multilingual OCR

Multiplexer

Installation

Requirements:

conda create --name multiplexer
conda activate multiplexer
# sudo apt install nvidia-cuda-toolkit  # install nvcc if it's not already there
pip install yacs==0.1.8  # Note: conda only has yacs v0.1.6 now
pip install numpy
pip install opencv-python
# run `nvcc --version` to decide the cudatoolkit version

conda install pytorch torchvision torchaudio cudatoolkit=10.1 -c pytorch
pip install pyclipper
conda install shapely
conda install -c conda-forge pycocotools
conda install -c conda-forge ftfy
pip install tensorboard
pip install submitit
pip install tqdm
pip install editdistance
pip install scipy
pip install black
pip install isort==5.9.3
pip install flake8==3.9.2

python setup.py build_ext install
# Note: if the above command doesn't work, try the following,
# depending on your CUDA/nvcc/gcc versions and install locations.
# See https://stackoverflow.com/a/46380601
# For example, if `nvcc --version` reports version 10.1, install g++-8 if it's not already there:
sudo apt install g++-8

python setup.py build develop
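
The install steps ask you to read `nvcc --version` to choose the cudatoolkit version. As a minimal sketch, the helper below extracts the release number from that output. The function name `cuda_release` is an assumption made for illustration and is not part of this repository.

```shell
# Hypothetical helper (not part of this repo): read `nvcc --version` output
# on stdin and print only the CUDA release number, e.g. "10.1".
cuda_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}
```

With nvcc installed, `nvcc --version | cuda_release` prints the value to compare against the `cudatoolkit=` version in the conda command above.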

Demo

Activate the multiplexer environment if you haven't already done so:

conda init bash
source ~/.bashrc
conda activate multiplexer

Then run the demo script to perform inference on a single image:

weight=YOUR_PATH_TO_WEIGHT_FILE
img=YOUR_PATH_TO_IMAGE_FILE

python -m demo.demo \
--config-file configs/demo.yaml \
--input $img \
--output /tmp/multiplexer \
MODEL.WEIGHT $weight
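
The demo command above handles one image per call. To process a folder of images, one option is to loop over the files and call the same command for each. This is a sketch only: the function name `run_demo_dir`, the `DRY_RUN` switch, and the `.jpg`/`.png` filter are assumptions for illustration, and how the demo names its results inside the output directory is not covered here.

```shell
# Hypothetical helper (not part of this repo): call the single-image demo
# once for every .jpg/.png file in a directory.
# Usage: run_demo_dir WEIGHT_FILE IMAGE_DIR [OUTPUT_DIR]
# Set DRY_RUN=1 to print the commands instead of running them.
run_demo_dir() (
  weight=$1
  img_dir=$2
  out_dir=${3:-/tmp/multiplexer}
  for img in "$img_dir"/*.jpg "$img_dir"/*.png; do
    [ -e "$img" ] || continue  # skip a pattern that matched no file
    if [ "${DRY_RUN:-0}" = "1" ]; then
      printf '%s\n' "python -m demo.demo --config-file configs/demo.yaml --input $img --output $out_dir MODEL.WEIGHT $weight"
    else
      python -m demo.demo \
        --config-file configs/demo.yaml \
        --input "$img" \
        --output "$out_dir" \
        MODEL.WEIGHT "$weight"
    fi
  done
)
```

Running `DRY_RUN=1 run_demo_dir "$weight" ./images` first lets you check the generated commands before any inference starts.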

Training

Activate the multiplexer environment if you haven't already done so:

conda init bash
source ~/.bashrc
conda activate multiplexer

Then edit the YAML config file for your setup and start training or fine-tuning:

yaml=PATH_TO_YAML_FILE
python3 tools/train_net.py --config-file $yaml
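
Training runs can be long, so it may help to fail early on a mistyped config path and to keep the console output. The wrapper below is a sketch under those assumptions. The name `train_logged` and the default log file `train.log` are hypothetical; only the `tools/train_net.py --config-file` invocation comes from the command above.

```shell
# Hypothetical wrapper (not part of this repo): stop early if the config
# file is missing, otherwise train and copy the console output to a log file.
# Usage: train_logged PATH_TO_YAML_FILE [LOG_FILE]
train_logged() (
  yaml=$1
  log=${2:-train.log}
  if [ ! -f "$yaml" ]; then
    echo "config file not found: $yaml" >&2
    exit 1
  fi
  # stdout and stderr both go to the terminal and to the log file
  python3 tools/train_net.py --config-file "$yaml" 2>&1 | tee "$log"
)
```

For example, `train_logged "$yaml" run1.log` starts training and writes the same output to `run1.log`.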

Relationship to Mask TextSpotter v3

This project is licensed under Creative Commons Attribution-NonCommercial 4.0 International. Part of the code is inherited from Mask TextSpotter v3, which is under the same license.

Citing Multiplexer

If you use Multiplexer in your research or wish to refer to the baseline results, please use the following BibTeX entry.

@inproceedings{huang2021multiplexed,
  title={A Multiplexed Network for End-to-End, Multilingual {OCR}},
  author={Huang, Jing and Pang, Guan and Kovvuri, Rama and Toh, Mandy and Liang, Kevin J and Krishnan, Praveen and Yin, Xi and Hassner, Tal},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={4547--4557},
  year={2021}
}
