Scene Text Recognition

Real-time scene text recognition accelerated with NVIDIA TensorRT

Quickstart

Clone Repo

git clone --recursive git@github.com:tomek-l/nv-scene-text-recognition.git

Install PyTorch and torchvision

wget https://raw.githubusercontent.com/tomek-l/jetson-install-pytorch/master/install_torch_v1.9.sh 
bash install_torch_v1.9.sh
pip3 install -r requirements.txt

Install torch2trt

Until this PR is merged, use Chitoku's branch, which contains a fix for TensorRT 8.

cd torch2trt 
sudo python3 setup.py install --plugins

Install EasyOCR

cd EasyOCR
sudo python3 setup.py install

Dockerfile

Make sure Docker is set up correctly on the Jetson as directed here. Specifically, read the "Docker Default Runtime" section and make sure NVIDIA is the default Docker runtime.

Build the Docker image

docker build -t scene-text-recognition .

Run the container

sudo docker run -it --rm -v ~/workdir:/workdir/ --runtime nvidia --network host scene-text-recognition

where workdir is the directory containing this cloned repo, or the cloned repo itself.

If you are using a realtime camera:

xhost +
sudo docker run -it --rm -v ~/workdir:/workdir/ --runtime nvidia --network host -e DISPLAY=$DISPLAY --device /dev/video0:/dev/video0 scene-text-recognition

where video0 is the camera device to pass into the container. The correct device ID can be found using:

ls /dev/video*

Step 3 - Run the example files

There are three separate demo files included:

1. easy_ocr_demo.py

This program uses EasyOCR to read an image or a directory of images and output labeled images. The output is written to the labeled-images/ directory.

To use easy_ocr_demo:

python3 easy_ocr_demo.py images

where images is an image file or directory of images.
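Accepting either a single file or a directory usually comes down to a small path check. The sketch below is an illustration only, not the code in easy_ocr_demo.py: the collect_images helper and the IMAGE_SUFFIXES set are hypothetical names introduced here.

```python
from pathlib import Path

# Assumed set of extensions for illustration; the demo script may accept others.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}

def collect_images(target: str) -> list:
    """Return [target] if it is a file, or the sorted image files inside it if it is a directory."""
    path = Path(target)
    if path.is_file():
        return [path]
    if path.is_dir():
        return sorted(
            p for p in path.iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
        )
    raise FileNotFoundError(target)

# List the image files found in the current directory (may be empty).
print(collect_images("."))
```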

2. easy_ocr_benchmark.py

Using the pretrained EasyOCR detection and recognition models, we benchmark throughput and latency on the Jetson AGX Xavier and show the speedup after each model is converted to a TensorRT (TRT) engine.

Model	Throughput (fps)	Latency (ms)
Detection	12.386	84.190
Detection TRT	24.737	48.990
Recognition	174.518	5.900
Recognition TRT	7118.642	0.160
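The speedup implied by the table can be worked out directly from its figures. The short sketch below only copies the numbers above and divides them; the speedup helper is a name introduced here for illustration.

```python
# Figures copied from the benchmark table above (Jetson AGX Xavier).
results = {
    "Detection": {"fps": 12.386, "latency_ms": 84.190},
    "Detection TRT": {"fps": 24.737, "latency_ms": 48.990},
    "Recognition": {"fps": 174.518, "latency_ms": 5.900},
    "Recognition TRT": {"fps": 7118.642, "latency_ms": 0.160},
}

def speedup(model: str) -> dict:
    """Return throughput and latency ratios of the TRT engine over the PyTorch model."""
    base, trt = results[model], results[f"{model} TRT"]
    return {
        "throughput_x": trt["fps"] / base["fps"],
        "latency_x": base["latency_ms"] / trt["latency_ms"],
    }

for name in ("Detection", "Recognition"):
    s = speedup(name)
    print(f"{name}: {s['throughput_x']:.1f}x throughput, {s['latency_x']:.1f}x lower latency")
```

By these figures, detection throughput roughly doubles and recognition throughput improves by about 41x.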

To run this benchmark:

python3 easy_ocr_benchmark.py

This program will store the torch2trt state dictionaries in the torch2trt_models directory.

3. video_capture.py

This program displays real-time video from an attached USB camera. It draws bounding boxes around the text in the video and prints the recognized text to the terminal. To terminate the program, click on the video window and press 'q'. After plugging in the USB camera, but before running the Python file, check the camera's device ID and make sure it is the argument passed in the 'cap = cv2.VideoCapture(0)' line. By default the device ID is assumed to be 0; if yours differs, change the argument to cv2.VideoCapture() before running the program. The device ID can be checked with:

ls /dev/video*
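The same check can be done from Python, which is handy for picking the integer to pass to cv2.VideoCapture(). This is a minimal sketch using only the standard library; device_index and list_camera_indices are hypothetical helper names, and a /dev/video* node is not guaranteed to be a capture-capable camera.

```python
import glob
import re

def device_index(path: str) -> int:
    """Extract the numeric index from a path such as /dev/video2."""
    match = re.fullmatch(r"/dev/video(\d+)", path)
    if match is None:
        raise ValueError(f"not a video device path: {path}")
    return int(match.group(1))

def list_camera_indices() -> list:
    """Return the sorted indices of all /dev/video* nodes (empty if none exist)."""
    indices = []
    for path in glob.glob("/dev/video*"):
        try:
            indices.append(device_index(path))
        except ValueError:
            pass  # skip nodes that do not end in a plain number
    return sorted(indices)

# Each printed index is a candidate argument for cv2.VideoCapture().
print(list_camera_indices())
```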

To run the program:

python3 video_capture.py

Step 4 - Write your own code

The easyocr package can be called and used mostly as described in the EasyOCR repo. This repo, however, also adds the use_trt flag to the Reader class. Setting use_trt=True converts the models to TensorRT when performing detection, or reuses the converted models if they are already stored locally.

Example code:

import easyocr
reader = easyocr.Reader(['en'], use_trt=True)
result = reader.readtext('path/to/image.png')
print("TensorRT Optimized Result",result, '\n')
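In upstream EasyOCR, readtext returns a list of (bounding box, text, confidence) entries. Assuming the modified reader in this repo keeps that format, a sketch of filtering the result by confidence could look like this. The confident_text helper, the 0.5 threshold, and the sample data are made up for illustration.

```python
def confident_text(results, min_confidence=0.5):
    """Keep only the recognized strings whose confidence meets the threshold."""
    return [text for _bbox, text, conf in results if conf >= min_confidence]

# Hand-made sample in the (bbox, text, confidence) layout; not real model output.
sample = [
    ([[10, 10], [90, 10], [90, 40], [10, 40]], "Pringles", 0.93),
    ([[12, 60], [70, 60], [70, 85], [12, 85]], "0rig1nal", 0.31),
]

print(confident_text(sample))
```

With a real reader, the list returned by reader.readtext(...) would be passed in place of sample.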

More:

Different Models

The code is designed so that detection models can be swapped in and out. As an example, see the detect.py file, where the EAST detection model was substituted in.

Custom Training

To train and run your own models, please see the EasyOCR instructions.

References

The scene text recognition framework used here is a modified version of the open-source EasyOCR code.

Below are the sources of the default detection and recognition models:

Baek, Y., Lee, B., Han, D., Yun, S., & Lee, H. (2019). Character region awareness for text detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 9365-9374).
Shi, B., Bai, X., & Yao, C. (2016). An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(11), 2298-2304.

Licenses

This code is licensed under the MIT License as described here. The EasyOCR submodule is licensed under the Apache License 2.0 as described here.
