amtam0/lambda-tesseract-api

Fast setup of an OCR Lambda function using Tesseract 5 and a custom OCR engine (here, the ONNX version of PaddleOCR).

Clone the repo.
Create an ECR repo in your AWS account, copy its URI, and add it to zip_fct.sh (lines 27/28).
Log in to ECR if you have not already: aws ecr get-login-password --region yourREGION | docker login --username AWS --password-stdin yourURI
Run: cd lambda-tesseract-api/; bash zip_fct.sh
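The values yourREGION and yourURI in the steps above follow the standard private ECR naming scheme. As a rough sketch (not part of this repo), the helper below builds the URI and prints the commands to run. The function names, account ID, region, and repo name are all hypothetical placeholders.

```python
import re


def ecr_repository_uri(account_id: str, region: str, repo_name: str) -> str:
    """Build a private ECR repository URI in the standard AWS format."""
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("AWS account IDs are 12 digits")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repo_name}"


def setup_commands(account_id: str, region: str, repo_name: str) -> list:
    """Return the shell commands from the setup steps with values filled in."""
    uri = ecr_repository_uri(account_id, region, repo_name)
    # docker login is given the registry host (the part before the first "/").
    registry = uri.split("/", 1)[0]
    return [
        f"aws ecr get-login-password --region {region} | "
        f"docker login --username AWS --password-stdin {registry}",
        "cd lambda-tesseract-api/",
        "bash zip_fct.sh",
    ]


if __name__ == "__main__":
    # Placeholder values; replace them with your own account, region, and repo.
    for command in setup_commands("123456789012", "eu-west-1", "lambda-ocr"):
        print(command)
```

The URI still has to be pasted into zip_fct.sh by hand, as described in the steps.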

Done! Your ECR image is ready to be selected as the container image for your Lambda function (you can test it with example.json).
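Once the function exists, you can also send example.json to it from a script instead of the console. The sketch below assumes boto3 is installed and AWS credentials are configured. The function name "ocr-lambda" is a hypothetical placeholder, and the payload is passed through unchanged because its format is defined by example.json, not by this snippet.

```python
import json


def load_event(path: str = "example.json") -> dict:
    """Read the test event from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def invoke_ocr(function_name: str, event: dict, region: str = None):
    """Invoke the Lambda function synchronously and return its decoded JSON response."""
    import boto3  # third-party; imported here so load_event works without it

    client = boto3.client("lambda", region_name=region)
    response = client.invoke(
        FunctionName=function_name,
        Payload=json.dumps(event).encode("utf-8"),
    )
    # The response payload is a stream; read it fully before decoding.
    return json.loads(response["Payload"].read())


if __name__ == "__main__":
    # "ocr-lambda" is a placeholder; use the name of your own function.
    print(invoke_ocr("ocr-lambda", load_event()))
```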

Notes:

Docker must be installed. Tested on Ubuntu 20.04.
Only the recognition step is performed here; you can edit the OCR functions in lambda_function.py to suit your needs.
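If you adapt the OCR functions, one possible layout is a handler that dispatches to an engine chosen per request. This is an illustrative sketch only, not the code in lambda_function.py: the event keys ("image", "engine"), the make_handler helper, and the stub engines are all assumptions.

```python
import base64
import json


def make_handler(engines: dict):
    """Build a Lambda-style handler that routes requests to named OCR engines.

    `engines` maps an engine name to a callable taking raw image bytes and
    returning the recognized text.
    """

    def handler(event, context=None):
        # Accept either a direct invocation or an API Gateway style "body".
        body = event.get("body", event)
        if isinstance(body, str):
            body = json.loads(body)

        engine = body.get("engine", "tesseract")
        if engine not in engines:
            return {
                "statusCode": 400,
                "body": json.dumps({"error": f"unknown engine: {engine}"}),
            }

        image_bytes = base64.b64decode(body["image"])
        text = engines[engine](image_bytes)
        return {
            "statusCode": 200,
            "body": json.dumps({"engine": engine, "text": text}),
        }

    return handler


if __name__ == "__main__":
    # Stub engines stand in for real Tesseract / PaddleOCR recognition calls.
    demo = make_handler({
        "tesseract": lambda data: f"tesseract saw {len(data)} bytes",
        "custom": lambda data: f"custom saw {len(data)} bytes",
    })
    payload = {"image": base64.b64encode(b"fake-image").decode(), "engine": "custom"}
    print(demo(payload))
```

Keeping the engines in a dictionary means a new recognizer can be added without touching the request parsing.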

Check the Medium link to set up the Lambda function and API in the AWS console. The article is not up to date: the Lambda setup is simpler now, since you only need to select the image from ECR.

Hybrid serverless OCR using Tesseract 5 and PaddleOCR (AWS lambda)

MIT License

Languages: Python 72.8%, HTML 20.4%, Dockerfile 4.1%, Shell 2.7%