amtam0 / lambda-tesseract-api

Hybrid serverless OCR using Tesseract 5 and PaddleOCR (AWS Lambda)

Home Page: https://amtam0.github.io/lambda-tesseract-api/webapp/app.html

Fast setup of an OCR Lambda function that uses Tesseract 5 and a custom OCR engine (here, the ONNX version of PaddleOCR).

Setup:

  • clone the repo

  • create an ECR repository in your AWS account, copy its URI, and add it to zip_fct.sh (lines 27 and 28)

  • log Docker in to ECR, if not already done: aws ecr get-login-password --region yourREGION | docker login --username AWS --password-stdin yourURI

  • run: cd lambda-tesseract-api/ && bash zip_fct.sh

Done! Your ECR image is now ready to be selected from your Lambda function (you can use example.json to test it).
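To make the testing step more concrete, here is a minimal sketch of how a JSON test event carrying an image could be built and read back. The key names "image" and "engine" and both helper functions are hypothetical, chosen only for illustration; the real schema is whatever example.json and lambda_function.py in the repo define.

```python
import base64
import json


def build_test_event(image_bytes: bytes, engine: str = "tesseract") -> str:
    """Wrap raw image bytes in a JSON test event.

    The "image" and "engine" keys are hypothetical; check example.json
    in the repository for the actual schema.
    """
    payload = {
        # JSON cannot carry raw bytes, so the image is base64-encoded.
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "engine": engine,
    }
    return json.dumps(payload)


def read_image_from_event(event_json: str) -> bytes:
    """Reverse of build_test_event: recover the raw image bytes."""
    event = json.loads(event_json)
    return base64.b64decode(event["image"])


if __name__ == "__main__":
    fake_image = b"\x89PNG\r\n\x1a\n"  # placeholder bytes, not a real image
    event = build_test_event(fake_image)
    # The bytes survive the encode/decode round trip unchanged.
    assert read_image_from_event(event) == fake_image
    print(event)
```

Base64 is used here because a JSON event cannot hold binary data directly; whether the repo's handler expects base64, a URL, or something else is not stated above.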

Notes:

  • Docker must be installed. Tested on Ubuntu 20.04.
  • Only the recognition step is performed here. You can edit the OCR functions in lambda_function.py to suit your needs.
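As an illustration of what editing the OCR functions might look like, the sketch below shows one possible shape for a handler that dispatches between recognition functions. It is not the repo's lambda_function.py: the registry, the function names, the event keys ("engine", "image"), and the placeholder recognizers are all assumptions. Only the lambda_handler(event, context) signature is the standard AWS Lambda convention for Python.

```python
import base64
import json
from typing import Callable, Dict

# Hypothetical registry mapping an engine name to a function that takes
# raw image bytes and returns the recognized text.
Recognizer = Callable[[bytes], str]
RECOGNIZERS: Dict[str, Recognizer] = {}


def register(name: str) -> Callable[[Recognizer], Recognizer]:
    """Decorator that adds a recognition function to the registry."""
    def decorator(func: Recognizer) -> Recognizer:
        RECOGNIZERS[name] = func
        return func
    return decorator


@register("tesseract")
def recognize_tesseract(image: bytes) -> str:
    # Placeholder: a real version would run Tesseract on the image.
    return f"<tesseract: {len(image)} bytes>"


@register("custom")
def recognize_custom(image: bytes) -> str:
    # Placeholder: a real version would run the custom (ONNX) model.
    return f"<custom: {len(image)} bytes>"


def lambda_handler(event, context):
    """Pick a recognizer by name and run it on the decoded image."""
    engine = event.get("engine", "tesseract")
    recognizer = RECOGNIZERS.get(engine)
    if recognizer is None:
        return {
            "statusCode": 400,
            "body": json.dumps({"error": f"unknown engine: {engine}"}),
        }
    image = base64.b64decode(event.get("image", ""))
    return {
        "statusCode": 200,
        "body": json.dumps({"engine": engine, "text": recognizer(image)}),
    }
```

With this layout, swapping or adding an engine means writing one more decorated function, and the handler itself stays unchanged.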

See the Medium link for setting up the Lambda function and the API in the AWS console. It has not been updated: the Lambda setup is easier now, since you only need to select the image from ECR.



License: MIT License


Languages

Python 72.8%, HTML 20.4%, Dockerfile 4.1%, Shell 2.7%