pbernet / streaming-ocr

A look at OCR within Akka Streams

Home Page: https://towardsdatascience.com/ocr-with-akka-tesseract-and-javacv-part-1-702781fc73ca

PoC for parallelization

Additional features of this fork:

  • Bumped library versions
  • PoC for parallelization. Spoiler: it does not speed things up
  • HTTP upload server as a separate entry point
  • Assuming the OCR output contains medical content: upload it as a DocumentReference to the FHIR R4 server at http://hapi.fhir.org/baseR4
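The fork's parallelization PoC is not reproduced here. A minimal sketch of the general idea, assuming each page can be recognized independently, looks like the block below. It uses plain `scala.concurrent` Futures on a fixed thread pool rather than Akka Streams so that it runs without dependencies. In an Akka Streams graph the comparable stage would be `mapAsync(parallelism)`, which also preserves input order. `ParallelOcrSketch` and `ocrPage` are hypothetical names, and `ocrPage` stands in for the real Tesseract call.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ParallelOcrSketch {
  // Hypothetical stand-in for a Tesseract call on one page or image region.
  def ocrPage(page: Int): String = s"text of page $page"

  // Runs ocrPage on at most `parallelism` pages at a time; results come back in input order.
  def ocrAll(pages: Seq[Int], parallelism: Int): Seq[String] = {
    val pool = Executors.newFixedThreadPool(parallelism)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      val futures = pages.map(page => Future(ocrPage(page)))
      Await.result(Future.sequence(futures), 1.minute)
    } finally {
      pool.shutdown()
    }
  }
}
```

One way to check the spoiler is to time `ocrAll` on real pages against `parallelism = 1`. If the native OCR call is already CPU-bound or multi-threaded (Tesseract builds with OpenMP may use several cores per call), extra threads may not shorten the total run time.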

Temporarily removed feature:

  • Date extraction via Natty parser

The two main classes can be started directly from IntelliJ IDEA. GraalVM was used to run them locally on a Mac.

Search for uploaded content on the public FHIR server, e.g. http://hapi.fhir.org/baseR4/DocumentReference/1845301/_history/1
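The example link is a FHIR "vread" URL of the form `[base]/DocumentReference/[id]/_history/[versionId]`. The block below sketches that URL, the matching "create" endpoint (`POST [base]/DocumentReference`), and a minimal R4 DocumentReference body whose only required elements are `status` and `content`. It is a hypothetical helper (`DocumentReferenceSketch`), not the fork's code. The fork's actual payload may carry more fields, and the plain-text base64 attachment is an assumption.

```scala
import java.nio.charset.StandardCharsets
import java.util.Base64

object DocumentReferenceSketch {
  // Public HAPI test server named in this README.
  val fhirBase = "http://hapi.fhir.org/baseR4"

  // FHIR "create" interaction: POST here with Content-Type application/fhir+json.
  val createUrl: String = fhirBase + "/DocumentReference"

  // FHIR "vread" interaction: GET a specific version of one resource.
  def vreadUrl(id: String, versionId: String): String =
    fhirBase + "/DocumentReference/" + id + "/_history/" + versionId

  // Minimal R4 DocumentReference: `status` (1..1) and `content` (1..*) are the required elements.
  // Assumption: the OCR text is sent as a base64-encoded text/plain attachment.
  def toJson(ocrText: String): String = {
    val data = Base64.getEncoder.encodeToString(ocrText.getBytes(StandardCharsets.UTF_8))
    "{\"resourceType\":\"DocumentReference\",\"status\":\"current\"," +
      "\"content\":[{\"attachment\":{\"contentType\":\"text/plain\",\"data\":\"" + data + "\"}}]}"
  }
}
```

For example, `DocumentReferenceSketch.vreadUrl("1845301", "1")` rebuilds the link above.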

Up and running | Mac OS

OCR with Akka, Tesseract, and JavaCV | Part 1
OCR with Akka, Tesseract, and JavaCV | Part 2

brew install tesseract

Then I set the environment variable LC_ALL=C in the IntelliJ Run Configuration.

This environment variable was not needed in VS Code.
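This README does not say why LC_ALL=C is needed. One likely reason, stated here only as an assumption, is that Tesseract 4.x checks for the "C" locale when its API is initialized and aborts otherwise. A small startup check like the hypothetical `LocaleCheck` below would report a missing setting before any OCR runs.

```scala
object LocaleCheck {
  // Assumption: Tesseract 4.x expects the C locale; the README only says LC_ALL=C was set in IntelliJ.
  // Returns a warning message when LC_ALL is unset or set to anything other than "C".
  def localeWarning(env: Map[String, String]): Option[String] =
    env.get("LC_ALL") match {
      case Some("C") => None
      case other     => Some(s"LC_ALL is ${other.getOrElse("unset")}; set LC_ALL=C before starting the OCR app")
    }

  // Prints the warning (if any) to stderr using the real process environment.
  def main(args: Array[String]): Unit =
    localeWarning(sys.env).foreach(msg => Console.err.println(msg))
}
```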

sbt run

Open http://localhost:8080 and upload an image.

Or, using cURL:

curl -X POST -F 'fileUpload=@/Users/duanebester/Documents/input.jpg' 'http://localhost:8080/image/ocr'
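To call the endpoint from Scala instead of cURL, a minimal sketch using the JDK's `java.net.http` client (Java 11+) follows. Based only on the cURL command, it assumes the server reads a multipart form field named `fileUpload` at `/image/ocr` and returns a text body. `OcrUploadClient`, the boundary string, and the `application/octet-stream` part type are illustrative choices. cURL typically sends a type guessed from the file extension, such as `image/jpeg`.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{Files, Path}

object OcrUploadClient {
  // Any string that does not occur in the file bytes works as a boundary.
  val boundary = "streaming-ocr-boundary-7d1f"

  // multipart/form-data body with one file part named "fileUpload", like `curl -F 'fileUpload=@...'`.
  def multipartBody(fileName: String, bytes: Array[Byte]): Array[Byte] = {
    val head =
      "--" + boundary + "\r\n" +
        "Content-Disposition: form-data; name=\"fileUpload\"; filename=\"" + fileName + "\"\r\n" +
        "Content-Type: application/octet-stream\r\n\r\n"
    val tail = "\r\n--" + boundary + "--\r\n"
    head.getBytes(UTF_8) ++ bytes ++ tail.getBytes(UTF_8)
  }

  // POSTs an image to the local server started with `sbt run` and returns the response body.
  def upload(file: Path): String = {
    val body = multipartBody(file.getFileName.toString, Files.readAllBytes(file))
    val request = HttpRequest
      .newBuilder(URI.create("http://localhost:8080/image/ocr"))
      .header("Content-Type", "multipart/form-data; boundary=" + boundary)
      .POST(HttpRequest.BodyPublishers.ofByteArray(body))
      .build()
    HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).body()
  }
}
```

With the server running, `OcrUploadClient.upload(java.nio.file.Paths.get("input.jpg"))` sends the same request as the cURL example.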

Languages: Scala 56.8%, Java 31.3%, HTML 11.9%