pbernet/streaming-ocr

PoC for parallelization

Additional features of this fork:

Bumped library versions
PoC for parallelization. Spoiler: it does not speed things up
HTTP upload server as a separate entry point
Assuming the OCR output contains medical content: upload it as a DocumentReference to the FHIR R4 server at http://hapi.fhir.org/baseR4
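
To illustrate the FHIR upload, here is a minimal sketch of how OCR text could be wrapped in a DocumentReference and posted to the server. It is not taken from this repository: the object name `FhirUploadSketch` and the helpers `documentReferenceJson` and `upload` are hypothetical, and it assumes Java 11+ for `java.net.http`. The resource carries only the fields FHIR R4 requires (`status` and one `content.attachment`), with the text embedded as base64.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}
import java.nio.charset.StandardCharsets
import java.util.Base64

// Hypothetical sketch, not the fork's actual implementation.
object FhirUploadSketch {

  // Wraps the OCR text in a minimal FHIR R4 DocumentReference.
  // The attachment data must be base64-encoded (FHIR type base64Binary).
  def documentReferenceJson(ocrText: String): String = {
    val data = Base64.getEncoder.encodeToString(ocrText.getBytes(StandardCharsets.UTF_8))
    s"""{
       |  "resourceType": "DocumentReference",
       |  "status": "current",
       |  "content": [
       |    { "attachment": { "contentType": "text/plain", "data": "$data" } }
       |  ]
       |}""".stripMargin
  }

  // POSTs the resource to <baseUrl>/DocumentReference and returns the HTTP status code.
  // The default base URL is the public test server named in this README.
  def upload(json: String, baseUrl: String = "http://hapi.fhir.org/baseR4"): Int = {
    val request = HttpRequest
      .newBuilder(URI.create(s"$baseUrl/DocumentReference"))
      .header("Content-Type", "application/fhir+json")
      .POST(HttpRequest.BodyPublishers.ofString(json))
      .build()
    HttpClient.newHttpClient()
      .send(request, HttpResponse.BodyHandlers.ofString())
      .statusCode()
  }
}
```

A call such as `FhirUploadSketch.upload(FhirUploadSketch.documentReferenceJson(text))` would send the document. Since hapi.fhir.org is a public test server, real patient data should not be uploaded there.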

Temporarily removed feature:

The two main classes may be started directly from the IntelliJ IDE. GraalVM was used to run it locally on a Mac.

brew install tesseract

Then I set the environment variable LC_ALL=C in the IntelliJ Run Configuration.

I didn't need this environment variable in VSCode.

sbt run

Open http://localhost:8080 and upload an image.

Or, using cURL:

curl -X POST -F 'fileUpload=@/Users/duanebester/Documents/input.jpg' 'http://localhost:8080/image/ocr'
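
The same request can also be sent from Scala. The following is a rough sketch rather than code from this repository: `OcrUploadSketch`, `multipartBody`, and `upload` are hypothetical names, Java 11+ is assumed for `java.net.http`, and the part is sent as `application/octet-stream` on the assumption that the server does not check the part's content type (cURL would send `image/jpeg` for a `.jpg` file). The form field name `fileUpload` and the URL are taken from the cURL command above.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Hypothetical sketch of a client for the /image/ocr endpoint.
object OcrUploadSketch {

  // Builds a multipart/form-data body containing a single file part.
  def multipartBody(boundary: String, fieldName: String, fileName: String, bytes: Array[Byte]): Array[Byte] = {
    val crlf = "\r\n"
    val head =
      "--" + boundary + crlf +
      "Content-Disposition: form-data; name=\"" + fieldName + "\"; filename=\"" + fileName + "\"" + crlf +
      "Content-Type: application/octet-stream" + crlf + crlf
    val tail = crlf + "--" + boundary + "--" + crlf
    head.getBytes(StandardCharsets.UTF_8) ++ bytes ++ tail.getBytes(StandardCharsets.UTF_8)
  }

  // Uploads the image and returns the response body as a string.
  def upload(image: Path, url: String = "http://localhost:8080/image/ocr"): String = {
    val boundary = "----ocr-" + System.nanoTime()
    val body = multipartBody(boundary, "fileUpload", image.getFileName.toString, Files.readAllBytes(image))
    val request = HttpRequest
      .newBuilder(URI.create(url))
      .header("Content-Type", s"multipart/form-data; boundary=$boundary")
      .POST(HttpRequest.BodyPublishers.ofByteArray(body))
      .build()
    HttpClient.newHttpClient()
      .send(request, HttpResponse.BodyHandlers.ofString())
      .body()
  }
}
```

With the server running, `OcrUploadSketch.upload(java.nio.file.Paths.get("input.jpg"))` would return whatever the endpoint responds with.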

A look at OCR within Akka Streams
