simonw / s3-ocr

Tools for running OCR against files stored in S3



s3-ocr fetch command to fetch OCR results

simonw opened this issue

s3-ocr fetch name-of-bucket path/to/file.pdf

This will download the relevant OCR result files to the current directory.

It will first look up the job ID associated with the file, then save textract-output/a806e67e504fc15f8b9d61d9e8e99f2b329a93410d1859a6fb4c7ba37a48314e/1 as a806e67e504fc15f8b9d61d9e8e99f2b329a93410d1859a6fb4c7ba37a48314e-1.json (and /2 as -2.json, and so on).
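A minimal sketch of the key-to-filename mapping described above, assuming every result part lives at textract-output/<job-id>/<number>. The helpers local_filename and plan_downloads are hypothetical names, not part of s3-ocr, and the rule that only numeric final segments count as result parts is an assumption.

```python
def local_filename(key: str) -> str:
    """Map 'textract-output/<job-id>/<n>' to '<job-id>-<n>.json' (hypothetical helper)."""
    # The last two path segments are the job ID and the part number
    job_id, part = key.rstrip("/").split("/")[-2:]
    return f"{job_id}-{part}.json"


def plan_downloads(keys):
    """Return (S3 key, local filename) pairs for the numbered result parts, in part order."""
    # Assumption: only keys whose final segment is a number are OCR result parts
    parts = [k for k in keys if k.rsplit("/", 1)[-1].isdigit()]
    # Sort numerically so part 10 comes after part 2
    parts.sort(key=lambda k: int(k.rsplit("/", 1)[-1]))
    return [(k, local_filename(k)) for k in parts]
```

For example, plan_downloads(["textract-output/abc/2", "textract-output/abc/1"]) yields the pairs for abc-1.json and then abc-2.json. Sorting numerically rather than as strings keeps -10.json after -9.json.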

A -c or --combine output.json option will combine them into a single file on disk instead.
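One plausible way to combine the parts, sketched under the assumption that each part is a Textract response containing a "Blocks" list and that concatenating those lists in part order is an acceptable combined format. The issue does not specify the combined layout, and combine_parts and write_combined are hypothetical names.

```python
import json
from typing import Iterable


def combine_parts(parts: Iterable[dict]) -> dict:
    """Merge numbered Textract result parts into a single dict (hypothetical helper)."""
    # Assumption: each part has a "Blocks" list; parts without one contribute nothing
    blocks = []
    for part in parts:
        blocks.extend(part.get("Blocks", []))
    return {"Blocks": blocks}


def write_combined(parts: Iterable[dict], path: str) -> None:
    # Write the combined result to a single JSON file on disk
    with open(path, "w") as fp:
        json.dump(combine_parts(parts), fp, indent=2)
```

The parts must be passed in part order (1, 2, ...) for the combined "Blocks" list to follow the document order.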

Passing --combine - will write the combined output to standard output instead.
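A hedged sketch of the proposed command-line interface using the standard library's argparse. This is only an illustration of the arguments described in this issue: the real tool may use a different CLI framework, and build_parser and open_output are hypothetical names. It treats "-" as standard output, following the common CLI convention.

```python
import argparse
import sys
from typing import TextIO


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the proposed "s3-ocr fetch" interface
    parser = argparse.ArgumentParser(prog="s3-ocr")
    subcommands = parser.add_subparsers(dest="command", required=True)
    fetch = subcommands.add_parser("fetch", help="Fetch OCR results for a file")
    fetch.add_argument("bucket", help="Name of the S3 bucket")
    fetch.add_argument("key", help="Path to the file within the bucket")
    fetch.add_argument(
        "-c", "--combine", metavar="OUTPUT",
        help="Combine results into OUTPUT; use - for standard output",
    )
    return parser


def open_output(path: str) -> TextIO:
    # "-" means standard output; anything else is a file path on disk
    if path == "-":
        return sys.stdout
    return open(path, "w")
```

argparse treats a lone "-" as a value rather than an option, so --combine - parses as expected. Callers should close the handle returned by open_output only when it is not sys.stdout.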