simonw / s3-ocr

Tools for running OCR against files stored in S3

Support files other than PDFs

simonw opened this issue

The tool only handles PDFs right now, but AWS Textract can handle other formats (including regular images).

I've not tried it yet, but I have a hunch that this will work against various other image files right now with no changes. Only the --all option is limited to PDFs; the option where you list files explicitly by name should work already.
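One way --all could be widened is to match keys against a set of file extensions instead of just .pdf. This is a rough sketch, not the tool's actual code: the function names and the `SUPPORTED_SUFFIXES` set are hypothetical, and the extension list assumes the formats Textract documents as supported (PDF, PNG, JPEG, TIFF), which I haven't checked against the async API.

```python
from pathlib import PurePosixPath

# Assumption: extensions for the formats Textract is documented to accept.
# This set is illustrative and is not taken from the s3-ocr source.
SUPPORTED_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def is_supported_key(key):
    """Return True if the S3 key ends in one of the supported extensions."""
    # S3 keys use forward slashes, so PurePosixPath splits them correctly.
    return PurePosixPath(key).suffix.lower() in SUPPORTED_SUFFIXES


def filter_supported_keys(keys):
    """Keep only the keys that look like files Textract could process."""
    return [key for key in keys if is_supported_key(key)]


if __name__ == "__main__":
    example_keys = ["scans/report.pdf", "scans/page.PNG", "notes.txt"]
    print(filter_supported_keys(example_keys))
```

Matching on the lowercased suffix means files like page.PNG are picked up too. Whether every matched file actually processes cleanly would still need testing against a real bucket.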