tesseract-ocr / tesseract

Tesseract Open Source OCR Engine (main repository)

Home Page: https://tesseract-ocr.github.io/

read_params_file: Can't open hocr

tommedema opened this issue

Current Behavior

brew install tesseract
git clone git@github.com:tesseract-ocr/tessdata_fast.git
tesseract screenshot1.png outputbase --tessdata-dir ./tessdata_fast --oem 1 --psm 12 -l eng hocr
read_params_file: Can't open hocr

Expected Behavior

brew install tesseract
git clone git@github.com:tesseract-ocr/tessdata_fast.git
tesseract screenshot1.png outputbase --tessdata-dir ./tessdata_fast --oem 1 --psm 12 -l eng hocr

Suggested Fix

Explain how the command-line parameters above can be used while also specifying the hocr output format.

tesseract -v

tesseract -v
tesseract 5.3.3
leptonica-1.84.1
libgif 5.2.1 : libjpeg 8d (libjpeg-turbo 3.0.0) : libpng 1.6.40 : libtiff 4.6.0 : zlib 1.2.12 : libwebp 1.3.2 : libopenjp2 2.5.0
Found NEON
Found libarchive 3.7.2 zlib/1.2.12 liblzma/5.4.4 bz2lib/1.0.8 liblz4/1.9.4 libzstd/1.5.5
Found libcurl/8.4.0 SecureTransport (LibreSSL/3.3.6) zlib/1.2.12 nghttp2/1.55.1

Operating System

macOS 14 Sonoma

Other Operating System

No response

uname -a

Darwin Toms-MacBook-Air.local 23.2.0 Darwin Kernel Version 23.2.0: Wed Nov 15 21:59:33 PST 2023; root:xnu-10002.61.3~2/RELEASE_ARM64_T8112 arm64

Compiler

No response

CPU

Model Name: MacBook Air
Model Identifier: Mac14,15
Model Number: Z18T000PMLL/A
Chip: Apple M2
Total Number of Cores: 8 (4 performance and 4 efficiency)

Virtualization / Containers

No response

Other Information

No response

It turns out that hocr is a config file that needs to be downloaded as well, which a clone with submodules takes care of:

git clone --recurse-submodules --remote-submodules git@github.com:tesseract-ocr/tessdata_fast.git
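
For an existing checkout that was cloned without submodules, a quick check like the one below may help confirm whether the config file is actually present before running tesseract again. This is only a sketch: `has_hocr_config` is a hypothetical helper, and the `configs/hocr` location under the tessdata directory is an assumption about how the repository lays out its config files.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a hocr config file exists under the
# given tessdata directory. The configs/hocr path is an assumption.
has_hocr_config() {
  tessdata_dir="$1"
  [ -f "$tessdata_dir/configs/hocr" ]
}

if has_hocr_config ./tessdata_fast; then
  echo "hocr config found"
else
  # Fetch the submodules for a checkout that was cloned without them.
  echo "hocr config missing; try: git -C ./tessdata_fast submodule update --init"
fi
```

If fetching the config files is not an option, passing `-c tessedit_create_hocr=1` in place of the trailing `hocr` argument should request the same output format without needing the config file.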