simon987 / sist2

Lightning-fast file system indexer and search tool

Extra spaces in OCR results for Chinese text

ffchung opened this issue

Which SIST2 component is your Feature Request related to?

Scan, with OCR enabled for images and ebooks

Is your feature request related to a problem? Please describe.

See tesseract-ocr/tesseract#991.

I need to pass the setting preserve_interword_spaces=1 to tesseract.
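
For reference, a minimal sketch of passing that setting through the stock `tesseract` command line with its `-c VAR=VALUE` option. This only illustrates the variable itself; it is not how sist2 invokes Tesseract. The helper names `build_tesseract_command` and `run_ocr` are hypothetical, and the `chi_sim` language pack is assumed to be installed.

```python
import subprocess


def build_tesseract_command(image_path, lang="chi_sim", preserve_interword_spaces=True):
    """Build the argument list for a tesseract CLI call (hypothetical helper)."""
    # "stdout" as the output base makes tesseract print the text instead of writing a file.
    cmd = ["tesseract", image_path, "stdout", "-l", lang]
    if preserve_interword_spaces:
        # -c sets a config variable for this run.
        cmd += ["-c", "preserve_interword_spaces=1"]
    return cmd


def run_ocr(image_path, lang="chi_sim"):
    """Run tesseract and return the recognized text (needs tesseract on PATH)."""
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout
```

Calling `run_ocr("page.png")` (with a placeholder file name) would run `tesseract page.png stdout -l chi_sim -c preserve_interword_spaces=1`.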

What would you like to see happen?

Chinese OCR output without the extra spaces between characters.

Additional context

Thanks,

Fixed in 2936240

Before:

伦敦 楼 房 发 生火 灾 中 使 馆 关 注 : 暂 无 中 国 公民 受伤

After:

伦敦楼房发生火灾中使馆关注 : 暂无中国公民受伤
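
For anyone stuck on a build without the fix, a possible post-processing workaround is to strip whitespace that sits between two CJK ideographs. This is only a sketch and not what the commit above does. The function name `strip_cjk_spaces` is hypothetical, and the character range covers only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), so CJK punctuation and extension blocks are left untouched.

```python
import re

# Whitespace with a CJK ideograph immediately before and after it.
# Lookarounds keep the ideographs themselves out of the match.
_CJK_GAP = re.compile(r"(?<=[\u4e00-\u9fff])\s+(?=[\u4e00-\u9fff])")


def strip_cjk_spaces(text):
    """Remove whitespace between adjacent CJK ideographs (hypothetical helper)."""
    return _CJK_GAP.sub("", text)


before = "伦敦 楼 房 发 生火 灾 中 使 馆 关 注 : 暂 无 中 国 公民 受伤"
print(strip_cjk_spaces(before))
```

Spaces next to non-CJK characters, such as those around the colon, are kept, so mixed Chinese and Latin text keeps its word boundaries.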