Parsed text is always garbled, even though the document displays correctly
snoopy83101 opened this issue
snoopy83101 commented
help!
This PDF document (a Chinese general invoice) displays correctly in a viewer, but the text is always garbled when parsed.
Can you help me?
Alexander Shtuchkin commented
Not sure how I can help. You seem to be using iconv-lite correctly; maybe the problem is in the PDF parser library?
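One rough way to tell an encoding problem from a parser problem is to take the raw bytes and try decoding them strictly under each candidate encoding. The snippet below is only a sketch under stated assumptions: the sample bytes are a hypothetical stand-in (the GBK encoding of "发票", "invoice"), not data from the attached PDF; `tryDecode` is an illustrative helper, not part of any library; and it uses Node's built-in `TextDecoder` rather than iconv-lite so that it runs without dependencies. The `gbk` label also assumes a Node build with full ICU.

```javascript
// Hypothetical sample: the GBK bytes for "发票" (invoice), standing in for
// raw bytes obtained from a file or stream.
const gbkBytes = Uint8Array.from([0xb7, 0xa2, 0xc6, 0xb1]);

// Illustrative helper: decode strictly and return null instead of throwing
// when the bytes are not valid in the given encoding (or the encoding label
// is not supported by this Node build).
function tryDecode(bytes, encoding) {
  try {
    return new TextDecoder(encoding, { fatal: true }).decode(bytes);
  } catch {
    return null;
  }
}

const asUtf8 = tryDecode(gbkBytes, "utf-8"); // these bytes are not valid UTF-8
const asGbk = tryDecode(gbkBytes, "gbk");

console.log({ asUtf8, asGbk });
```

If one candidate encoding yields readable text, the fix is on the decoding side. If none does, the garbling most likely happens before decoding, for example in how the PDF parser maps glyphs to characters.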
snoopy83101 commented
Perhaps the problem is with the PDF parser, but unfortunately it cannot handle every PDF: about 50% of Chinese invoices fail to parse.
So my only option is to convert the PDF to images with PDF2IMG and then run OCR, which means I don't need to deal with the text encoding at all.