Why can't I search the contents of PDF files?

Question

zhl111 opened this issue 2 years ago

阿哈吉 commented 2 years ago

I don't know much about programming. Please tell me how to do it.

阿哈吉 · Answer 1 · Sun Aug 21 2022 00:46:01 GMT+0800 (China Standard Time)

Here is a detailed explanation of the problem:
#47 (comment)

Joseph LaFreniere · Answer 2 · Tue Nov 21 2023 06:43:00 GMT+0800 (China Standard Time)

It looks like rga is finding matching lines in your PDF, and the failure occurs when printing those matches because they contain non-UTF-8 byte strings. AFAIK rga will transparently re-encode UTF-16 (but no other encodings) to UTF-8, so it's likely that:

1. the match contains something that's neither valid UTF-16 nor valid UTF-8,
2. rga attempts to print that match verbatim, and then
3. Rust's standard library chokes when trying to print the non-UTF-8 byte strings.
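To make that reasoning concrete, here is a minimal Rust sketch. It is not rga's actual code. It assumes UTF-16 is recognized only by a little-endian byte-order mark (`FF FE`), and `decode_match` and `decode_lossy` are hypothetical helper names. Bytes that are UTF-16 with a BOM or valid UTF-8 decode fine, while something like Latin-1 text is neither and fails. `String::from_utf8_lossy` shows one possible workaround: replace invalid bytes with U+FFFD instead of failing.

```rust
// Minimal sketch of the encoding behavior described above (NOT rga's real code).
// Assumption: UTF-16 is only detected via a little-endian BOM (0xFF 0xFE).

/// Hypothetical helper: decode bytes as UTF-16LE if they start with a BOM,
/// otherwise require valid UTF-8. Returns None for bytes that are neither.
fn decode_match(bytes: &[u8]) -> Option<String> {
    if bytes.len() >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE {
        let body = &bytes[2..];
        if body.len() % 2 != 0 {
            return None; // an odd byte count cannot be UTF-16
        }
        // Pair up the remaining bytes into little-endian u16 code units.
        let units: Vec<u16> = body
            .chunks_exact(2)
            .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
            .collect();
        // Fails on unpaired surrogates, i.e. invalid UTF-16.
        return String::from_utf16(&units).ok();
    }
    // Without a BOM, the bytes must already be valid UTF-8.
    std::str::from_utf8(bytes).ok().map(str::to_owned)
}

/// Hypothetical fallback: replace invalid sequences with U+FFFD instead of failing.
fn decode_lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}

fn main() {
    let utf8 = "caf\u{e9}".as_bytes(); // valid UTF-8
    let utf16 = [0xFF, 0xFE, b'h', 0x00, b'i', 0x00]; // BOM + "hi" in UTF-16LE
    let latin1 = [b'c', b'a', b'f', 0xE9]; // "café" in Latin-1: no BOM, not UTF-8

    println!("{:?}", decode_match(utf8));
    println!("{:?}", decode_match(&utf16));
    println!("{:?}", decode_match(&latin1)); // neither encoding, so None
    println!("{}", decode_lossy(&latin1)); // invalid byte shown as U+FFFD
}
```

The strict `decode_match` mirrors the failure case in the list above, and the lossy variant trades exactness for never failing on output.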

phiresky · Answer 3 · Tue Jan 16 2024 08:31:41 GMT+0800 (China Standard Time)

I'll assume this is a dupe of #47