Why can't I search the contents of PDF files?
zhl111 opened this issue
阿哈吉 commented
Here is a detailed explanation of the problem
#47 (comment)
Joseph LaFreniere commented
It looks like rga is finding matching lines in your PDF, and the failure occurs when printing those matches because they contain non-UTF-8 byte strings. AFAIK rga will transparently re-encode UTF-16 (but no other encodings) to UTF-8, so it's likely that
- the match contains something that's neither valid UTF-16 nor valid UTF-8,
- rga is attempting to print that match verbatim, then
- Rust's stdlib chokes when attempting to print the non-UTF-8 byte strings.
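
To illustrate the failure mode described above, here is a minimal sketch in plain Rust. It is not rga's actual code: the helper names `display_strict` and `display_lossy` and the sample bytes are made up for this example. It shows that a strict UTF-8 conversion rejects such a match, while a lossy conversion substitutes U+FFFD for the invalid bytes so the line can still be printed as text.

```rust
use std::str::Utf8Error;

/// Hypothetical helper: strict conversion, which fails on any byte
/// sequence that is not valid UTF-8.
fn display_strict(bytes: &[u8]) -> Result<&str, Utf8Error> {
    std::str::from_utf8(bytes)
}

/// Hypothetical helper: lossy conversion, which replaces each invalid
/// sequence with U+FFFD (the replacement character) instead of failing.
fn display_lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}
```

For example, the bytes `b"caf\xE9 menu"` (a Latin-1 "é", which is not valid UTF-8) make `display_strict` return an error, whereas `display_lossy` returns `"caf\u{FFFD} menu"`, which can be passed to `println!` safely. Whether rga should print lossily or emit the raw bytes is a design decision for the maintainers; this only sketches the difference.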