phiresky / ripgrep-all

rga: ripgrep, but also search in PDFs, E-Books, Office documents, zip, tar.gz, etc.



Why can't I search the contents of PDF files?

zhl111 opened this issue

I don't know much about programming. Please tell me how to fix this.
1. [screenshot]
2. [screenshot]

Here is a detailed explanation of the problem: #47 (comment)

[screenshot]

It looks like rga is finding matching lines in your PDF, and the failure occurs when printing those matches because they contain non-UTF-8 byte strings. AFAIK, rga transparently re-encodes UTF-16 (but no other encoding) to UTF-8, so it's likely that

  1. the match contains something that is neither valid UTF-16 nor valid UTF-8,
  2. rga attempts to print that match verbatim, and then
  3. Rust's standard library chokes when trying to print the non-UTF-8 byte strings.
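A minimal Rust sketch of the two behaviours described above, using only the standard library. The helper `utf16le_to_utf8` is hypothetical and is not rga's actual transcoding code; it assumes little-endian UTF-16 with an optional byte-order mark. The `main` function then shows that strict UTF-8 decoding rejects a stray Latin-1 byte, while `String::from_utf8_lossy` substitutes U+FFFD for it.

```rust
// Hypothetical illustration only; this is not how rga is implemented.

/// Decode a UTF-16LE byte buffer (optionally starting with the BOM FF FE)
/// into a UTF-8 `String`. Returns `None` if the bytes are not valid UTF-16.
fn utf16le_to_utf8(bytes: &[u8]) -> Option<String> {
    // Drop the little-endian byte-order mark if present.
    let body = bytes.strip_prefix(&[0xFF, 0xFE]).unwrap_or(bytes);
    // UTF-16 code units are two bytes each, so an odd length is invalid.
    if body.len() % 2 != 0 {
        return None;
    }
    let units: Vec<u16> = body
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
        .collect();
    // Fails on unpaired surrogates, i.e. input that is not valid UTF-16.
    String::from_utf16(&units).ok()
}

fn main() {
    // "Hi" encoded as UTF-16LE with a byte-order mark.
    let utf16 = [0xFF, 0xFE, b'H', 0x00, b'i', 0x00];
    assert_eq!(utf16le_to_utf8(&utf16).as_deref(), Some("Hi"));

    // 0xE9 is 'é' in Latin-1, but on its own it is not valid UTF-8.
    let latin1: &[u8] = b"caf\xE9";
    // Strict decoding rejects it...
    assert!(std::str::from_utf8(latin1).is_err());
    // ...while lossy decoding replaces the bad byte with U+FFFD.
    let printable = String::from_utf8_lossy(latin1);
    assert_eq!(printable, "caf\u{FFFD}");
    println!("{printable}");
}
```

Lossy decoding trades fidelity for robustness: the result is always valid UTF-8, at the cost of losing the original byte.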

I'll assume this is a dupe of #47.