@base64d fails on pdf files

Question

pkoppstein opened this issue 3 months ago

jq's @base64d produces a corrupted file when decoding a base64-encoded PDF. The good news is that replacing jq with gojq results in success.

Here's an example file:

wget https://legiscan.com/AK/text/HR2/id/1477219/Alaska-2017-HR2-Enrolled.pdf
base64 < Alaska-2017-HR2-Enrolled.pdf > /tmp/pdf.base64

Using jq:

jq -Rr '@base64d'  /tmp/pdf.base64 > /tmp/pdf.base64d.pdf        # cannot be opened by Adobe Acrobat

Replacing jq with gojq in the above line results in a file that Adobe Acrobat opens successfully.

Further details:

$ uname -a
Darwin Mac-mini.mynetworksettings.com 21.6.0 Darwin Kernel Version 21.6.0: Thu Sep 29 20:12:57 PDT 2022; root:xnu-8020.240.7~1/RELEASE_X86_64 x86_64

$ jq --version
jq-1.7.1

$ ls -l /tmp/pdf.*.pdf
-rw-r--r--  1 ....  41887 Mar  9 03:56 /tmp/pdf.base64d.gojq.pdf
-rw-r--r--  1 ....  71816 Mar  9 03:56 /tmp/pdf.base64d.pdf

itchyny · Answer 1 · Sat Mar 09 2024 17:10:02 GMT+0800 (China Standard Time)

Looks like dup of #1931.

Emanuele Torre · Answer 2 · Sat Mar 09 2024 18:36:25 GMT+0800 (China Standard Time)

Yes, @base64d can only decode to UTF-8 strings, not binary data.

Note that gojq can preserve non-UTF-8 data as long as you don't perform string operations on it. Even so, you should use gojq -jR @base64d <b64 >decoded rather than -rR; otherwise it will add an extra newline (0x0a byte) at the end of the file. For PDFs that is evidently fine, though.
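
For readers who only need the decoded bytes, here is a minimal sketch that bypasses jq and decodes with the base64 utility directly. It assumes a base64 that accepts --decode (GNU coreutils and the macOS version both should). The file names and the tiny PDF-like sample are hypothetical stand-ins, not the file from the report.

```shell
#!/bin/sh
# Work in a throwaway directory; all file names below are hypothetical.
tmpdir=$(mktemp -d)

# Build a small stand-in for a PDF: a text header followed by bytes
# (0xFF 0xFE 0x80) that are not valid UTF-8.
printf '%%PDF-1.4\n\377\376\200\n' > "$tmpdir/sample.bin"

# Encode, then decode with the base64 utility instead of jq's @base64d.
base64 < "$tmpdir/sample.bin" > "$tmpdir/sample.b64"
base64 --decode < "$tmpdir/sample.b64" > "$tmpdir/roundtrip.bin"

# cmp -s is silent and exits 0 only if the files are byte-for-byte identical.
if cmp -s "$tmpdir/sample.bin" "$tmpdir/roundtrip.bin"; then
    status=identical
else
    status=different
fi
echo "round trip: $status"
```

The same cmp check can be run against a file decoded with jq or gojq to see whether the non-UTF-8 bytes and the end of the file survived unchanged.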

pkoppstein · Answer 3 · Sat Mar 09 2024 21:44:56 GMT+0800 (China Standard Time)

@emanuele6 - Thanks for the reminder about -j.

My implicit point was that if it's good enough for gojq, maybe it should be good enough for jq.
Or would "binary strings" /pull/2314 help?