jqlang / jq

Command-line JSON processor

Home Page: https://jqlang.github.io/jq/

@base64d fails on pdf files

pkoppstein opened this issue

The good news is that replacing jq by gojq results in success.

Here's an example file:

wget https://legiscan.com/AK/text/HR2/id/1477219/Alaska-2017-HR2-Enrolled.pdf
base64 < Alaska-2017-HR2-Enrolled.pdf > /tmp/pdf.base64

Using jq:

jq -Rr '@base64d'  /tmp/pdf.base64 > /tmp/pdf.base64d.pdf        # cannot be opened by Adobe Acrobat

Replacing jq by gojq in the above line results in a file that Adobe Acrobat opens successfully.

Further details:

$ uname -a
Darwin Mac-mini.mynetworksettings.com 21.6.0 Darwin Kernel Version 21.6.0: Thu Sep 29 20:12:57 PDT 2022; root:xnu-8020.240.7~1/RELEASE_X86_64 x86_64

$ jq --version
jq-1.7.1

$ ls -l /tmp/pdf.*.pdf
-rw-r--r--  1 ....  41887 Mar  9 03:56 /tmp/pdf.base64d.gojq.pdf
-rw-r--r--  1 ....  71816 Mar  9 03:56 /tmp/pdf.base64d.pdf

Looks like dup of #1931.

Yes, @base64d can only decode to utf-8 strings, not binary data.
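A minimal sketch of why the jq output can end up larger than the original file. It assumes that jq replaces bytes that are not valid UTF-8 with U+FFFD when the decoded result becomes a jq string; that is an assumption about jq's string handling, not something stated in this thread. The one-byte input is made up for illustration, and the jq step is skipped if jq is not installed:

```shell
# A single 0xFF byte is never valid UTF-8; base64-encode it.
b64=$(printf '\377' | base64)
echo "$b64"

# Decode with jq and dump the resulting bytes in hex.
# If invalid bytes are replaced by U+FFFD (an assumption about jq's behaviour),
# the dump shows a three-byte sequence instead of the original single byte.
if command -v jq >/dev/null 2>&1; then
  printf '%s\n' "$b64" | jq -Rj '@base64d' | od -An -tx1
fi
```

Comparing the hex dump with the original byte shows whether the data survived the trip through a jq string.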

Note that gojq preserves non-UTF-8 data only as long as you don't perform string operations on it. Even then, you should use gojq -jR @base64d <b64 >decoded rather than -rR, or it will add an extra newline (0x0a byte) at the end of the file; for PDFs that is evidently fine, though.
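When exact bytes matter, one way to sidestep the string question entirely is to decode with the base64 tool itself and compare the result against the original. This is a sketch, not something proposed in the thread: the scratch file names and the sample bytes are hypothetical, and it assumes a base64 that accepts --decode (both the GNU and macOS versions should).

```shell
# Hypothetical scratch files; any binary file works as the input.
src=$(mktemp) b64=$(mktemp) out=$(mktemp)

# Sample payload containing a NUL byte and bytes that are not valid UTF-8.
printf '%%PDF-1.4\n\377\376\000\001binary\n' > "$src"

# Encode, then decode without going through a JSON string.
base64 < "$src" > "$b64"
base64 --decode < "$b64" > "$out"

# cmp exits 0 only if the two files are byte-for-byte identical.
cmp "$src" "$out" && echo "round trip OK"
```

The same cmp check can be pointed at the output of jq or gojq to see whether a given decode was lossless, including the trailing newline that -r adds.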

@emanuele6 - Thanks for the reminder about -j.

My implicit point was that if it's good enough for gojq, maybe it should be good enough for jq.
Or would "binary strings" /pull/2314 help?