@base64d fails on pdf files
pkoppstein opened this issue.
The good news is that replacing jq by gojq results in success.
Here's an example file:
wget https://legiscan.com/AK/text/HR2/id/1477219/Alaska-2017-HR2-Enrolled.pdf
base64 < Alaska-2017-HR2-Enrolled.pdf > /tmp/pdf.base64
Using jq:
jq -Rr '@base64d' /tmp/pdf.base64 > /tmp/pdf.base64d.pdf # cannot be opened by Adobe Acrobat
Replacing jq by gojq in the above line results in a file that Adobe Acrobat opens successfully.
Further details:
$ uname -a
Darwin Mac-mini.mynetworksettings.com 21.6.0 Darwin Kernel Version 21.6.0: Thu Sep 29 20:12:57 PDT 2022; root:xnu-8020.240.7~1/RELEASE_X86_64 x86_64
$ jq --version
jq-1.7.1
$ ls -l /tmp/pdf.*.pdf
-rw-r--r-- 1 .... 41887 Mar 9 03:56 /tmp/pdf.base64d.gojq.pdf
-rw-r--r-- 1 .... 71816 Mar 9 03:56 /tmp/pdf.base64d.pdf
Yes, @base64d can only decode to UTF-8 strings, not binary data.
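The following minimal Python sketch models this behaviour. It does not run jq. It assumes jq turns the decoded bytes into a UTF-8 string by replacing each invalid byte with U+FFFD, which is three bytes in UTF-8, and uses Python's errors="replace" as an approximation. The function names and the sample bytes are hypothetical. Under that assumption, a binary file that goes through @base64d grows and no longer matches the original. That would be one plausible reason the jq output above (71816 bytes) is larger than the gojq output (41887 bytes).

```python
import base64

def jq_style_decode(b64_text: str) -> bytes:
    """Hypothetical model of jq's @base64d: decode to raw bytes, then force
    the result into valid UTF-8 (assumption: invalid bytes become U+FFFD),
    and return the UTF-8 bytes that would be written to the output file."""
    raw = base64.b64decode(b64_text)
    as_text = raw.decode("utf-8", errors="replace")  # invalid bytes -> U+FFFD
    return as_text.encode("utf-8")                   # each U+FFFD is 3 bytes

def bytes_preserving_decode(b64_text: str) -> bytes:
    """Hypothetical binary-safe decode: raw bytes, no text conversion."""
    return base64.b64decode(b64_text)

# A PDF-like header: "%PDF-1.4" followed by a comment line of high bytes,
# the kind PDFs often include to mark the file as binary.
original = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"
encoded = base64.b64encode(original).decode("ascii")

print(len(original))                          # 15
print(len(bytes_preserving_decode(encoded)))  # 15, identical bytes
print(len(jq_style_decode(encoded)))          # 23, four bytes became U+FFFD
```

Each of the four high bytes is an invalid UTF-8 sequence here. Each one is replaced by a 3-byte U+FFFD, so the modelled output grows from 15 to 23 bytes.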
Note that even with gojq, which can preserve non-UTF-8 data as long as you don't perform string operations on it, you should use gojq -jR @base64d <b64 >decoded, not -rR, or it will add an extra newline (0x0a byte) at the end of the file; for PDFs that is evidently fine, though.
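Checking whether Acrobat opens the file is indirect. A byte-level comparison with the original PDF shows exactly what happened. The following Python sketch uses a hypothetical helper, classify_output, which reports whether a decoded file is identical to the original, differs only by one trailing 0x0a byte (the -r vs -j difference described above), or differs in some other way. The command-line usage and file names in the comments are illustrative only.

```python
import sys
from pathlib import Path

def classify_output(decoded: bytes, original: bytes) -> str:
    """Hypothetical helper: compare a decoded file against the original."""
    if decoded == original:
        return "identical"
    if decoded == original + b"\n":
        # What -r adds after the output string, and -j does not.
        return "extra trailing newline (0x0a), e.g. from -r instead of -j"
    return "different"

if __name__ == "__main__":
    # Illustrative usage (paths are examples):
    #   python compare.py Alaska-2017-HR2-Enrolled.pdf /tmp/decoded.pdf
    if len(sys.argv) == 3:
        original = Path(sys.argv[1]).read_bytes()
        decoded = Path(sys.argv[2]).read_bytes()
        print(classify_output(decoded, original))
```

If the point above holds, output produced with gojq -jR should be reported as identical, and output produced with gojq -rR should differ only by the trailing newline.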
@emanuele6 - Thanks for the reminder about -j.
My implicit point was that if it's good enough for gojq, maybe it should be good enough for jq.
Or would the "binary strings" PR (/pull/2314) help?