Wilfred / difftastic

a structural diff that understands syntax 🟥🟩

Home page: https://difftastic.wilfred.me.uk/

Binary files being displayed as lines of text

blackfawn opened this issue

Some binary files, which happen to be shared objects stored in LFS, are displayed by difftastic as "Text" instead of "Binary". This makes difftastic very difficult to use, because it shows endless lines of binary data as changed lines of text. Running file -i $FILE on these files returns application/x-sharedlib; charset=binary, so file identifies them correctly.

Among the application/* MIME (media) types, difftastic seems to flag only PDFs and ZIPs as binary: https://github.com/Wilfred/difftastic/blob/master/src/files.rs#L162-L167

I believe the large majority of application/* types are binary formats, so it might make sense to invert the application/* logic: treat everything under application/* as binary except application/json, application/ld+json, and perhaps a few others.
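A minimal Rust sketch of the proposed inversion, under stated assumptions: mime_looks_binary and the allowlist contents are hypothetical and are not difftastic's actual code in src/files.rs. It also assumes that structured-syntax suffixes such as +json and +xml mark a text format, which covers application/ld+json without listing it explicitly.

```rust
/// Hypothetical helper (not difftastic's API): guess whether a MIME type
/// denotes binary content by treating `application/*` as binary unless the
/// subtype is on a small allowlist of text formats.
fn mime_looks_binary(mime: &str) -> bool {
    // Drop parameters such as "; charset=binary" and normalize case.
    let essence = mime
        .split(';')
        .next()
        .unwrap_or("")
        .trim()
        .to_ascii_lowercase();

    let Some((top, sub)) = essence.split_once('/') else {
        // Not a well-formed type/subtype pair: make no binary claim.
        return false;
    };

    // Structured-syntax suffixes such as "ld+json" or "svg+xml" are
    // text-based regardless of the top-level type.
    if sub.ends_with("+json") || sub.ends_with("+xml") {
        return false;
    }

    // Illustrative allowlist of text-based application/* subtypes (assumption).
    const TEXT_APPLICATION_SUBTYPES: &[&str] = &["json", "xml", "javascript"];

    match top {
        // The proposed inversion: application/* is binary unless allowlisted.
        "application" => !TEXT_APPLICATION_SUBTYPES.contains(&sub),
        "image" | "audio" | "video" | "font" => true,
        _ => false,
    }
}
```

With this rule, application/x-sharedlib; charset=binary is treated as binary while application/json and application/ld+json stay text. The trade-off is that any text format missing from the allowlist would also be shown as binary, which errs toward not flooding the terminal.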

Alternatively, difftastic could check the first 4 bytes and treat any file starting with 0x7f 0x45 0x4c 0x46 (the "magic number" at the start of every ELF header) as binary, which would at least handle ELF files.
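A short Rust sketch of the magic-number check. The function names starts_with_elf_magic and file_is_elf are hypothetical; the four bytes are 0x7f followed by ASCII "ELF", as described above. Reading at most four bytes keeps the check cheap even for large shared objects.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// The four-byte ELF magic number: 0x7f followed by ASCII "ELF".
const ELF_MAGIC: [u8; 4] = [0x7f, 0x45, 0x4c, 0x46];

/// Returns true if `bytes` begins with the ELF magic number.
fn starts_with_elf_magic(bytes: &[u8]) -> bool {
    bytes.starts_with(&ELF_MAGIC[..])
}

/// Reads at most the first four bytes of the file at `path` and checks
/// them against the ELF magic number (hypothetical helper).
fn file_is_elf(path: &Path) -> io::Result<bool> {
    let mut header = Vec::with_capacity(ELF_MAGIC.len());
    File::open(path)?
        .take(ELF_MAGIC.len() as u64)
        .read_to_end(&mut header)?;
    Ok(starts_with_elf_magic(&header))
}
```

Files shorter than four bytes simply fail the check rather than returning an error.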

difft --version
Difftastic 0.56.1 (built with rustc 1.70.0)

Yeah, being stricter about application/* seems like a reasonable heuristic.

Difftastic also tries to guess whether a file is valid text by decoding it as UTF-8 and UTF-16, which should catch cases like these. Could you share a sample file, or run export DFT_LOG=trace before difft and report what the log says?
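A rough Rust sketch of a decode-based text check, assuming a strategy similar to the one described; it is not difftastic's actual implementation, and looks_like_text, decode_utf16 and has_suspicious_controls are hypothetical names. Decoding alone is weak evidence for UTF-16, since almost any even-length byte sequence decodes unless it contains an unpaired surrogate, so this sketch adds an extra assumption: reject input containing control characters other than common whitespace. The NUL bytes and 0x7f that appear in ELF headers trip that filter.

```rust
/// Hypothetical heuristic: bytes are text if they decode as UTF-8 or UTF-16
/// and the decoded string has no unexpected control characters.
fn looks_like_text(bytes: &[u8]) -> bool {
    if let Ok(s) = std::str::from_utf8(bytes) {
        return !has_suspicious_controls(s);
    }
    match decode_utf16(bytes) {
        Some(s) => !has_suspicious_controls(&s),
        None => false,
    }
}

/// Decodes UTF-16, using a byte-order mark if present and defaulting to
/// little-endian otherwise. Returns None on odd length or invalid surrogates.
fn decode_utf16(bytes: &[u8]) -> Option<String> {
    if bytes.len() % 2 != 0 {
        return None;
    }
    let (big_endian, body) = match bytes {
        [0xFE, 0xFF, rest @ ..] => (true, rest),
        [0xFF, 0xFE, rest @ ..] => (false, rest),
        _ => (false, bytes),
    };
    let units: Vec<u16> = body
        .chunks_exact(2)
        .map(|c| {
            if big_endian {
                u16::from_be_bytes([c[0], c[1]])
            } else {
                u16::from_le_bytes([c[0], c[1]])
            }
        })
        .collect();
    String::from_utf16(&units).ok()
}

/// True if the string contains control characters other than tab,
/// newline, carriage return or form feed (e.g. NUL or DEL).
fn has_suspicious_controls(s: &str) -> bool {
    s.chars()
        .any(|c| c.is_control() && !matches!(c, '\t' | '\n' | '\r' | '\u{c}'))
}
```

For example, an ELF header such as 7f 45 4c 46 02 01 01 00 is valid UTF-8 (every byte is below 0x80), so only the control-character filter keeps it from being classified as text.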

This was probably fixed in c6da857, but feel free to reopen if it recurs.