Binary files being displayed as lines of text
blackfawn opened this issue
Some binary files, which happen to be shared objects in LFS, are displayed as "Text" instead of "Binary" with difftastic. This makes difftastic very difficult to use, as it shows endless lines of binary as changed lines of text. Running `file -i $FILE` on these files returns `application/x-sharedlib; charset=binary`, so they are being properly identified.
I see difftastic only flags PDFs and ZIPs as binary within the application/* MIME/media types: https://github.com/Wilfred/difftastic/blob/master/src/files.rs#L162-L167
I believe the large majority of application/* types are binary formats, so it might make sense to invert the application/* logic: treat anything under application/* as binary unless it is specifically application/json or application/ld+json (and perhaps a few others).
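A minimal sketch of that inverted check, for illustration only. The function name `application_mime_is_binary` and the allowlist constant are hypothetical and are not taken from difftastic's source; the allowlist holds just the two types named above and would likely need more entries in practice.

```rust
/// Hypothetical allowlist of `application/*` types that are really text.
/// Only the two types mentioned in this issue are listed.
const TEXTUAL_APPLICATION_TYPES: &[&str] = &["application/json", "application/ld+json"];

/// Returns true when `mime` is an `application/*` type that is not on the
/// textual allowlist. Any other top-level type returns false, leaving the
/// decision to other heuristics.
fn application_mime_is_binary(mime: &str) -> bool {
    // Drop parameters such as "; charset=binary" and normalise case.
    let essence = mime
        .split(';')
        .next()
        .unwrap_or("")
        .trim()
        .to_ascii_lowercase();

    match essence.split_once('/') {
        Some(("application", _)) => !TEXTUAL_APPLICATION_TYPES.contains(&essence.as_str()),
        _ => false,
    }
}
```

With this sketch, the string reported by `file -i` for the shared objects above would be classified as binary, while application/json would not.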
Alternatively, the first 4 bytes could be checked, and anything starting with `0x7f 0x45 0x4c 0x46` (the "magic number" present at the start of an ELF header) could at least be treated as binary to take care of ELF files.
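The magic-number check could look roughly like the following. This is a sketch using only the Rust standard library; the names `ELF_MAGIC`, `starts_with_elf_magic` and `file_is_elf` are made up for the example and do not exist in difftastic.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// The four bytes at the start of every ELF header: 0x7f 'E' 'L' 'F'.
const ELF_MAGIC: [u8; 4] = [0x7f, 0x45, 0x4c, 0x46];

/// Returns true if `bytes` begins with the ELF magic number.
fn starts_with_elf_magic(bytes: &[u8]) -> bool {
    bytes.starts_with(&ELF_MAGIC)
}

/// Reads only the first four bytes of the file at `path` and checks them.
/// Files shorter than four bytes are reported as not ELF.
fn file_is_elf(path: &Path) -> io::Result<bool> {
    let mut header = [0u8; 4];
    let mut file = File::open(path)?;
    match file.read_exact(&mut header) {
        Ok(()) => Ok(starts_with_elf_magic(&header)),
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => Ok(false),
        Err(e) => Err(e),
    }
}
```

This only covers ELF files, so it would complement the MIME-based check rather than replace it.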
difft --version
Difftastic 0.56.1 (built with rustc 1.70.0)
Yeah, being stricter about `application/*` seems like a reasonable heuristic.
Difftastic also tries to guess whether a file looks like valid text by trying to decode it as UTF-8 and UTF-16, which should catch cases like these. Could you share a sample file, or set the environment variable with `export DFT_LOG=trace` and report what it says?
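For readers unfamiliar with this kind of check, here is a rough standard-library sketch of guessing by decoding. It is not difftastic's actual implementation: the `Guess` enum and `guess_encoding` function are hypothetical, and only little-endian UTF-16 is tried. Note that strict UTF-16 decoding rejects only unpaired surrogates, so a check this simple accepts many binary inputs of even length and would need extra signals in practice.

```rust
/// Hypothetical result type for the sketch below.
#[derive(Debug, PartialEq)]
enum Guess {
    Utf8,
    Utf16Le,
    Binary,
}

/// Tries strict UTF-8 first, then strict little-endian UTF-16.
/// Anything that fails both is reported as binary.
fn guess_encoding(bytes: &[u8]) -> Guess {
    if std::str::from_utf8(bytes).is_ok() {
        return Guess::Utf8;
    }
    // UTF-16 needs an even number of bytes.
    if bytes.len() % 2 != 0 {
        return Guess::Binary;
    }
    let units: Vec<u16> = bytes
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
        .collect();
    if String::from_utf16(&units).is_ok() {
        Guess::Utf16Le
    } else {
        Guess::Binary
    }
}
```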
This was probably fixed in c6da857, but feel free to reopen if it reoccurs.