sharkdp / bat

A cat(1) clone with wings.



Add Flag to Avoid Treating NUL Separated Input as Binary

LangLangBart opened this issue

Discussed in #2971


Issue

Currently, running a command like the following prints a warning instead of the input:

printf "First\0" | bat -p

  • Related Issue #823

The warning is defined in src/printer.rs:

bat/src/printer.rs

Lines 435 to 444 in 8f8c953

if !self.config.style_components.header() {
    if Some(ContentType::BINARY) == self.content_type && !self.config.show_nonprintable {
        writeln!(
            handle,
            "{}: Binary content from {} will not be printed to the terminal \
             (but will be present if the output of 'bat' is piped). You can use 'bat -A' \
             to show the binary file contents.",
            Yellow.paint("[bat warning]"),
            input.description.summary(),
        )?;

The decision to label the input as BINARY seems to be made in src/input.rs:

bat/src/input.rs

Lines 260 to 271 in 8f8c953

let mut first_line = vec![];
reader.read_until(b'\n', &mut first_line).ok();

let content_type = if first_line.is_empty() {
    None
} else {
    Some(content_inspector::inspect(&first_line[..]))
};

if content_type == Some(ContentType::UTF_16LE) {
    reader.read_until(0x00, &mut first_line).ok();
}
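Here is a minimal, self-contained sketch of the same first-line sniffing, which makes the heuristic above concrete. It is only an approximation. The hypothetical `looks_binary` stands in for `content_inspector::inspect` and checks for a NUL byte and nothing else, while the real crate does more (for example, it recognizes byte-order marks). The UTF-16 branch is left out.

```rust
use std::io::{BufRead, Cursor};

/// Hypothetical stand-in for `content_inspector::inspect`:
/// any NUL byte is treated as a sign of binary content.
fn looks_binary(buf: &[u8]) -> bool {
    buf.contains(&0x00)
}

/// Mirrors the sniffing in src/input.rs: only the bytes up to and
/// including the first b'\n' are inspected. Returns None for empty
/// input, like bat's `content_type`.
fn first_line_is_binary(input: &[u8]) -> Option<bool> {
    let mut reader = Cursor::new(input);
    let mut first_line = Vec::new();
    reader.read_until(b'\n', &mut first_line).ok();
    if first_line.is_empty() {
        None
    } else {
        Some(looks_binary(&first_line))
    }
}

fn main() {
    // No newline, so the whole input (including the NUL) is the first line.
    println!("{:?}", first_line_is_binary(b"First\0")); // Some(true)
    // The NUL comes after the first newline and is never inspected.
    println!("{:?}", first_line_is_binary(b"First\nSecond\0")); // Some(false)
    // Same shape as the `printf "\nFirst\0"` workaround: first line is "\n".
    println!("{:?}", first_line_is_binary(b"\nFirst\0")); // Some(false)
}
```

Because only the first line is examined, `First\0` is flagged while a NUL on a later line goes unnoticed. That is also why the workaround below succeeds.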

A hacky workaround is to prepend an empty line, run bat, and then remove that first line again:

printf "\nFirst\0" | bat -p | sed '1d'

Proposed solution

A new flag that doesn't label content_type as BINARY when the first line ends with a NUL byte:

# naming the flag '--text' to align with 'grep/git diff'
printf "First\0" | bat -p --text

The crate [1] used to determine whether content is binary states:

//! encoding). Note that **this analysis can fail**. For example, even if unlikely, UTF-8-encoded
//! text can legally contain NULL bytes. Conversely, some particular binary formats (like binary

Based on this, a --text flag would be appropriate, matching the --text flag that grep and git diff already provide.
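The Rust standard library alone is enough to confirm the quoted caveat that UTF-8 text can legally contain NUL bytes. `std::str::from_utf8` accepts input containing U+0000, so "valid UTF-8" and "contains a NUL byte" are independent properties. The helper name below is illustrative only.

```rust
/// Returns true if `bytes` is well-formed UTF-8. NUL (U+0000) is a valid
/// code point, so a NUL byte on its own never makes input invalid UTF-8.
fn is_valid_utf8(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}

fn main() {
    println!("{}", is_valid_utf8(b"First\0")); // true
    // 0xFF can never appear in UTF-8, so this one is rejected.
    println!("{}", is_valid_utf8(b"First\xff")); // false
}
```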

printf "First\0" | grep 'First'
# grep: (standard input): binary file matches

printf "First\0" | grep --text 'First'
# First

Footnotes

  1. sharkdp/content_inspector: Fast inspection of binary buffers to guess/determine the type of content

Hi @LangLangBart, would your issue be fixed by adding the flag -A to show non-printable characters?

This would result in the following output:

[screenshot of the output]

> by adding the flag -A

Thanks for the suggestion. I failed to mention this in the issue report here and only described it in the linked discussion. For my use case, the -A/--show-all flag would not be adequate.

I am trying to colorize my zsh history and pipe it into fzf.

# zsh only: the '-N' flag separates the array elements with `NUL`
print -rNC1 -- "${(@uv)history}" | bat -pl zsh | fzf --read0
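As background, a NUL-delimited stream like this one splits back into records with `BufRead::split(0)`, which mirrors how `fzf --read0` reads its input (NUL-delimited rather than newline-delimited). Some context on why bat flags such a stream: single-line history entries add no newline bytes, so the first "line" bat inspects already contains NUL separators. The helper name and the sample stream below are hypothetical.

```rust
use std::io::{BufRead, Cursor};

/// Splits a NUL-delimited byte stream into records. A record may itself
/// contain newlines (e.g. a multi-line shell command).
fn split_nul_records(input: &[u8]) -> Vec<String> {
    // Called as `BufRead::split` (split on a byte) to avoid confusion
    // with the unrelated slice method `split`.
    BufRead::split(Cursor::new(input), 0x00)
        .filter_map(Result::ok)
        .map(|rec| String::from_utf8_lossy(&rec).into_owned())
        .collect()
}

fn main() {
    // Sample data in the shape of `print -rN` output: every record ends
    // with NUL, and the last record spans several lines.
    let stream: &[u8] = b"ls -la\0git status\0for f in *; do\n  echo $f\ndone\0";
    for record in split_nul_records(stream) {
        println!("{record:?}");
    }
}
```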

My bad, I missed the discussion link.

So the issue you're having is that when printing something (in this case, a line from the history file) that contains a null character, bat gives a warning instead of printing it.

It feels like a very niche problem to have, but I think it could be fixed, as you said, by adding a --read0 or --read-null-bytes flag.

I could work on this, as I'm looking for my first contribution to the project, but it would be good to get an opinion from a more senior contributor too :)

I'm personally in favor of the idea, but it would be great to wait for input from some of the other maintainers before spending time on it, in case we don't all agree 😉

> It feels like a very niche problem to have, but I think it could be fixed, as you said, adding a --read0 or --read-null-bytes flag.

I have updated the description, and I would propose a --text flag to align with grep and git diff.

> wait for input from some of the other maintainers

Agreed, we should wait for input from some of the maintainers.

Sounds good to me. Let's think about making this an option, not a flag. There may be other reasonable values we want to add later (beyond a yes/no decision), like whether or not we print that warning.

> Project goals and alternatives
> ...
> Be a drop-in replacement for (POSIX) cat

Question: Why was the binary message added at all?

EDIT 1: I found the reason in #248 and #336.


> Let's think about making this an option, not a flag.

How about this?
If the input was labeled binary, check whether first_line would still be labeled binary without its last character; if not, don't label the input as binary.
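A minimal sketch of that proposal follows. It reuses a simplified, NUL-only stand-in for `content_inspector::inspect` (the hypothetical `looks_binary`; the real crate checks more than NUL bytes). When the first line is flagged as binary, it is inspected again without its final byte, and the binary label stays only if it still applies. This relaxation helps only when the sole NUL is the last byte of the first line. A NUL-separated stream with several records and no newline, like the zsh history above, would still be labeled binary.

```rust
/// Hypothetical stand-in for `content_inspector::inspect`:
/// any NUL byte is treated as a binary marker.
fn looks_binary(buf: &[u8]) -> bool {
    buf.contains(&0x00)
}

/// Proposed relaxation: a first line that looks binary only because
/// of its last byte is treated as text.
fn is_binary_relaxed(first_line: &[u8]) -> bool {
    if !looks_binary(first_line) {
        return false;
    }
    match first_line.split_last() {
        // Re-inspect everything except the final byte.
        Some((_, rest)) => looks_binary(rest),
        None => false,
    }
}

fn main() {
    // The only NUL is the last byte, so the input counts as text.
    println!("{}", is_binary_relaxed(b"First\0")); // false
    // An earlier NUL remains after dropping the last byte.
    println!("{}", is_binary_relaxed(b"one\0two\0")); // true
}
```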

How about --input={text,auto,…} ?
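If the setting becomes an option rather than a boolean flag, its values map naturally onto an enum. Here is a minimal sketch that uses only `std::str::FromStr` (bat itself parses arguments with clap, which is not shown here). The type name `InputMode`, the helper `treat_as_binary`, and the behavior given to each value are assumptions. Only the two values named above are modeled, and the "…" is left open.

```rust
use std::str::FromStr;

/// Hypothetical values for the proposed `--input` option.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum InputMode {
    /// Current behavior: sniff the first line to decide text vs. binary.
    Auto,
    /// Always treat the input as text, like `grep --text`.
    Text,
}

impl FromStr for InputMode {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "auto" => Ok(InputMode::Auto),
            "text" => Ok(InputMode::Text),
            other => Err(format!("invalid value '{other}' for --input")),
        }
    }
}

/// Combines the chosen mode with the result of content detection.
fn treat_as_binary(mode: InputMode, detected_binary: bool) -> bool {
    match mode {
        InputMode::Auto => detected_binary,
        InputMode::Text => false,
    }
}

fn main() {
    for arg in ["auto", "text", "binary"] {
        println!("{arg}: {:?}", arg.parse::<InputMode>());
    }
}
```

Keeping the parsed value as an enum leaves room for further variants (for example, controlling the warning, as suggested above) without changing the option's shape.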

Great, so @LangLangBart, you propose --input to specify which language to use for printing, or did I get it wrong?

So basically (more or less):

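  // `input` is the value of the proposed option (None if not given);
  // `get_content_type_from_input` is a hypothetical helper.
  let mut first_line = vec![];
  let content_type = match input {
      // Option not set: keep the current first-line detection.
      None => {
          reader.read_until(b'\n', &mut first_line).ok();

          let content_type = if first_line.is_empty() {
              None
          } else {
              Some(content_inspector::inspect(&first_line[..]))
          };

          if content_type == Some(ContentType::UTF_16LE) {
              reader.read_until(0x00, &mut first_line).ok();
          }
          content_type
      }
      // Option set: derive the content type from its value instead.
      Some(input) => get_content_type_from_input(input),
  };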

@domenicomastrangelo, @einfachIrgendwer0815 has already started a PR.

Apart from the color, it works well. (Screenshot: 0.7.1 compared with their PR.)

@LangLangBart, the color issue should be fixed now.