Add Flag to Avoid Treating NUL Separated Input as Binary
LangLangBart opened this issue · comments
Discussed in #2971
- Related Issue #823
Issue
Currently, running a command like the following prints a warning:
printf "First\0" | bat -p
![SCR-20240529-trdo](https://private-user-images.githubusercontent.com/92653266/334961438-f07d4a7a-677d-4b53-b640-d092a7121d72.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE0NzE0NTAsIm5iZiI6MTcyMTQ3MTE1MCwicGF0aCI6Ii85MjY1MzI2Ni8zMzQ5NjE0MzgtZjA3ZDRhN2EtNjc3ZC00YjUzLWI2NDAtZDA5MmE3MTIxZDcyLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MjAlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzIwVDEwMjU1MFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWZhNGVmZDI4MmRhZWZmZDY1NTAzNjI4Yzk2NjAwM2RmMGNiN2Y2M2Y3MDFjMTk0NjJmZTE2ODBhNzM4YTM5ZmUmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.ljfWpdjghyLAO9MaaicC1PXEi_7ZsfEAyokQ598CDk0)
The warning is defined in src/printer.rs (lines 435 to 444 in 8f8c953).
The decision to label the input as BINARY seems to be made in src/input.rs (lines 260 to 271 in 8f8c953).
A hacky workaround is to make the first line empty, use bat, and then remove the first line:
printf "\nFirst\0" | bat -p | sed '1d'
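To illustrate why the empty first line helps, here is a minimal sketch in Rust of first-line detection. It assumes a simplified stand-in for the real content check (any NUL byte in the first line means binary); `looks_binary` and `first_line_is_binary` are hypothetical names, not bat's or the crate's actual API.

```rust
use std::io::{BufRead, Cursor};

/// Simplified stand-in for the real content check (an assumption):
/// a buffer that contains a NUL byte is treated as binary.
fn looks_binary(buf: &[u8]) -> bool {
    buf.contains(&0)
}

/// Reads up to and including the first `\n` and classifies only that part,
/// mirroring the "inspect the first line" approach described above.
fn first_line_is_binary(input: &[u8]) -> bool {
    let mut reader = Cursor::new(input);
    let mut first_line = Vec::new();
    reader.read_until(b'\n', &mut first_line).ok();
    !first_line.is_empty() && looks_binary(&first_line)
}
```

Under these assumptions, `first_line_is_binary(b"First\0")` is true, while `first_line_is_binary(b"\nFirst\0")` is false, because the first line is then just the newline and the NUL byte is never inspected.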
Proposed solution
A new flag that doesn't label content_type as BINARY when the first line ends with a NUL byte:
# naming the flag '--text' to align with 'grep/git diff'
printf "First\0" | bat -p --text
The crate used to determine if content is binary (content_inspector) states:
//! encoding). Note that **this analysis can fail**. For example, even if unlikely, UTF-8-encoded
//! text can legally contain NULL bytes. Conversely, some particular binary formats (like binary
Based on this, a --text flag would be very appropriate, similar to how grep and git diff have one as well.
printf "First\0" | grep 'First'
# grep: (standard input): binary file matches
printf "First\0" | grep --text 'First'
# First
Hi @LangLangBart, would your issue be fixed by adding the flag -A to show non printable characters?
This would result in the following output:
![image](https://private-user-images.githubusercontent.com/7526063/335105003-9c1f775c-b7d1-4634-b21f-ecb1e3dae4dd.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE0NzE0NTAsIm5iZiI6MTcyMTQ3MTE1MCwicGF0aCI6Ii83NTI2MDYzLzMzNTEwNTAwMy05YzFmNzc1Yy1iN2QxLTQ2MzQtYjIxZi1lY2IxZTNkYWU0ZGQucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDcyMCUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDA3MjBUMTAyNTUwWiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9OWI5NWYyMzk5OWU2MDUwMzViODlmYjI3OGRlZWQzOGQ2MWU2M2IwZWU2YmEzZmEzNDU4ZGRlMzc2YzgxN2JiZCZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.zLcf5m648sAWfyztC07lOetcr8PR6lcsEm8CSW11XgU)
> by adding the flag -A
Thanks for the suggestion. I failed to mention this in the issue report here and only described it in the linked discussion. For my use case, the -A/--show-all flag would not be adequate.
I am trying to colorize my zsh history and pipe it into fzf.
# zsh only, the '-N' flag separates the array elements by `NUL`
print -rNC1 -- "${(@uv)history}" | bat -pl zsh | fzf --read0
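As a rough illustration of what a NUL-separated stream looks like to a consumer, here is a small Rust sketch that splits a buffer into records. `split_nul_records` is a hypothetical helper, and dropping empty records is a simplification chosen for this example; it does not describe how fzf or bat actually process input.

```rust
/// Splits a byte buffer on NUL bytes and returns the non-empty records.
/// Empty records (including the empty tail after a final NUL) are dropped,
/// which is a simplification made for this sketch.
fn split_nul_records(input: &[u8]) -> Vec<&[u8]> {
    input
        .split(|&b| b == 0)
        .filter(|record| !record.is_empty())
        .collect()
}
```

Because records are delimited by NUL rather than by newline, a single record may itself contain newlines, for example a multi-line history entry.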
My bad, I missed the discussion link.
So the issue you're having is that when printing something (in this case a line from the history file) that has a null char in it, it gives an error.
It feels like a very niche problem to have, but I think it could be fixed, as you said, by adding a --read0 or --read-null-bytes flag.
I could work on this as I'm looking for my first contribution to the project, but it would be good to have an opinion from a more senior contributor too :)
I'm personally in favor of the idea, but it would be great to wait for input from some of the other maintainers before spending time on it, in case we don't all agree 😉
> It feels like a very niche problem to have, but I think it could be fixed, as you said, adding a --read0 or --read-null-bytes flag.
I have updated the description, and I would propose a --text flag to align with grep and git diff.
> wait for input from some of the other maintainers
Agreed, we should wait for input from some of the maintainers.
Sounds good to me. Let's think about making this an option, not a flag. Maybe there are other reasonable options that we want to add later (apart from a yes or no decision). Like whether or not we print that warning.
Project goals and alternatives
...
Be a drop-in replacement for (POSIX) cat
Question: Why was the binary message added at all?
EDIT1: I found the reason in #248 and #336
![](https://private-user-images.githubusercontent.com/92653266/335480459-35905604-db8c-4712-a79c-313b1975d9a7.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE0NzE0NTAsIm5iZiI6MTcyMTQ3MTE1MCwicGF0aCI6Ii85MjY1MzI2Ni8zMzU0ODA0NTktMzU5MDU2MDQtZGI4Yy00NzEyLWE3OWMtMzEzYjE5NzVkOWE3LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MjAlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzIwVDEwMjU1MFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTFiMDY0MzBiOGEwOTUwYjZlMTkwMDUzMDMyNmZlZTNkOTJhZGU1MzMzZDNiZDk0OWJkNzY1MzkwZTI2NTEzZmEmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.vNc8u_B-eRbgtt3FgDp8u4uayNKDBSf7CnqeVzgyD24)
> Let's think about making this an option, not a flag.
How about this?
If the input was labeled binary, check whether first_line would still be labeled binary without its last char; if not, don't label the input as binary?
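The idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: `looks_binary` is a simplified stand-in for the real content check (any NUL byte means binary), and `is_binary_ignoring_last_byte` is a hypothetical name, not an existing function in bat.

```rust
/// Simplified stand-in for the real content check (an assumption):
/// a buffer that contains a NUL byte is treated as binary.
fn looks_binary(buf: &[u8]) -> bool {
    buf.contains(&0)
}

/// Re-checks a first line without its last byte, so that a single
/// trailing NUL separator no longer forces the binary label.
fn is_binary_ignoring_last_byte(first_line: &[u8]) -> bool {
    if !looks_binary(first_line) {
        return false;
    }
    match first_line.split_last() {
        // Inspect everything except the final byte.
        Some((_last, rest)) => looks_binary(rest),
        None => false,
    }
}
```

With this stand-in, `b"First\0"` is no longer labeled binary, while `b"Fi\0rst\0"` still is. One consequence worth noting: if the input holds several NUL-separated records and no newline, the whole input ends up in the first line, so the interior NUL bytes would still trigger the binary label under this check.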
How about --input={text,auto,…}?
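A minimal sketch of how such an option might be modeled in Rust, assuming only the two values named above. `InputMode` and `treat_as_binary` are hypothetical names for illustration; nothing here reflects an existing bat interface.

```rust
use std::str::FromStr;

/// Hypothetical values for the proposed `--input` option.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum InputMode {
    /// Keep the current automatic detection.
    Auto,
    /// Always treat the input as text.
    Text,
}

impl FromStr for InputMode {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "auto" => Ok(InputMode::Auto),
            "text" => Ok(InputMode::Text),
            other => Err(format!("unknown input mode: {other}")),
        }
    }
}

/// Combines the chosen mode with the detection result:
/// `Text` overrides a binary detection, `Auto` keeps it.
fn treat_as_binary(mode: InputMode, detected_binary: bool) -> bool {
    match mode {
        InputMode::Auto => detected_binary,
        InputMode::Text => false,
    }
}
```

Modeling the option as an enum rather than a boolean leaves room for further values later, which is the motivation given for preferring an option over a flag.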
Great, so @LangLangBart, you propose --input to specify which language to use for printing, or did I get it wrong?
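So basically (more or less):
// `input` stands for the value of the proposed --input option;
// `get_content_type_from_input` is a hypothetical helper.
let content_type = match input {
    // --input not set: keep detecting the content type from the first line
    None => {
        let mut first_line = vec![];
        reader.read_until(b'\n', &mut first_line).ok();
        let content_type = if first_line.is_empty() {
            None
        } else {
            Some(content_inspector::inspect(&first_line[..]))
        };
        if content_type == Some(ContentType::UTF_16LE) {
            reader.read_until(0x00, &mut first_line).ok();
        }
        content_type
    }
    // --input set: take the content type from the option instead
    Some(mode) => get_content_type_from_input(mode),
};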
@einfachIrgendwer0815 has already started a PR.
Besides the color, it works well. The image below compares 0.7.1 with their PR.
![](https://private-user-images.githubusercontent.com/92653266/335733710-3b7a3f88-74ba-402f-8ce3-72ad67a41c23.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE0NzE0NTAsIm5iZiI6MTcyMTQ3MTE1MCwicGF0aCI6Ii85MjY1MzI2Ni8zMzU3MzM3MTAtM2I3YTNmODgtNzRiYS00MDJmLThjZTMtNzJhZDY3YTQxYzIzLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MjAlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzIwVDEwMjU1MFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWRmYWNhNzczY2VjMDEzZTQ5ZmJmZmUwNGUyY2ZjYmE1ODUwN2M1NmE4ZjllYjI2NWUwNTIyYzI0OGQ3ZTE0ZjYmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.ckD5cgcUMvYmNyq7bVOHv0kKQxrZjwquBz8IbNi1TQo)
@LangLangBart the color issue should be fixed now
nice :)