Truncating behavior is confusing and forces allocations

matklad opened this issue 3 years ago · comments

I expect the following test to pass:

#[test]
fn append() {
    let mut buf = "hello world".to_string();
    bs58::encode(&[92]).into(&mut buf).unwrap();
    assert_eq!("hello world2b", buf.as_str());
}

Instead, it fails, as the buf contains just "2b". That is, encoding discards existing data, rather than appending to it.

There are two problems with it:

It is surprising behavior. Standard library APIs like read_line always append. If overwriting is desired, the caller can call .clear() first.
It can force an allocation if the user actually wants to append data to some existing buffer. This comes up when, for example, using SRI-style encoding of hashes: "<algo-name>-<base58 encoded bytes>".
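
As a rough illustration of the append semantics requested above, here is a minimal std-only sketch. The names encode_append and sri_style are hypothetical and not part of bs58; the encoder is a simple base-conversion loop over the Bitcoin alphabet, written only to show the calling convention, and says nothing about how bs58 implements encoding.

```rust
/// Bitcoin-style base58 alphabet.
const ALPHABET: &[u8; 58] = b"123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

/// Hypothetical helper (not a bs58 API): base58-encodes `input` and
/// appends the result to `out`, leaving existing contents untouched.
fn encode_append(input: &[u8], out: &mut String) {
    // Base-58 digits of the input, least significant digit first.
    // This scratch Vec is an artifact of the sketch, not a requirement.
    let mut digits: Vec<u8> = Vec::new();
    for &byte in input {
        // Multiply the running number by 256 and add the next byte.
        let mut carry = byte as u32;
        for d in digits.iter_mut() {
            carry += (*d as u32) << 8;
            *d = (carry % 58) as u8;
            carry /= 58;
        }
        while carry > 0 {
            digits.push((carry % 58) as u8);
            carry /= 58;
        }
    }
    // Each leading zero byte is represented by the first alphabet symbol, '1'.
    let zeros = input.iter().take_while(|&&b| b == 0).count();
    out.reserve(zeros + digits.len());
    for _ in 0..zeros {
        out.push('1');
    }
    for &d in digits.iter().rev() {
        out.push(ALPHABET[d as usize] as char);
    }
}

/// Hypothetical SRI-style formatter: "<algo-name>-<base58 encoded bytes>".
fn sri_style(algo: &str, hash: &[u8]) -> String {
    // Two characters per input byte is a safe upper bound for base58.
    let mut s = String::with_capacity(algo.len() + 1 + hash.len() * 2);
    s.push_str(algo);
    s.push('-');
    encode_append(hash, &mut s); // appends after the prefix
    s
}
```

With append semantics the prefix and the encoded bytes share one output String. With truncating semantics the caller would have to encode into a temporary buffer and copy it over.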

Nemo157 · Answer 1 · Wed Jul 07 2021 22:49:07 GMT+0800 (China Standard Time)

Agreed. Should be a pretty easy change and I think worth a breaking release.

Nemo157 · Answer 2 · Wed Jul 07 2021 23:04:10 GMT+0800 (China Standard Time)

Would you expect the same when decoding into a &mut Vec<u8> (vs &mut [u8])?

#[test]
fn append() {
    let mut buf = b"hello world".to_owned();
    bs58::decode("a").into(&mut buf).unwrap();
    assert_eq!(b"hello world!", buf.as_ref());
}

#[test]
fn no_append() {
    let mut buf = b"hello world".to_owned();
    bs58::decode("a").into(buf.as_mut()).unwrap();
    assert_eq!(b"!ello world", buf.as_ref());
}

Alex Kladov · Answer 3 · Wed Jul 07 2021 23:14:11 GMT+0800 (China Standard Time)

For Vec<u8>, I'd expect the same behavior as for String -- append to the end.

For &mut [u8], I'd expect the same behavior as char::encode_utf8 -- overwrite the prefix, return the str slice of the data actually written.
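
For reference, the char::encode_utf8 behavior mentioned here can be sketched with the standard library alone. The wrapper name overwrite_prefix is made up for this illustration.

```rust
/// Hypothetical wrapper: writes `c` over the front of `buf` and returns
/// how many bytes were written. The rest of `buf` is left as it was.
fn overwrite_prefix(c: char, buf: &mut [u8]) -> usize {
    // encode_utf8 overwrites the first c.len_utf8() bytes of `buf` and
    // returns exactly that prefix as a &mut str. It panics if `buf` is
    // too small to hold the encoded character.
    let written: &mut str = c.encode_utf8(buf);
    written.len()
}
```

Calling overwrite_prefix('!', &mut buf) on a buffer holding b"hello world" returns 1 and leaves b"!ello world", which matches the no_append test above.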

Nemo157 · Answer 4 · Wed Jul 07 2021 23:17:19 GMT+0800 (China Standard Time)

Returning an &str would require checking/asserting UTF-8 validity at that point. If you're doing more ASCII-only processing on the buffer (or never actually asserting that it is a string), then you might want to delay that.

Alex Kladov · Answer 5 · Wed Jul 07 2021 23:23:17 GMT+0800 (China Standard Time)

Hm, I think base58 guarantees that the encoded result is UTF-8, so no additional validation is necessary? If this assumption is correct, then returning &mut str allows the calling code to avoid UTF-8 validation and bounds checking. In any case, returning just a usize signifying the number of bytes written would be fine as well. Maybe returning usize is even better: I wager that the main benefit of char's return type is not actually eliding the check, but just basic convenience for cases where you'd want to encode a char to a local [u8; 4], and then do something with the resulting string.

Nemo157 · Answer 6 · Wed Jul 07 2021 23:24:55 GMT+0800 (China Standard Time)

Yeah, it wouldn't need validating since the API guarantees it's ASCII, but I want to minimize the unsafe code here (currently the only unsafe code used is the bare minimum necessary to actually work with &mut str).
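
To make the trade-off concrete, here is a minimal sketch of what a caller could do if encoding into a &mut [u8] returned only the number of bytes written. The helper written_as_str is hypothetical. It uses the checked std::str::from_utf8, so the caller pays for validation only when a &str is actually needed and no unsafe code is involved.

```rust
use std::str::Utf8Error;

/// Hypothetical caller-side helper: `written` stands in for the usize an
/// encoder would return after filling the front of `buf`.
/// Panics if `written` exceeds `buf.len()`.
fn written_as_str(buf: &[u8], written: usize) -> Result<&str, Utf8Error> {
    // Only the written prefix is validated. Bytes past it are ignored,
    // so leftover data in the buffer cannot cause a failure.
    std::str::from_utf8(&buf[..written])
}
```

A caller that only does ASCII-level processing on the bytes can skip this step entirely and work with &buf[..written] directly.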