Variable uint used for container lengths (like string or vec)
narodnik opened this issue · comments
See the example code from: https://github.com/darkrenaissance/darkfi/blob/master/src/serial/src/lib.rs#L271
/// Variable-integer encoding.
/// Integer can be encoded depending on the represented value to save space.
/// Variable length integers always precede an array/vector of a type of data
/// that may vary in length. Longer numbers are encoded in little endian.
///
/// | Value | Storage length | Format |
/// |---------------|----------------|-------------------------------------|
/// | <= 0xfc | 1 | u8 |
/// | <= 0xffff | 3 | `0xfd` followed by `value` as `u16` |
/// | <= 0xffffffff | 5 | `0xfe` followed by `value` as `u32` |
/// | - | 9 | `0xff` followed by `value` as `u64` |
///
/// See also [Bitcoin variable length integers](https://en.bitcoin.it/wiki/Protocol_documentation#Variable_length_integer).
#[derive(Debug, PartialEq, Eq)]
pub struct VarInt(pub u64);
impl VarInt {
/// Gets the length of this `VarInt` when encoded.
/// Returns:
/// * 1 for 0..0xFC
/// * 3 for 0xFD..(2^16-1)
/// * 5 for 0x10000..(2^32-1)
/// * 9 otherwise
#[inline]
pub fn length(&self) -> usize {
match self.0 {
0..=0xFC => 1,
0xFD..=0xFFFF => 3,
0x10000..=0xFFFFFFFF => 5,
_ => 9,
}
}
}
impl Encodable for VarInt {
#[inline]
fn encode<S: Write>(&self, mut s: S) -> Result<usize, Error> {
match self.0 {
0..=0xFC => {
(self.0 as u8).encode(s)?;
Ok(1)
}
0xFD..=0xFFFF => {
s.write_u8(0xFD)?;
(self.0 as u16).encode(s)?;
Ok(3)
}
0x10000..=0xFFFFFFFF => {
s.write_u8(0xFE)?;
(self.0 as u32).encode(s)?;
Ok(5)
}
_ => {
s.write_u8(0xFF)?;
self.0.encode(s)?;
Ok(9)
}
}
}
}
impl Decodable for VarInt {
#[inline]
fn decode<D: Read>(mut d: D) -> Result<Self, Error> {
let n = ReadExt::read_u8(&mut d)?;
match n {
0xFF => {
let x = ReadExt::read_u64(&mut d)?;
if x < 0x100000000 {
return Err(Error::new(ErrorKind::Other, "Non-minimal VarInt"))
}
Ok(VarInt(x))
}
0xFE => {
let x = ReadExt::read_u32(&mut d)?;
if x < 0x10000 {
return Err(Error::new(ErrorKind::Other, "Non-minimal VarInt"))
}
Ok(VarInt(x as u64))
}
0xFD => {
let x = ReadExt::read_u16(&mut d)?;
if x < 0xFD {
return Err(Error::new(ErrorKind::Other, "Non-minimal VarInt"))
}
Ok(VarInt(x as u64))
}
n => Ok(VarInt(n as u64)),
}
}
}
Always using u32
or u64
for array lengths is wasteful since many containers only contain a byte's worth of items.
Varint encoding definitely would save some space for the binary format at least, but I'm a little concerned about the performance overhead all the additional branches would cause. I would be happy to at least consider and benchmark a PR adding this. (would also need to wait for a major release, as this would break reading of already-serialized objects)