not-fl3 / nanoserde

Serialisation library with zero dependencies

Variable uint used for container lengths (like string or vec)

narodnik opened this issue:

See the example code from: https://github.com/darkrenaissance/darkfi/blob/master/src/serial/src/lib.rs#L271

/// Variable-integer encoding.
/// Integer can be encoded depending on the represented value to save space.
/// Variable length integers always precede an array/vector of a type of data
/// that may vary in length. Longer numbers are encoded in little endian.
///
/// | Value         | Storage length | Format                              |
/// |---------------|----------------|-------------------------------------|
/// | <= 0xfc       | 1              | u8                                  |
/// | <= 0xffff     | 3              | `0xfd` followed by `value` as `u16` |
/// | <= 0xffffffff | 5              | `0xfe` followed by `value` as `u32` |
/// | -             | 9              | `0xff` followed by `value` as `u64` |
///
/// See also [Bitcoin variable length integers](https://en.bitcoin.it/wiki/Protocol_documentation#Variable_length_integer).
#[derive(Debug, PartialEq, Eq)]
pub struct VarInt(pub u64);

impl VarInt {
    /// Gets the length of this `VarInt` when encoded.
    /// Returns:
    /// * 1 for 0..=0xFC
    /// * 3 for 0xFD..=0xFFFF (2^16-1)
    /// * 5 for 0x10000..=0xFFFFFFFF (2^32-1)
    /// * 9 otherwise
    #[inline]
    pub fn length(&self) -> usize {
        match self.0 {
            0..=0xFC => 1,
            0xFD..=0xFFFF => 3,
            0x10000..=0xFFFFFFFF => 5,
            _ => 9,
        }
    }
}

impl Encodable for VarInt {
    #[inline]
    fn encode<S: Write>(&self, mut s: S) -> Result<usize, Error> {
        match self.0 {
            0..=0xFC => {
                (self.0 as u8).encode(s)?;
                Ok(1)
            }

            0xFD..=0xFFFF => {
                s.write_u8(0xFD)?;
                (self.0 as u16).encode(s)?;
                Ok(3)
            }

            0x10000..=0xFFFFFFFF => {
                s.write_u8(0xFE)?;
                (self.0 as u32).encode(s)?;
                Ok(5)
            }

            _ => {
                s.write_u8(0xFF)?;
                self.0.encode(s)?;
                Ok(9)
            }
        }
    }
}

impl Decodable for VarInt {
    #[inline]
    fn decode<D: Read>(mut d: D) -> Result<Self, Error> {
        let n = ReadExt::read_u8(&mut d)?;
        match n {
            0xFF => {
                let x = ReadExt::read_u64(&mut d)?;
                if x < 0x100000000 {
                    return Err(Error::new(ErrorKind::Other, "Non-minimal VarInt"))
                }
                Ok(VarInt(x))
            }

            0xFE => {
                let x = ReadExt::read_u32(&mut d)?;
                if x < 0x10000 {
                    return Err(Error::new(ErrorKind::Other, "Non-minimal VarInt"))
                }
                Ok(VarInt(x as u64))
            }

            0xFD => {
                let x = ReadExt::read_u16(&mut d)?;
                if x < 0xFD {
                    return Err(Error::new(ErrorKind::Other, "Non-minimal VarInt"))
                }
                Ok(VarInt(x as u64))
            }

            n => Ok(VarInt(n as u64)),
        }
    }
}

Always using a u32 or u64 for array lengths is wasteful, since many containers hold fewer than 0xFD (253) items, so their length would fit in a single byte.
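To put rough numbers on the saving, here is a minimal sketch that uses only the standard library. The helper names `write_varint`, `encode_fixed` and `encode_varint` are hypothetical and are not part of nanoserde or darkfi, and the fixed layout is assumed to be a little-endian u64 length prefix.

```rust
use std::io::{self, Write};

/// Hypothetical helper: writes `n` as a Bitcoin-style variable-length
/// integer and returns the number of bytes written.
fn write_varint<W: Write>(w: &mut W, n: u64) -> io::Result<usize> {
    match n {
        0..=0xFC => {
            w.write_all(&[n as u8])?;
            Ok(1)
        }
        0xFD..=0xFFFF => {
            w.write_all(&[0xFD])?;
            w.write_all(&(n as u16).to_le_bytes())?;
            Ok(3)
        }
        0x1_0000..=0xFFFF_FFFF => {
            w.write_all(&[0xFE])?;
            w.write_all(&(n as u32).to_le_bytes())?;
            Ok(5)
        }
        _ => {
            w.write_all(&[0xFF])?;
            w.write_all(&n.to_le_bytes())?;
            Ok(9)
        }
    }
}

/// Serialises each string as `<u64 little-endian length><bytes>`
/// (the assumed fixed-width layout).
fn encode_fixed(items: &[&str]) -> Vec<u8> {
    let mut out = Vec::new();
    for s in items {
        out.extend_from_slice(&(s.len() as u64).to_le_bytes());
        out.extend_from_slice(s.as_bytes());
    }
    out
}

/// Serialises each string as `<varint length><bytes>`.
fn encode_varint(items: &[&str]) -> Vec<u8> {
    let mut out = Vec::new();
    for s in items {
        // Writing into a Vec<u8> does not fail.
        write_varint(&mut out, s.len() as u64).expect("write to Vec");
        out.extend_from_slice(s.as_bytes());
    }
    out
}
```

For the three short strings `"a"`, `"bc"` and `"def"`, `encode_fixed` produces 30 bytes (24 of them length prefixes) while `encode_varint` produces 9.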

Varint encoding would definitely save some space, at least for the binary format, but I'm a little concerned about the performance overhead that all the additional branches would cause. I would be happy to at least consider and benchmark a PR adding this. It would also need to wait for a major release, as it would break reading of already-serialized objects.
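The compatibility concern can be illustrated with a small sketch. It assumes the existing layout is a little-endian u64 length prefix. The helpers `read_varint`, `read_bytes_varint` and `write_bytes_fixed` are hypothetical names and not nanoserde API.

```rust
use std::io::{self, Read};

/// Hypothetical helper: reads a Bitcoin-style variable-length integer and
/// rejects non-minimal encodings.
fn read_varint<R: Read>(r: &mut R) -> io::Result<u64> {
    let mut tag = [0u8; 1];
    r.read_exact(&mut tag)?;
    // `min` is the smallest value that is allowed to use each wide form.
    let (value, min): (u64, u64) = match tag[0] {
        0xFF => {
            let mut b = [0u8; 8];
            r.read_exact(&mut b)?;
            (u64::from_le_bytes(b), 0x1_0000_0000)
        }
        0xFE => {
            let mut b = [0u8; 4];
            r.read_exact(&mut b)?;
            (u32::from_le_bytes(b) as u64, 0x1_0000)
        }
        0xFD => {
            let mut b = [0u8; 2];
            r.read_exact(&mut b)?;
            (u16::from_le_bytes(b) as u64, 0xFD)
        }
        n => return Ok(n as u64),
    };
    if value < min {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "non-minimal varint"));
    }
    Ok(value)
}

/// Reads `<varint length><bytes>`.
/// Note: a real decoder should bound `len` before allocating.
fn read_bytes_varint<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let len = read_varint(r)? as usize;
    let mut buf = vec![0u8; len];
    r.read_exact(&mut buf)?;
    Ok(buf)
}

/// Writes `<u64 little-endian length><bytes>`, the assumed old layout.
fn write_bytes_fixed(payload: &[u8]) -> Vec<u8> {
    let mut out = (payload.len() as u64).to_le_bytes().to_vec();
    out.extend_from_slice(payload);
    out
}
```

Feeding `write_bytes_fixed(b"hi")` to `read_bytes_varint` does not fail: the first byte `0x02` is taken as the length, and the next two bytes of the old prefix (both zero) are returned as the payload. Old data would therefore be silently misread rather than rejected, which is why such a change would need a format version bump or a major release.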