square / okio

A modern I/O library for Android, Java, and Kotlin Multiplatform.

Home Page: https://square.github.io/okio/

[help] how to efficiently encode and decode custom int using unsafe

Chuckame opened this issue

I'm currently maintaining a library (avro4k, which implements the Apache Avro format for Kotlin), and I think okio would be really powerful as a multiplatform streaming input and output.

I would like to encode int and long values using zig-zag encoding, and I would like the encoding to be fast. Since a buffer is made of segments, reading byte by byte can serve as the slow path.

The encoding is always little-endian: each byte carries 7 bits of the value, and its highest bit indicates whether another byte follows. A 4-byte int therefore takes up to 5 encoded bytes, and an 8-byte long up to 10.
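To make the byte layout concrete, here is a minimal sketch of the zig-zag mapping and the resulting encoded size. The class and method names (`ZigZagSketch`, `zigZag`, `unZigZag`, `varIntSize`) are made up for illustration and are not part of okio or Avro.

```java
// Sketch only: names are hypothetical, not an okio or Avro API.
final class ZigZagSketch {
    /** Moves the sign to the low-order bit: 0, -1, 1, -2 become 0, 1, 2, 3. */
    static int zigZag(int n) {
        return (n << 1) ^ (n >> 31);
    }

    /** Reverses zigZag, restoring the two's-complement value. */
    static int unZigZag(int z) {
        return (z >>> 1) ^ -(z & 1);
    }

    /** Number of bytes the 7-bits-per-byte encoding needs for n (1 to 5). */
    static int varIntSize(int n) {
        int z = zigZag(n);
        int size = 1;
        // Each extra byte is needed while more than 7 significant bits remain.
        while ((z & ~0x7F) != 0) {
            z >>>= 7;
            size++;
        }
        return size;
    }
}
```

Small magnitudes of either sign stay short: under this sketch, values from -64 to 63 fit in one byte, while `Integer.MIN_VALUE` and `Integer.MAX_VALUE` need all five.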

Encoding:

Here is the int encoding (original code); the long encoding is similar.

public static int encodeInt(int n, byte[] buf, int pos) {
    // move sign to low-order bit, and flip others if negative
    n = (n << 1) ^ (n >> 31);
    int start = pos;
    if ((n & ~0x7F) != 0) {
      buf[pos++] = (byte) ((n | 0x80) & 0xFF);
      n >>>= 7;
      if (n > 0x7F) {
        buf[pos++] = (byte) ((n | 0x80) & 0xFF);
        n >>>= 7;
        if (n > 0x7F) {
          buf[pos++] = (byte) ((n | 0x80) & 0xFF);
          n >>>= 7;
          if (n > 0x7F) {
            buf[pos++] = (byte) ((n | 0x80) & 0xFF);
            n >>>= 7;
          }
        }
      }
    }
    buf[pos++] = (byte) n;
    return pos - start;
}
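For comparison, the long variant could look like the sketch below. It is written as a loop rather than unrolled like the int code above, so it is an illustration of the same byte layout and not the original Avro implementation; the class name `VarLongSketch` is hypothetical.

```java
// Sketch only: a loop-based long counterpart, not the original unrolled code.
final class VarLongSketch {
    /**
     * Writes n at buf[pos...] using zig-zag plus 7-bits-per-byte encoding.
     * The caller must guarantee at least 10 free bytes. Returns bytes written.
     */
    static int encodeLong(long n, byte[] buf, int pos) {
        // Move sign to low-order bit, and flip the others if negative.
        n = (n << 1) ^ (n >> 63);
        int start = pos;
        // Emit 7 bits at a time, setting the high bit while more bytes follow.
        while ((n & ~0x7FL) != 0) {
            buf[pos++] = (byte) ((n & 0x7F) | 0x80);
            n >>>= 7;
        }
        buf[pos++] = (byte) n; // last byte: high bit clear
        return pos - start;
    }
}
```

A 64-bit value spreads over at most ten 7-bit groups, so a 10-byte scratch array is always enough for one call.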

Decoding:

Now the reverse: we read byte by byte (original code):

  public int readInt() throws IOException {
    int n = 0;
    int b;
    int shift = 0;
    do {
      b = in.read(); // returns -1 at end of stream
      if (b >= 0) {
        // the low 7 bits carry data, least significant group first
        n |= (b & 0x7F) << shift;
        if ((b & 0x80) == 0) { // high bit clear: this was the last byte
          return (n >>> 1) ^ -(n & 1); // back to two's-complement
        }
      } else {
        throw new EOFException();
      }
      shift += 7;
    } while (shift < 32); // at most 5 bytes for an int
    throw new InvalidNumberEncodingException("Invalid int encoding");
  }
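The fast-path and slow-path split mentioned earlier can be sketched over a plain byte array, which here only stands in for one contiguous chunk of buffered data. Nothing below is okio API: `VarIntArrayDecoder`, its fields, and the use of `IllegalStateException` in place of the checked exceptions above are all assumptions made for illustration.

```java
// Sketch only: a byte[] stands in for one contiguous chunk of buffered data.
final class VarIntArrayDecoder {
    private final byte[] data;
    private final int limit; // exclusive end of the readable bytes
    private int pos;

    VarIntArrayDecoder(byte[] data, int pos, int limit) {
        this.data = data;
        this.pos = pos;
        this.limit = limit;
    }

    int position() {
        return pos;
    }

    int readInt() {
        // An int never needs more than 5 bytes, so with 5 available
        // the per-byte bounds checks can be skipped.
        return (limit - pos >= 5) ? readIntFast() : readIntSlow();
    }

    private int readIntFast() {
        int p = pos;
        int b = data[p++];
        int n = b & 0x7F;
        // A negative byte has its high bit set, meaning another byte follows.
        for (int shift = 7; b < 0; shift += 7) {
            if (shift > 28) {
                throw new IllegalStateException("Invalid int encoding");
            }
            b = data[p++];
            n |= (b & 0x7F) << shift;
        }
        pos = p;
        return (n >>> 1) ^ -(n & 1); // back to two's-complement
    }

    private int readIntSlow() {
        int n = 0;
        for (int shift = 0; shift < 32; shift += 7) {
            if (pos >= limit) {
                throw new IllegalStateException("Unexpected end of input");
            }
            int b = data[pos++];
            n |= (b & 0x7F) << shift;
            if (b >= 0) {
                return (n >>> 1) ^ -(n & 1);
            }
        }
        throw new IllegalStateException("Invalid int encoding");
    }
}
```

Whether this kind of split beats the byte-by-byte loop in practice would need to be measured; the sketch only shows where the bounds checks move.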

Do you think this is the most performant approach, or could unsafe cursors help to improve it?