[help] how to efficiently encode and decode custom int using unsafe
Chuckame opened this issue · comments
I currently maintain a library (avro4k, the Apache Avro format for Kotlin), and I think okio would be a really powerful multiplatform layer for streaming input and output.
I would like to encode int and long values with zig-zag encoding, and I would like the encoding to be performant. Since buffers are made of segments, reading byte by byte could serve as the slow path.
The encoding is always little-endian, and the highest bit of each byte indicates whether another byte follows, so each encoded byte carries 7 bits of payload. A 32-bit int (4 bytes in memory) therefore takes 1 to 5 encoded bytes, and a 64-bit long (8 bytes in memory) takes 1 to 10.
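To make the format concrete, here is a minimal sketch of the zig-zag mapping and of how many bytes the 7-bits-per-byte layout needs. The class and method names (`ZigZagDemo`, `zigZag`, `unZigZag`, `varIntSize`) are hypothetical and only for illustration; they are not part of okio or Avro.

```java
// Hypothetical helper, named for illustration only.
final class ZigZagDemo {
    // Zig-zag mapping: moves the sign into the lowest bit so that
    // values of small magnitude (positive or negative) stay small.
    static int zigZag(int n) {
        return (n << 1) ^ (n >> 31);
    }

    // Inverse of zigZag: back to two's complement.
    static int unZigZag(int z) {
        return (z >>> 1) ^ -(z & 1);
    }

    // Number of bytes needed to store an already zig-zagged int
    // when each byte carries 7 payload bits plus a continuation bit.
    static int varIntSize(int z) {
        int size = 1;
        while ((z & ~0x7F) != 0) {
            z >>>= 7;
            size++;
        }
        return size;
    }
}
```

With this mapping, `zigZag(-1)` is 1 and `zigZag(1)` is 2, so both fit in a single encoded byte, while `Integer.MIN_VALUE` needs all 5.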
Encoding:
Here is the int encoding (original code); the long encoding is similar.
    public static int encodeInt(int n, byte[] buf, int pos) {
        // move sign to low-order bit, and flip others if negative
        n = (n << 1) ^ (n >> 31);
        int start = pos;
        if ((n & ~0x7F) != 0) {
            buf[pos++] = (byte) ((n | 0x80) & 0xFF);
            n >>>= 7;
            if (n > 0x7F) {
                buf[pos++] = (byte) ((n | 0x80) & 0xFF);
                n >>>= 7;
                if (n > 0x7F) {
                    buf[pos++] = (byte) ((n | 0x80) & 0xFF);
                    n >>>= 7;
                    if (n > 0x7F) {
                        buf[pos++] = (byte) ((n | 0x80) & 0xFF);
                        n >>>= 7;
                    }
                }
            }
        }
        buf[pos++] = (byte) n;
        return pos - start;
    }
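For reference, the long variant could look like the sketch below. It is written as a loop rather than unrolled like the int version above, so it is an illustration of the format and not the original code; `VarLong` is a hypothetical class name. The caller is assumed to guarantee at least 10 free bytes in `buf` starting at `pos`.

```java
// Hypothetical helper, named for illustration only.
final class VarLong {
    // Writes n as a zig-zag varint into buf starting at pos.
    // Returns the number of bytes written (between 1 and 10).
    static int encodeLong(long n, byte[] buf, int pos) {
        // move sign to low-order bit, and flip others if negative
        n = (n << 1) ^ (n >> 63);
        int start = pos;
        // while more than 7 bits remain, emit 7 bits plus the continuation bit
        while ((n & ~0x7FL) != 0) {
            buf[pos++] = (byte) ((n & 0x7F) | 0x80);
            n >>>= 7;
        }
        // last byte: continuation bit clear
        buf[pos++] = (byte) n;
        return pos - start;
    }
}
```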
Decoding:
Now the reverse, reading byte by byte (original code):
    public int readInt() throws IOException {
        int n = 0;
        int b;
        int shift = 0;
        do {
            b = in.read();
            if (b >= 0) {
                n |= (b & 0x7F) << shift;
                if ((b & 0x80) == 0) {
                    return (n >>> 1) ^ -(n & 1); // back to two's-complement
                }
            } else {
                throw new EOFException();
            }
            shift += 7;
        } while (shift < 32);
        throw new InvalidNumberEncodingException("Invalid int encoding");
    }
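What I have in mind is roughly the following sketch: when at least 5 bytes remain in the current contiguous array (for example one buffer segment), decode straight from the array without a per-byte end-of-data check, and otherwise fall back to the checked byte-by-byte path. This uses a plain `byte[]` and no okio API at all; `ArrayVarIntReader` is a hypothetical class, and a plain `IOException` stands in for `InvalidNumberEncodingException`. Whether this is actually faster would need measuring.

```java
import java.io.EOFException;
import java.io.IOException;

// Hypothetical reader over one contiguous byte range [pos, limit).
final class ArrayVarIntReader {
    private final byte[] buf;
    private int pos;
    private final int limit;

    ArrayVarIntReader(byte[] buf, int pos, int limit) {
        this.buf = buf;
        this.pos = pos;
        this.limit = limit;
    }

    int readInt() throws IOException {
        int n = 0;
        int shift = 0;
        if (limit - pos >= 5) {
            // Fast path: an int takes at most 5 bytes, so no EOF check is needed.
            int p = pos;
            do {
                int b = buf[p++]; // sign-extended: negative means continuation bit set
                n |= (b & 0x7F) << shift;
                if (b >= 0) {
                    pos = p;
                    return (n >>> 1) ^ -(n & 1); // back to two's-complement
                }
                shift += 7;
            } while (shift < 32);
            throw new IOException("Invalid int encoding");
        }
        // Slow path: fewer than 5 bytes left, check the limit before every byte.
        while (shift < 32) {
            if (pos >= limit) {
                throw new EOFException();
            }
            int b = buf[pos++];
            n |= (b & 0x7F) << shift;
            if (b >= 0) {
                return (n >>> 1) ^ -(n & 1);
            }
            shift += 7;
        }
        throw new IOException("Invalid int encoding");
    }
}
```

In a segmented buffer, the slow path would also be the place to cross into the next segment instead of throwing `EOFException`.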
Do you think this is the most performant approach, or could unsafe cursors help improve it?