dotnet / dotNext

Next generation API for .NET

Home page: https://dotnet.github.io/dotNext/

Expand SequenceReaderExtensions with unsigned variants of TryReadLittleEndian and TryReadBigEndian

gsuberland opened this issue

I've been using ReadOnlySequenceAccessor for parsing memory-mapped file structures. The System.Memory assembly contains a SequenceReaderExtensions class that provides methods for reading little-endian and big-endian integers. However, these extensions only cover signed integers, whereas most of my code needs unsigned integers. It'd be helpful to have the unsigned variants implemented in dotNext.

This would be fairly trivial to implement: essentially, replicate the internal TryRead<T> method and duplicate the TryReadLittleEndian, TryReadBigEndian, and TryReadReverseEndianness methods for ushort, uint, and ulong. I did this in my code and can post a gist if it would save some time, but it really is just a copy-paste job.
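A minimal sketch of the approach described above, shown for uint only (ushort and ulong follow the same pattern). It is written from scratch rather than copied from System.Memory, so it only approximates the internal TryRead<T>; the class name SequenceReaderUnsignedExtensions and the helper TryReadRaw are hypothetical.

```csharp
using System;
using System.Buffers;
using System.Buffers.Binary;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical helper class; not part of System.Memory or dotNext.
public static class SequenceReaderUnsignedExtensions
{
    // Reads sizeof(T) raw bytes in host byte order and advances the reader on success.
    private static bool TryReadRaw<T>(ref this SequenceReader<byte> reader, out T value)
        where T : unmanaged
    {
        int size = Unsafe.SizeOf<T>();

        // Fast path: the value fits entirely in the current segment.
        if (MemoryMarshal.TryRead(reader.UnreadSpan, out value))
        {
            reader.Advance(size);
            return true;
        }

        // Slow path: the value spans segments, so copy it into a small temporary buffer.
        // TryCopyTo does not advance the reader, so a failure leaves the position unchanged.
        Span<byte> buffer = stackalloc byte[size];
        if (!reader.TryCopyTo(buffer))
        {
            value = default;
            return false;
        }

        value = MemoryMarshal.Read<T>(buffer);
        reader.Advance(size);
        return true;
    }

    public static bool TryReadLittleEndian(ref this SequenceReader<byte> reader, out uint value)
    {
        if (!reader.TryReadRaw(out value))
            return false;

        // Swap bytes only when the host CPU is big-endian.
        if (!BitConverter.IsLittleEndian)
            value = BinaryPrimitives.ReverseEndianness(value);
        return true;
    }

    public static bool TryReadBigEndian(ref this SequenceReader<byte> reader, out uint value)
    {
        if (!reader.TryReadRaw(out value))
            return false;

        // Swap bytes only when the host CPU is little-endian.
        if (BitConverter.IsLittleEndian)
            value = BinaryPrimitives.ReverseEndianness(value);
        return true;
    }
}
```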

One additional potential enhancement I considered is adding array variants of this code, with a method signature similar to:

public static bool TryReadLittleEndian(ref this SequenceReader<byte> reader, ref uint[] values)

This would fill an existing allocated array.

However, I'm not sure how best to do this in a performant way, given that the code would likely involve loops, which would prevent inlining. I welcome your input on this.

commented

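Hi @gsuberland, TryReadXXX for unsigned integers is trivial because signed and unsigned integers of the same width share the same binary representation; only the interpretation of the most significant bit differs. Therefore, all you need is to cast the signed integer to its unsigned counterpart after conversion, e.g.: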

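using System.Buffers; // SequenceReader<T>, SequenceReaderExtensions

// Declare inside a static class so it works as an extension method.
public static bool TryReadLittleEndian(this ref SequenceReader<byte> reader, out ushort value)
{
    bool result = SequenceReaderExtensions.TryReadLittleEndian(ref reader, out short sValue);
    value = unchecked((ushort)sValue); // reinterpret the bits; never throws, even in a checked context
    return result;
}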

Or, using the Unsafe class:

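using System.Runtime.CompilerServices; // Unsafe

public static bool TryReadLittleEndian(this ref SequenceReader<byte> reader, out ushort value)
{
    Unsafe.SkipInit(out value); // satisfy definite assignment without writing to value
    // Let the signed overload write directly into value, viewed as a short.
    return SequenceReaderExtensions.TryReadLittleEndian(ref reader, out Unsafe.As<ushort, short>(ref value));
}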

Moreover, the DotNext.IO library contains a SequenceReader type that reads from ReadOnlySequence<byte> and supports unsigned integers. This is why I see no reason for a new API.

Reading a vector of values with byte-order conversion seems reasonable, but we don't need arrays for that. A better signature is:

public static bool TryReadLittleEndian(ref this SequenceReader<byte> reader, Span<uint> values);

Now the caller can choose any buffer for storing the result: an array, stack memory, or a rented buffer. The implementation could work as follows:

  • Fast path (the requested byte order matches the executing CPU): reinterpret Span<uint> as Span<byte> and copy the bytes from the sequence directly.
  • Slow path (the requested byte order differs from the executing CPU): use vectors and AVX intrinsics to reverse the byte order of each element.
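The two paths above can be sketched as follows. This is a minimal illustration, not the implementation from the commit: the class name SequenceReaderSpanExtensions and the private TryRead helper are hypothetical, and the slow path uses a scalar loop over BinaryPrimitives.ReverseEndianness in place of the suggested SIMD intrinsics.

```csharp
using System;
using System.Buffers;
using System.Buffers.Binary;
using System.Runtime.InteropServices;

// Hypothetical helper class illustrating the fast/slow path split; not the dotNext API.
public static class SequenceReaderSpanExtensions
{
    public static bool TryReadLittleEndian(ref this SequenceReader<byte> reader, Span<uint> values)
        => TryRead(ref reader, values, littleEndian: true);

    public static bool TryReadBigEndian(ref this SequenceReader<byte> reader, Span<uint> values)
        => TryRead(ref reader, values, littleEndian: false);

    private static bool TryRead(ref SequenceReader<byte> reader, Span<uint> values, bool littleEndian)
    {
        // View the caller's buffer as raw bytes; no copy is made.
        Span<byte> bytes = MemoryMarshal.AsBytes(values);

        // Fails without advancing if the sequence holds fewer bytes than requested.
        if (!reader.TryCopyTo(bytes))
            return false;

        reader.Advance(bytes.Length);

        // Fast path: the requested byte order matches the CPU, so the elements are already correct.
        if (littleEndian == BitConverter.IsLittleEndian)
            return true;

        // Slow path: swap the bytes of each element (a production version would vectorize this).
        for (int i = 0; i < values.Length; i++)
            values[i] = BinaryPrimitives.ReverseEndianness(values[i]);

        return true;
    }
}
```

Because the destination is a Span<uint>, the same method serves arrays, stackalloc buffers, and rented buffers without extra overloads.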
commented

@gsuberland, the implementation is ready (see the commit above). Tell me what you think. Personally, I don't like the TryReadLittleEndian extension method for the SequenceReader<byte> type. Instead, I would prefer to expose a low-level TryRead method that fetches elements and stores them in the provided buffer. That looks more reusable.

commented

The functionality has been exposed through the BinaryTransformations class since release 4.10.0.