buybackoff / Blake2Fast

Optimized BLAKE2 hashing implementations in C#

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

NuGet Build Status Test Results CI NuGet

Blake2Fast

These RFC 7693-compliant BLAKE2 implementations have been tuned for high speed and low memory usage. The .NET Core builds use the new x86 SIMD Intrinsics for even greater speed. Span<byte> is used throughout for lower memory overhead compared to byte[] based APIs.

On .NET Core 2.1+, Blake2Fast includes SSE4.1 SIMD-accelerated implementations of both BLAKE2b and BLAKE2s.

On .NET Core 3+, a faster AVX2 implementation of BLAKE2b is included in addition to the SSE4.1 implementation.

Installation

Blake2Fast is available on NuGet

PM> Install-Package SauceControl.Blake2Fast

Usage

All-at-Once Hashing

The simplest way to calculate a hash is the all-at-once ComputeHash method.

var hash = Blake2b.ComputeHash(data);

BLAKE2 supports variable digest lengths from 1 to 32 bytes for BLAKE2s or 1 to 64 bytes for BLAKE2b.

var hash = Blake2b.ComputeHash(42, data);

BLAKE2 also natively supports keyed hashing.

var hash = Blake2b.ComputeHash(key, data);

Incremental Hashing

BLAKE2 hashes can be incrementally updated if you do not have the data available all at once.

async Task<byte[]> ComputeHashAsync(Stream data)
{
    var hasher = Blake2b.CreateIncrementalHasher();
    var buffer = ArrayPool<byte>.Shared.Rent(4096);

    int bytesRead;
    while ((bytesRead = await data.ReadAsync(buffer, 0, buffer.Length)) > 0)
        hasher.Update(buffer.AsSpan(0, bytesRead));

    ArrayPool<byte>.Shared.Return(buffer);
    return hasher.Finish();
}

For convenience, the generic Update<T>() method accepts any value type that does not contain reference fields, plus arrays and Spans of compatible types.

byte[] ComputeCompositeHash()
{
    var hasher = Blake2b.CreateIncrementalHasher();

    hasher.Update(42);
    hasher.Update(Math.Pi);
	hasher.Update("I love deadlines. I like the whooshing sound they make as they fly by.".AsSpan());

    return hasher.Finish();
}

Be aware that the value passed to Update is added to the hash state in its current memory layout, which may differ based on platform (endianness) or struct layout. Use care when calling Update with types other than byte if the computed hashes are to be used across application or machine boundaries.

For example, if you are adding a string to the hash state, you may hash the characters in memory layout as shown above, or you may use Encoding.GetBytes to ensure the string bytes are handled consistently across platforms.

Allocation-Free Hashing

The output hash digest can be written to an existing buffer to avoid allocating a new array each time.

Span<byte> buffer = stackalloc byte[Blake2b.DefaultDigestLength];
Blake2b.ComputeAndWriteHash(data, buffer);

This is especially useful when performing an iterative hash, as might be used in a key derivation function.

byte[] DeriveBytes(string password, ReadOnlySpan<byte> salt)
{
    // Create key from password, then hash the salt using the key
    var pwkey = Blake2b.ComputeHash(Encoding.UTF8.GetBytes(password));
    var hbuff = Blake2b.ComputeHash(pwkey, salt);

    // Hash the hash lots of times, re-using the same buffer
    for (int i = 0; i < 999_999; i++)
        Blake2b.ComputeAndWriteHash(pwkey, hbuff, hbuff);

    return hbuff;
}

System.Security.Cryptography Interop

For interoperating with code that uses System.Security.Cryptography primitives, Blake2Fast can create a HashAlgorithm wrapper. The wrapper inherits from HMAC in case keyed hashing is required.

HashAlgorithm is less efficient than the above methods, so use it only when necessary for compatibility.

byte[] WriteDataAndCalculateHash(byte[] data, string outFile)
{
    using (var hashAlg = Blake2b.CreateHashAlgorithm())
    using (var fileStream = new FileStream(outFile, FileMode.Create))
    using (var cryptoStream = new CryptoStream(fileStream, hashAlg, CryptoStreamMode.Write))
    {
        cryptoStream.Write(data, 0, data.Length);
        cryptoStream.FlushFinalBlock();
        return hashAlg.Hash;
    }
}

.NET Core 2.1 SIMD Intrinsics Warning

The x86 SIMD Intrinsics used in the .NET Core 2.1 build are not officially supported by Microsoft. Although the specific SSE Intrinsics used by Blake2Fast have been well-tested, the JIT support for the x86 Intrinsics in general is experimental in .NET Core 2.1.

If you are uncomfortable using unsupported functionality, you can make a custom build of Blake2Fast by removing the HWINTRINSICS define constant for the netcoreapp2.1 target in the project file.

This warning applies only to .NET Core 2.1. The SIMD Intrinsics are fully supported on .NET Core 3+.

Benchmarks

Sample results from the Blake2.Bench project. Benchmarks were run on the .NET Core 3.1 x64 runtime. Configuration below:

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.836 (1909/November2018Update/19H2)
Intel Core i7-6700K CPU 4.00GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.1.301
  [Host]   : .NET Core 3.1.5 (CoreCLR 4.700.20.26901, CoreFX 4.700.20.27001), X64 RyuJIT
  ShortRun : .NET Core 3.1.5 (CoreCLR 4.700.20.26901, CoreFX 4.700.20.27001), X64 RyuJIT

Job=ShortRun  IterationCount=3  LaunchCount=1  WarmupCount=3  

Blake2Fast vs .NET in-box algorithms (MD5 and SHA2)

|     Method | Data Length |            Mean |         Error |       StdDev |  Gen 0 | Gen 1 | Gen 2 | Allocated |
|----------- |------------:|----------------:|--------------:|-------------:|-------:|------:|------:|----------:|
| BLAKE2-256 |           3 |        106.2 ns |       8.01 ns |      0.44 ns | 0.0134 |     - |     - |      56 B |
| BLAKE2-512 |           3 |        144.2 ns |      30.51 ns |      1.67 ns | 0.0210 |     - |     - |      88 B |
|        MD5 |           3 |        559.2 ns |      89.97 ns |      4.93 ns | 0.0496 |     - |     - |     208 B |
|    SHA-256 |           3 |        722.7 ns |      61.84 ns |      3.39 ns | 0.0572 |     - |     - |     240 B |
|    SHA-512 |           3 |        749.2 ns |      40.06 ns |      2.20 ns | 0.0725 |     - |     - |     304 B |
|            |             |                 |               |              |        |       |       |           |
| BLAKE2-256 |        3268 |      3,933.6 ns |     148.09 ns |      8.12 ns | 0.0076 |     - |     - |      56 B |
| BLAKE2-512 |        3268 |      2,429.7 ns |     107.58 ns |      5.90 ns | 0.0191 |     - |     - |      88 B |
|        MD5 |        3268 |      5,866.8 ns |     171.88 ns |      9.42 ns | 0.0458 |     - |     - |     208 B |
|    SHA-256 |        3268 |     12,719.1 ns |     559.17 ns |     30.65 ns | 0.0458 |     - |     - |     240 B |
|    SHA-512 |        3268 |      7,577.3 ns |     555.80 ns |     30.47 ns | 0.0610 |     - |     - |     304 B |
|            |             |                 |               |              |        |       |       |           |
| BLAKE2-256 |     3145728 |  3,667,519.1 ns |  77,804.44 ns |  4,264.72 ns |      - |     - |     - |      56 B |
| BLAKE2-512 |     3145728 |  2,240,879.0 ns | 101,729.66 ns |  5,576.15 ns |      - |     - |     - |      88 B |
|        MD5 |     3145728 |  5,108,604.6 ns | 189,941.46 ns | 10,411.33 ns |      - |     - |     - |     208 B |
|    SHA-256 |     3145728 | 11,038,065.4 ns | 311,623.07 ns | 17,081.11 ns |      - |     - |     - |     240 B |
|    SHA-512 |     3145728 |  6,599,771.6 ns | 251,528.85 ns | 13,787.15 ns |      - |     - |     - |     304 B |

Note that the built-in cryptographic hash algorithms in .NET Core forward to platform-native libraries for their implementations. On Windows, this means the implementations are provided by Windows CNG. Performance may differ on Linux.

On .NET Framework, only scalar (not SIMD) implementations are available for both BLAKE2 algorithms. The scalar implementations outperform the built-in .NET algorithms in 64-bit applications, but they are slower for large input data on 32-bit. The SIMD implementations available in .NET Core are faster than the built-in algorithms on either processor architecture.

Blake2Fast vs other BLAKE2b implementations available on NuGet

|              Method | Data Length |            Mean |            Error |          StdDev |     Gen 0 |     Gen 1 |     Gen 2 |   Allocated |
|-------------------- |------------:|----------------:|-----------------:|----------------:|----------:|----------:|----------:|------------:|
| *Blake2Fast.Blake2b |           3 |        139.5 ns |          2.71 ns |         0.15 ns |    0.0076 |         - |         - |        32 B |
|      Blake2Sharp(1) |           3 |        382.0 ns |         41.26 ns |         2.26 ns |    0.2065 |         - |         - |       864 B |
|      ByteTerrace(2) |           3 |        442.5 ns |         40.06 ns |         2.20 ns |    0.1087 |         - |         - |       456 B |
| S.D.HashFunction(3) |           3 |      1,818.6 ns |         28.93 ns |         1.59 ns |    0.4158 |         - |         - |      1744 B |
|        Konscious(4) |           3 |      1,234.3 ns |         23.67 ns |         1.30 ns |    0.2289 |         - |         - |       960 B |
|           Isopoh(5) |           3 | 10,403,770.2 ns | 96,909,560.25 ns | 5,311,940.00 ns | 1736.0840 | 1722.4121 | 1722.4121 | 527973075 B |
|       Blake2Core(6) |           3 |      1,407.4 ns |        137.05 ns |         7.51 ns |    0.2060 |         - |         - |       864 B |
|             NSec(7) |           3 |        170.2 ns |         17.42 ns |         0.96 ns |    0.0267 |         - |         - |       112 B |
|                     |             |                 |                  |                 |           |           |           |             |
| *Blake2Fast.Blake2b |        3268 |      2,413.4 ns |         48.19 ns |         2.64 ns |    0.0076 |         - |         - |        32 B |
|      Blake2Sharp(1) |        3268 |      4,378.4 ns |        278.87 ns |        15.29 ns |    0.2060 |         - |         - |       864 B |
|      ByteTerrace(2) |        3268 |      4,095.5 ns |        295.62 ns |        16.20 ns |    0.1068 |         - |         - |       456 B |
| S.D.HashFunction(3) |        3268 |     29,730.2 ns |      2,388.67 ns |       130.93 ns |    2.2278 |         - |         - |      9344 B |
|        Konscious(4) |        3268 |     16,682.2 ns |        997.62 ns |        54.68 ns |    0.2136 |         - |         - |       960 B |
|           Isopoh(5) |        3268 |  1,708,548.1 ns |  3,287,267.60 ns |   180,186.23 ns |  220.7031 |  218.7500 |  218.7500 |  67111641 B |
|       Blake2Core(6) |        3268 |     20,619.3 ns |      1,859.13 ns |       101.90 ns |    0.1831 |         - |         - |       864 B |
|             NSec(7) |        3268 |      2,459.1 ns |        252.85 ns |        13.86 ns |    0.0267 |         - |         - |       112 B |
|                     |             |                 |                  |                 |           |           |           |             |
| *Blake2Fast.Blake2b |     3145728 |  2,242,018.9 ns |    156,659.45 ns |     8,587.03 ns |         - |         - |         - |        32 B |
|      Blake2Sharp(1) |     3145728 |  3,955,138.2 ns |    113,166.53 ns |     6,203.04 ns |         - |         - |         - |       864 B |
|      ByteTerrace(2) |     3145728 |  3,641,689.8 ns |     58,221.45 ns |     3,191.31 ns |         - |         - |         - |       457 B |
| S.D.HashFunction(3) |     3145728 | 27,450,332.3 ns |  1,245,091.70 ns |    68,247.68 ns | 1781.2500 |         - |         - |   7472544 B |
|        Konscious(4) |     3145728 | 15,179,139.1 ns |    668,577.20 ns |    36,646.97 ns |         - |         - |         - |       960 B |
|           Isopoh(5) |     3145728 |  4,011,376.3 ns |    477,836.99 ns |    26,191.86 ns |         - |         - |         - |       984 B |
|       Blake2Core(6) |     3145728 | 18,704,691.7 ns |  1,247,107.98 ns |    68,358.20 ns |         - |         - |         - |       864 B |
|             NSec(7) |     3145728 |  2,247,392.2 ns |     13,390.91 ns |       734.00 ns |         - |         - |         - |       112 B |
  • (1) Blake2Sharp is the reference C# BLAKE2b implementation from the official BLAKE2 repo. This version is not published to NuGet, so the source is included in the benchmark project directly.
  • (2) ByteTerrace.Maths.Cryptography.Blake2 version 0.0.6.
  • (3) System.Data.HashFunction.Blake2 version 2.0.0. BLAKE2b only.
  • (4) Konscious.Security.Cryptography.Blake2 version 1.0.9. BLAKE2b only.
  • (5) Isopoh.Cryptography.Blake2b version 1.1.3. Yes, it really is that slow on incomplete block lengths.
  • (6) Blake2Core version 1.0.0. This package contains the reference Blake2Sharp code compiled as a debug (unoptimized) build. BenchmarkDotNet errors in such cases, so the settings were overridden to allow this library to run.
  • (7) NSec.Cryptography 20.2.0. This implementation of BLAKE2b is not RFC-compliant in that it does not support digest sizes less than 32 bytes or keyed hashing. NSec.Cryptography wraps the native libsodium library, which contains an AVX2 implementation of BLAKE2b.

Blake2Fast vs other BLAKE2s implementations available on NuGet

|              Method | Data Length |           Mean |         Error |       StdDev |  Gen 0 | Gen 1 | Gen 2 | Allocated |
|-------------------- |------------:|---------------:|--------------:|-------------:|-------:|------:|------:|----------:|
| *Blake2Fast.Blake2s |           3 |       106.5 ns |       2.30 ns |      0.13 ns | 0.0076 |     - |     - |      32 B |
|      Blake2s-net(1) |           3 |       274.4 ns |      39.08 ns |      2.14 ns | 0.1278 |     - |     - |     536 B |
|      ByteTerrace(2) |           3 |       303.6 ns |       5.69 ns |      0.31 ns | 0.0763 |     - |     - |     320 B |
|                     |             |                |               |              |        |       |       |           |
| *Blake2Fast.Blake2s |        3268 |     3,941.2 ns |     388.64 ns |     21.30 ns | 0.0076 |     - |     - |      32 B |
|      Blake2s-net(1) |        3268 |     6,044.0 ns |     251.18 ns |     13.77 ns | 0.1221 |     - |     - |     536 B |
|      ByteTerrace(2) |        3268 |     6,287.7 ns |     715.20 ns |     39.20 ns | 0.0763 |     - |     - |     320 B |
|                     |             |                |               |              |        |       |       |           |
| *Blake2Fast.Blake2s |     3145728 | 3,669,570.7 ns | 308,040.39 ns | 16,884.73 ns |      - |     - |     - |      32 B |
|      Blake2s-net(1) |     3145728 | 5,549,277.3 ns | 171,690.31 ns |  9,410.93 ns |      - |     - |     - |     536 B |
|      ByteTerrace(2) |     3145728 | 5,754,080.2 ns |  75,019.78 ns |  4,112.09 ns |      - |     - |     - |     320 B |
  • (1) blake2s-net version 0.1.0. This is a conversion of the reference Blake2Sharp code to support BLAKE2s.
  • (2) ByteTerrace.Maths.Cryptography.Blake2 version 0.0.6.

You can find more detailed comparisons between Blake2Fast and other .NET BLAKE2 implementations starting here. The short version is that Blake2Fast is the fastest and lowest-memory version of RFC-compliant BLAKE2 available for .NET.

About

Optimized BLAKE2 hashing implementations in C#

License:MIT License


Languages

Language:C# 100.0%