virtualritz / gxhash

The fastest non-cryptographic hashing algorithm 📈. Passes SMHasher quality test suite ✅

Home page: https://docs.rs/gxhash

GxHash

A blazingly fast and robust non-cryptographic hashing algorithm.

Usage

Directly as a hash function:

```rust
use gxhash::{gxhash32, gxhash64, gxhash128};

fn main() {
    let bytes: &[u8] = "hello world".as_bytes();
    let seed = 1234;

    println!(" 32-bit hash: {:x}", gxhash32(bytes, seed));
    println!(" 64-bit hash: {:x}", gxhash64(bytes, seed));
    println!("128-bit hash: {:x}", gxhash128(bytes, seed));
}
```

GxHash provides an implementation of the Hasher trait. For convenience and interop with crates that require a std::collections::HashMap, the type aliases HashMap and HashSet are provided:

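```rust
use gxhash::{HashMap, HashMapExt};

fn main() {
    // HashMapExt is imported so that HashMap::new() is available on the GxHash-backed alias.
    let mut map: HashMap<&str, i32> = HashMap::new();
    map.insert("answer", 42);
}
```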
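The aliases are presumably ordinary std::collections::HashMap/HashSet types whose hasher type parameter is fixed to GxHash's builder, which is what lets them interoperate with code expecting a std map. A minimal std-only sketch of that pattern follows; ToyHasher and ToyHashMap are hypothetical names, and the 64-bit FNV-1a hasher is only a stand-in for GxHash's real Hasher implementation.

```rust
use std::collections::HashMap as StdHashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Hypothetical stand-in hasher (64-bit FNV-1a). This is NOT the GxHash algorithm.
struct ToyHasher(u64);

impl Default for ToyHasher {
    fn default() -> Self {
        ToyHasher(0xcbf2_9ce4_8422_2325) // FNV-1a 64-bit offset basis
    }
}

impl Hasher for ToyHasher {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3); // FNV-1a 64-bit prime
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// The alias pattern: std's HashMap with only the hasher type parameter swapped.
type ToyHashMap<K, V> = StdHashMap<K, V, BuildHasherDefault<ToyHasher>>;

fn main() {
    // std only defines HashMap::new() for the default RandomState hasher,
    // so an aliased map is built with default() (or with_hasher()).
    let mut map: ToyHashMap<&str, i32> = ToyHashMap::default();
    map.insert("answer", 42);
    println!("{:?}", map.get("answer"));
}
```

Because std's HashMap::new() is tied to RandomState, a crate exposing such an alias needs some other way to offer new(); that is presumably the role of the HashMapExt import in the usage example.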

Cargo Features

  • avx2 -- Enables AVX2 support for the gxhash128 and gxhash64 functions.

  • std -- Enables the HashMap/HashSet container convenience type aliases. This is on by default. Disable to make the crate no_std:

    [dependencies.gxhash]
    ...
    default-features = false

Features

Blazingly Fast 🚀

As of this writing, GxHash is the fastest non-cryptographic hashing algorithm of its class for all input sizes. This performance is possible foremost due to heavy use of SIMD intrinsics, a construction with high instruction-level parallelism (ILP), and small compiled code that is easily inlined and cached.

See the benchmarks.

Highly Robust 🗿

GxHash uses several rounds of the hardware-accelerated AES block cipher for efficient bit mixing. Thanks to this, GxHash passes all tests of SMHasher, the de facto quality benchmark for non-cryptographic hash functions, which covers most existing algorithms. GxHash has a low collision rate, uniform distribution and strong avalanche properties.

Check out the paper for more technical details.
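The avalanche property means that flipping a single input bit should flip each output bit with probability close to 1/2. The sketch below measures only the average flip ratio, which is a much weaker check than SMHasher's per-bit avalanche test. Since GxHash itself is not available in a self-contained std-only example, it uses std's DefaultHasher as a stand-in hash; the helper names hash_bytes and avalanche_ratio are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hash raw bytes with std's DefaultHasher (a stand-in, not GxHash).
fn hash_bytes(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(bytes);
    h.finish()
}

/// Average fraction of the 64 output bits that flip when one input bit flips.
/// An ideal avalanche gives a value close to 0.5.
fn avalanche_ratio(samples: u64) -> f64 {
    let mut flipped_bits = 0u64;
    let mut trials = 0u64;
    for i in 0..samples {
        // Deterministic, pseudo-random 16-byte input derived from the sample index.
        let mut input = [0u8; 16];
        input[..8].copy_from_slice(&i.wrapping_mul(0x9E37_79B9_7F4A_7C15).to_le_bytes());
        input[8..].copy_from_slice(&(!i).wrapping_mul(0xC2B2_AE3D_27D4_EB4F).to_le_bytes());
        let base = hash_bytes(&input);

        // Flip every input bit once and count how many output bits changed.
        for bit in 0..input.len() * 8 {
            let mut flipped = input;
            flipped[bit / 8] ^= 1u8 << (bit % 8);
            flipped_bits += u64::from((base ^ hash_bytes(&flipped)).count_ones());
            trials += 1;
        }
    }
    flipped_bits as f64 / (trials * 64) as f64
}

fn main() {
    println!("average avalanche ratio: {:.4}", avalanche_ratio(256));
}
```

An average near 0.5 can still hide biased individual output bits, which is why SMHasher reports the worst per-bit bias rather than a single mean.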

Portability

Supported Architectures

GxHash is compatible with:

  • x86 processors with AES-NI intrinsics.
  • ARM processors with NEON intrinsics.

⚠️ Warning

Other platforms are currently not supported: there is no fallback implementation, so the crate does not build on them. If you add support for a new platform, a PR is highly welcome.
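Because there is no fallback, it can help to know whether the machine at hand has the required instruction-set extensions. The sketch below uses std's runtime feature detection for that; it is a hypothetical helper, not part of GxHash, and it assumes the ARM path also needs the AES extension (the README says GxHash mixes with hardware AES). It only covers 64-bit ARM, and runtime detection does not change how the crate itself is compiled.

```rust
/// Hypothetical helper: does the running CPU offer the extensions GxHash relies on?
/// x86/x86_64: AES-NI. aarch64: NEON plus the AES extension (assumption).
/// Any other architecture (including 32-bit ARM in this sketch) reports false.
fn cpu_has_gxhash_extensions() -> bool {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    let ok = is_x86_feature_detected!("aes");

    #[cfg(target_arch = "aarch64")]
    let ok = std::arch::is_aarch64_feature_detected!("neon")
        && std::arch::is_aarch64_feature_detected!("aes");

    #[cfg(not(any(target_arch = "x86", target_arch = "x86_64", target_arch = "aarch64")))]
    let ok = false;

    ok
}

fn main() {
    println!("required extensions available: {}", cpu_has_gxhash_extensions());
}
```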

Stability of Hashes

All generated hashes for a given version of GxHash are stable. This means that for a given input the output hash will be the same across all supported platforms.

An exception to this is the AVX2 version of GxHash (which requires a nightly toolchain).

Security

DoS Resistance

GxHash is a seeded hashing algorithm: depending on the seed used, it generates completely different hashes. The default hasher builder (GxHasherBuilder::default()) uses seed randomization, which makes any HashMap/HashSet more DoS resistant: without knowing the seed, attackers will find it much more difficult to predict which keys may collide. This does not mean, however, that it is completely DoS resistant; this has yet to be analyzed further.
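The mechanism behind seed randomization can be sketched with std alone: a builder stores a per-instance seed and folds it into the initial state of every hasher it creates, so the same key hashes differently under different seeds. SeededHasher and SeededBuilder are hypothetical names; the seeded FNV-1a core is a toy that illustrates the plumbing only and is neither GxHash nor DoS resistant. Drawing the random seed from std's RandomState is likewise an assumption of this sketch, not how GxHash obtains its seed.

```rust
use std::collections::hash_map::RandomState;
use std::collections::HashMap;
use std::hash::{BuildHasher, Hash, Hasher};

/// Toy seeded hasher: FNV-1a whose starting state depends on the seed.
struct SeededHasher(u64);

impl Hasher for SeededHasher {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 = (self.0 ^ u64::from(b)).wrapping_mul(0x0000_0100_0000_01b3);
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// Builder that carries one seed and hands it to every hasher it builds.
struct SeededBuilder {
    seed: u64,
}

impl SeededBuilder {
    fn with_seed(seed: u64) -> Self {
        SeededBuilder { seed }
    }

    /// Random seed, borrowed from the randomly keyed SipHash behind RandomState.
    fn random() -> Self {
        let seed = RandomState::new().build_hasher().finish();
        SeededBuilder { seed }
    }
}

impl BuildHasher for SeededBuilder {
    type Hasher = SeededHasher;

    fn build_hasher(&self) -> SeededHasher {
        SeededHasher(0xcbf2_9ce4_8422_2325 ^ self.seed)
    }
}

/// Hash one key with a given builder.
fn hash_with(builder: &SeededBuilder, key: &str) -> u64 {
    let mut h = builder.build_hasher();
    key.hash(&mut h);
    h.finish()
}

fn main() {
    // Each map gets its own random seed, so collision patterns differ per map.
    let mut map = HashMap::with_hasher(SeededBuilder::random());
    map.insert("answer", 42);
    println!("{:?}", map.get("answer"));
    println!("seed 1: {:x}", hash_with(&SeededBuilder::with_seed(1), "answer"));
    println!("seed 2: {:x}", hash_with(&SeededBuilder::with_seed(2), "answer"));
}
```

Hashes stay consistent within one builder, which a HashMap requires, while an attacker who cannot observe the seed cannot precompute keys that collide in a particular map.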

Multicollision Resistance

GxHash uses a 128-bit internal state (256-bit with the avx2 feature). This makes GxHash a widepipe construction when generating hashes of 64 bits or fewer. Widepipe constructions are, among other useful properties, inherently more resistant to multicollision attacks. See this paper for more details.
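As a rough illustration using standard generic bounds for iterated constructions (assumed to apply here, and not taken from the GxHash paper): with an n-bit internal state, an internal collision is expected after about 2^{n/2} evaluations, and Joux's attack chains k such collisions into 2^k messages that all share one hash. Comparing a narrow 64-bit state with GxHash's 128-bit state:

```latex
\[
  C_{\mathrm{multi}}(n, k) \approx k \cdot 2^{n/2},
  \qquad
  C_{\mathrm{multi}}(64, k) \approx k \cdot 2^{32},
  \qquad
  C_{\mathrm{multi}}(128, k) \approx k \cdot 2^{64}.
\]
```

A single output collision of a 64-bit hash is still expected after about 2^{32} evaluations; the wider state only removes the cheap way of multiplying such collisions.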

Cryptographic Properties

GxHash is a non-cryptographic hashing algorithm, so it is not recommended for cryptographic use (e.g., it is not a replacement for SHA). It has not been assessed whether GxHash is preimage resistant or how difficult it is to reverse.

Benchmarks

To run the benchmarks locally, do one of the following:

```shell
# Benchmark throughput
cargo bench --bench throughput
# Benchmark performance of GxHash's Hasher when used in a HashSet
cargo bench --bench hashset
# Benchmark throughput and get output as a markdown table
cargo bench --bench throughput --features bench-md
# Benchmark throughput and get output as .svg plots
cargo bench --bench throughput --features bench-plot
```
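For a quick sanity check outside the cargo bench harness, throughput can be approximated by timing repeated hashing of a buffer. The sketch below uses std's DefaultHasher as a stand-in for GxHash so that it stays self-contained, and the throughput helper is hypothetical; it has no warm-up or statistical treatment, so the benchmark commands above remain the reliable measurement.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;
use std::hint::black_box;
use std::time::Instant;

/// Rough single-threaded throughput in bytes per second:
/// hash `buf` `iters` times and divide total bytes by elapsed time.
fn throughput(buf: &[u8], iters: u32) -> f64 {
    let start = Instant::now();
    for _ in 0..iters {
        let mut h = DefaultHasher::new();
        // black_box keeps the optimizer from hoisting or discarding the work.
        h.write(black_box(buf));
        black_box(h.finish());
    }
    let secs = start.elapsed().as_secs_f64();
    (buf.len() as f64 * f64::from(iters)) / secs
}

fn main() {
    for size in [16usize, 1024, 64 * 1024] {
        let buf = vec![0xABu8; size];
        let gib_per_s = throughput(&buf, 1_000) / (1u64 << 30) as f64;
        println!("{size:>6} B: {gib_per_s:.3} GiB/s");
    }
}
```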

GxHash is continuously benchmarked on x86 and ARM GitHub runners.

Latest Benchmark Results:

(Benchmark plots: aarch64, x86_64, x86_64-avx2)

Contributing

  • Feel free to submit PRs
  • The repository is entirely usable via cargo commands
  • Versioning works as follows:
    • Major for stability-breaking changes (output hashes for the same input differ after the change)
    • Minor for API changes/removals
    • Patch for new APIs, bug fixes and performance improvements
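The versioning policy above differs from conventional SemVer, so it can be handy to encode it explicitly, for example in release tooling. The sketch below is a hypothetical encoding (the Change and Bump names are illustrative): each kind of change maps to a bump level, and a release takes the largest bump among its changes.

```rust
/// Kinds of change named in the versioning policy.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Change {
    HashOutputChanged,
    ApiChangedOrRemoved,
    ApiAdded,
    BugFix,
    Performance,
}

/// Bump levels, declared in ascending order so the derived Ord ranks them.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Bump {
    Patch,
    Minor,
    Major,
}

/// Map a single change to the version component it requires.
fn required_bump(change: Change) -> Bump {
    match change {
        Change::HashOutputChanged => Bump::Major,
        Change::ApiChangedOrRemoved => Bump::Minor,
        Change::ApiAdded | Change::BugFix | Change::Performance => Bump::Patch,
    }
}

/// A release needs the largest bump among its changes (None if there are none).
fn release_bump(changes: &[Change]) -> Option<Bump> {
    changes.iter().map(|&c| required_bump(c)).max()
}

fn main() {
    let changes = [Change::BugFix, Change::ApiAdded, Change::ApiChangedOrRemoved];
    println!("{:?}", release_bump(&changes));
}
```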

🛈 Note

cargo-asm is an easy way to view the actual generated assembly code (cargo asm gxhash::gxhash::gxhash64). Note that #[inline] must be removed from the respective function; otherwise the tool won't find it.

🛈 Note

AMD μProf gives useful insight into the time spent per instruction.

Publication

Author's note:

I'm committed to the open dissemination of scientific knowledge. In an era where access to information is more democratized than ever, I believe that science should be freely available to all, both for consumption and contribution. Traditional scientific journals often involve significant financial costs, which can introduce biases and shift the focus from purely scientific endeavors to what is currently trendy.

To counter this trend and to uphold the true spirit of research, I have chosen to share my work on "gxhash" directly on GitHub, ensuring that it's openly accessible to anyone interested. Additionally, the use of a free Zenodo DOI ensures that this research is citable and can be referenced in other works, just as traditional publications are.

I strongly believe in a world where science is not behind paywalls, and I advocate for a more inclusive, unbiased, and open scientific community.

Publication: PDF

Cite this publication/algorithm: DOI

About

License: MIT License