graylog-labs / regex-library-shootout

Simple comparison benchmarks for various regex-based operations

A simplistic benchmark to get a feel for how fast (or slow) it is to check 100k random IPs against the bogon list, which contains roughly 3000 CIDR patterns.

Five implementations are compared:

  • naive java.util.regex.Pattern
    • loops over every IP to be checked and, for each one, over all compiled regexes
    • dead slow
  • one giant java.util.regex.Pattern
    • simply the alternation of all CIDR patterns
    • more than an order of magnitude faster (about 25x in the run below)
  • one Hyperscan database of all CIDR patterns
    • faster again, by about 5 to 6x over the single alternation in the run below
    • two variants: with and without SINGLEMATCH option
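  • convert CIDR patterns into BigInteger start/end ranges
    • check each IP, converted to a numeric BigInteger, with a RangeSet.contains() call
    • fastest of the five in the run below, roughly 4x the Hyperscan variants
    • could be optimized further by using long instead of BigInteger for IPv4 addresses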
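The difference between the two java.util.regex variants can be sketched as follows. This is a minimal illustration, not the repository's code: the three patterns are hypothetical, hand-written regexes for octet-aligned prefixes (10.0.0.0/8, 127.0.0.0/8, 192.168.0.0/16), whereas the real bogon list has roughly 3000 entries, and prefixes that do not fall on octet boundaries would need more elaborate regexes. The class and method names are also hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class RegexBogonSketch {
    // Hypothetical patterns for three octet-aligned bogon prefixes.
    // Note: \d{1,3} does not validate octet values (it accepts 999).
    static final List<String> CIDR_REGEXES = List.of(
            "10\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}",   // 10.0.0.0/8
            "127\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}",  // 127.0.0.0/8
            "192\\.168\\.\\d{1,3}\\.\\d{1,3}");      // 192.168.0.0/16

    // "Naive" variant: one compiled Pattern per CIDR entry.
    static final List<Pattern> SEPARATE = CIDR_REGEXES.stream()
            .map(Pattern::compile)
            .collect(Collectors.toList());

    // "Alternation" variant: one Pattern of the form (?:a)|(?:b)|(?:c).
    static final Pattern ALTERNATION = Pattern.compile(
            CIDR_REGEXES.stream()
                    .map(r -> "(?:" + r + ")")
                    .collect(Collectors.joining("|")));

    // Tries each Pattern in turn and stops at the first full match.
    static boolean isBogonIteration(String ip) {
        for (Pattern p : SEPARATE) {
            if (p.matcher(ip).matches()) {
                return true;
            }
        }
        return false;
    }

    // A single matches() call against the combined alternation.
    static boolean isBogonAlternation(String ip) {
        return ALTERNATION.matcher(ip).matches();
    }

    public static void main(String[] args) {
        for (String ip : List.of("10.1.2.3", "8.8.8.8", "192.168.0.1")) {
            System.out.println(ip + " " + isBogonIteration(ip) + " " + isBogonAlternation(ip));
        }
    }
}
```

Both methods answer the same question; the only difference is whether each IP is matched against many separate Pattern objects or against one combined Pattern, which is presumably what the JavaUtilRegex.iteration and JavaUtilRegex.alternation rows compare.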

Results on my machine (Dell XPS 9560, i7 2.8 GHz, default JVM settings), as reported by JMH in throughput mode (higher ops/s is better):

# Run complete. Total time: 00:11:29

Benchmark                      Mode  Cnt        Score        Error  Units
Hyperscan.regexMatchesMulti   thrpt    5   472497.086 ± 120152.258  ops/s
Hyperscan.regexMatchesSingle  thrpt    5   515623.166 ± 176015.019  ops/s
IpSubnetRanges.rangeSets      thrpt    5  2037209.457 ± 515291.444  ops/s
JavaUtilRegex.alternation     thrpt    5    89526.016 ±  35312.328  ops/s
JavaUtilRegex.iteration       thrpt    5     3515.186 ±     33.760  ops/s
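The range approach came out fastest, and the list above notes it could use long instead of BigInteger for IPv4. A minimal sketch of that idea, assuming IPv4 only and non-overlapping CIDR blocks: instead of a RangeSet over BigInteger values, it uses a java.util.TreeMap from each block's start address to its inclusive end address and answers a lookup with floorEntry(). The class and method names are hypothetical, and no timing for this sketch appears in the table above.

```java
import java.util.Map;
import java.util.TreeMap;

class Ipv4RangeSketch {
    // Start address -> inclusive end address of each CIDR block.
    // Assumes blocks do not overlap; overlapping input would need merging first.
    private final TreeMap<Long, Long> ranges = new TreeMap<>();

    // Parses a dotted-quad IPv4 address into an unsigned 32-bit value held in a long.
    static long toLong(String ip) {
        String[] parts = ip.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not an IPv4 address: " + ip);
        }
        long value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("bad octet in " + ip);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    // Adds a block such as "192.168.0.0/16" as the range [start, start | hostBits].
    void addCidr(String cidr) {
        String[] parts = cidr.split("/");
        int prefix = Integer.parseInt(parts[1]);
        long hostBits = (1L << (32 - prefix)) - 1;  // e.g. /16 -> 0xFFFF
        long start = toLong(parts[0]) & ~hostBits;   // clear the host bits
        ranges.put(start, start | hostBits);
    }

    // The only candidate block is the one with the greatest start <= ip.
    boolean contains(String ip) {
        long value = toLong(ip);
        Map.Entry<Long, Long> entry = ranges.floorEntry(value);
        return entry != null && value <= entry.getValue();
    }

    public static void main(String[] args) {
        Ipv4RangeSketch bogons = new Ipv4RangeSketch();
        bogons.addCidr("10.0.0.0/8");
        bogons.addCidr("192.168.0.0/16");
        bogons.addCidr("100.64.0.0/10");
        System.out.println(bogons.contains("10.20.30.40"));     // true
        System.out.println(bogons.contains("100.127.255.255")); // true
        System.out.println(bogons.contains("100.128.0.0"));     // false
        System.out.println(bogons.contains("8.8.8.8"));         // false
    }
}
```

Each lookup is one O(log n) floorEntry() call plus a single comparison, with no regex evaluation and no BigInteger allocation.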
