contribu / ruby_levenshtein_bench

Ruby Levenshtein Library Benchmark

Ruby Levenshtein Benchmark

Several Ruby libraries calculate Levenshtein distance. Which one should we use?

Benchmark target libraries

The targets are damerau-levenshtein, levenshtein, and levenshtein-ffi. Please note that the gems' require filenames conflict, so each gem is benchmarked in a separate Bundler environment.
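One way to keep conflicting require names apart is to launch each benchmark in its own process with its own Gemfile. The sketch below assumes a hypothetical layout (one directory per gem, each holding a Gemfile and a bench.rb); the repository's actual run_all.rb may be organized differently.

```ruby
# Hypothetical layout: one directory per gem, each with its own Gemfile and bench.rb.
GEM_DIRS = %w[damerau-levenshtein levenshtein levenshtein-ffi].freeze

# Build the environment and command line for one gem's benchmark.
# BUNDLE_GEMFILE tells Bundler which Gemfile to load for that process.
def bench_command(dir)
  env = { "BUNDLE_GEMFILE" => File.join(dir, "Gemfile") }
  cmd = ["bundle", "exec", "ruby", File.join(dir, "bench.rb")]
  [env, cmd]
end

# Run every benchmark in a separate process so only one gem is loaded at a time.
def run_all(dirs = GEM_DIRS)
  dirs.each do |dir|
    env, cmd = bench_command(dir)
    warn "benchmark failed for #{dir}" unless system(env, *cmd)
  end
end

# Usage (not executed here): run_all
```

Because each child process loads only one Gemfile, two gems that share a require name never end up on the same load path.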

Conclusion

Use damerau-levenshtein: it supports UTF-8, is the fastest of the three on long text, and is second only to levenshtein-ffi on short text.

| Gem | Short Text Performance | Long Text Performance | UTF-8 Support |
| --- | --- | --- | --- |
| gem 'damerau-levenshtein' | 1.673M | 8.370k | o |
| gem 'levenshtein' | 98.976k | 20.000 | ? |
| gem 'levenshtein-ffi' | 5.860M | 1.716k | x |

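Performance is the total number of iterations completed in about 5 seconds; higher is faster. UTF-8 Support: o = supported, ? = unknown, x = not supported.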

Prepare

bundle install

Run

ruby run_all.rb

Results

run benchmark
Warming up --------------------------------------
damerau-levenshtein short text
                        28.847k i/100ms
damerau-levenshtein long text
                       135.000  i/100ms
Calculating -------------------------------------
damerau-levenshtein short text
                        345.211k (±16.5%) i/s -      1.673M in   5.010610s
damerau-levenshtein long text
                          1.708k (±16.5%) i/s -      8.370k in   5.061909s
Warming up --------------------------------------
levenshtein short text
                         2.062k i/100ms
levenshtein long text
                         1.000  i/100ms
Calculating -------------------------------------
levenshtein short text
                         20.006k (±13.4%) i/s -     98.976k in   5.093285s
levenshtein long text
                          3.817  (± 0.0%) i/s -     20.000  in   5.250539s
Warming up --------------------------------------
levenshtein-ffi short text
                        87.465k i/100ms
levenshtein-ffi long text
                        33.000  i/100ms
Calculating -------------------------------------
levenshtein-ffi short text
                          1.170M (± 2.5%) i/s -      5.860M in   5.010915s
levenshtein-ffi long text
                        340.802  (± 2.9%) i/s -      1.716k in   5.039697s
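The i/s figures above are iterations per second. As a rough illustration of what such a figure means, here is a minimal sketch that measures it with core Ruby only. The helper name measure_ips and the stand-in workload are assumptions for illustration; this is not the harness that produced the results, and it does no warm-up or error estimation.

```ruby
# Call the block repeatedly for about `seconds` (must be positive) and
# return [iterations, iterations_per_second].
def measure_ips(seconds: 1.0)
  iterations = 0
  elapsed = 0.0
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  while elapsed < seconds
    yield
    iterations += 1
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  end
  [iterations, iterations / elapsed]
end

# Stand-in workload; replace the block body with the gem call under test.
iterations, ips = measure_ips(seconds: 0.2) { "kitten" == "sitting" }
puts format("%d iterations, %.1f i/s", iterations, ips)
```

Checking the clock after every call adds overhead, so absolute numbers from this sketch will be lower than those from a dedicated benchmarking tool.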

UTF-8 Support

damerau-levenshtein

The damerau-levenshtein gem allows to find edit distance between two UTF-8 or ASCII encoded strings with O(N*M) efficiency. https://github.com/GlobalNamesArchitecture/damerau-levenshtein#damerau-levenshtein

damerau-levenshtein supports UTF-8

levenshtein

We could not determine whether levenshtein supports UTF-8.

levenshtein-ffi

The C extension uses char* strings, and so Unicode strings will give incorrect distances. https://github.com/dbalatero/levenshtein-ffi#known-issues

levenshtein-ffi doesn't support UTF-8
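To see why comparing raw bytes gives incorrect distances for Unicode text, the sketch below runs a plain dynamic-programming Levenshtein over characters and then over bytes. It is a pure-Ruby illustration of the general effect, not the code of any of the benchmarked gems; the helper names are made up for this example.

```ruby
# Classic dynamic-programming Levenshtein distance over two arrays of units.
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_with_index do |x, i|
    curr = [i + 1]
    b.each_with_index do |y, j|
      cost = x == y ? 0 : 1
      # deletion, insertion, substitution (or match)
      curr << [prev[j + 1] + 1, curr[j] + 1, prev[j] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Distance measured in characters (what a UTF-8 aware library reports).
def char_distance(s, t)
  levenshtein(s.chars, t.chars)
end

# Distance measured in bytes (what a byte-oriented comparison reports).
def byte_distance(s, t)
  levenshtein(s.bytes, t.bytes)
end

puts char_distance("caf\u00e9", "cafe") # 1: one character substituted
puts byte_distance("caf\u00e9", "cafe") # 2: U+00E9 is two bytes in UTF-8
```

For pure ASCII input the two functions agree, which is why the problem only shows up with multi-byte characters.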
