ehzuf / RecursiveRegexpRaptor-vs-Benchmarks

the Raptor vs the world

Performance comparison of regular expression engines

The original test environment is by dark100: http://sljit.sourceforge.net/regex_perf.html

Introduction

Processing text or raw byte sequences is among the most common tasks performed by software tools. These tasks usually involve pattern-matching algorithms, and the most popular tool for that purpose is the regular expression. Regular expressions have evolved a lot since Kleene defined the regular sets in the 1950s. Today there are several widely used regular expression engines with different feature sets, which makes any performance comparison difficult, since a faster engine is not necessarily a better one. Depending on the use case, it may be enough to know whether a POSIX-compatible regular expression matches a line, and even the position of the match is unneeded (as in the grep utility). Other use cases, by contrast, require the positions of capturing brackets, Unicode support, and conditional and atomic block support (handling a byte sequence as a single character, like ‘sch’ in German), to name just a few. The former case needs a less sophisticated algorithm, which is likely to be much faster than the latter, but again, that does not mean the former is better. More about these engine types can be found here.
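
The two use cases above can be illustrated with POSIX <regex.h>, used here only as a convenient stand-in and not as one of the benchmarked engines. In this minimal sketch the helper names line_matches and first_group_span are hypothetical: the first asks only whether a line matches (the grep-style case), while the second also asks the engine to report where a capturing bracket matched.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* grep-style query: returns 1 if `pattern` matches somewhere in `line`,
   0 if it does not, and -1 if the pattern fails to compile.
   REG_NOSUB tells the engine that no match offsets are needed. */
int line_matches(const char *pattern, const char *line)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && line != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, line, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* Capture query: stores the byte offsets of the first capturing bracket
   in *start and *end. Returns 1 on success, 0 if there is no match or
   the group did not take part in it, and -1 on a compile error. */
int first_group_span(const char *pattern, const char *line,
                     long *start, long *end)
{
    regex_t re;
    regmatch_t m[2];   /* m[0] is the whole match, m[1] is group 1 */
    int rc;

    assert(pattern != NULL && line != NULL && start != NULL && end != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&re, line, 2, m, 0);
    regfree(&re);
    if (rc != 0 || m[1].rm_so == -1)
        return 0;
    *start = (long)m[1].rm_so;
    *end = (long)m[1].rm_eo;
    return 1;
}
```

For the line "Mark Twain" and the pattern Tw(ai)n, line_matches only reports a hit, whereas first_group_span additionally reports the half-open byte range of the bracketed "ai". How much extra work that costs depends on the engine.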

Participants

The following popular engines were chosen: PCRE (also in its DFA and JIT variants), TRE, Oniguruma, and RE2, together with regexp3 and regexp4.

Before anyone jumps to conclusions, I should note the following:

  • The engines were not fine-tuned (because I lack knowledge of their internal workings). I just compiled them with the default options. I know that enabling or disabling some features can heavily affect the results. If you feel that you have a better configuration, just drop me an e-mail and I will update the results (mailto:nasciiboy@gmail.com).
  • The regular expression engines are compiled with -O3 to allow the best performance.
  • This comparison page was inspired by the work of John Maddock (see his own regex comparison here). The input is the same one he used: mtent12.zip, a text file (an e-book) of about 20 MB.
  • Only common patterns were selected; they are not pathological cases and do not use any Perl-specific features. The comparison was case-sensitive.
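
Each result reported below pairs a run time with a number in parentheses, which appears to be the number of matches found in the input. As a rough illustration of that kind of measurement, here is a minimal sketch that counts non-overlapping matches in a buffer and times the scan. It uses POSIX <regex.h> as a stand-in engine; the names count_matches and timed_count_ms are hypothetical, and this is not the harness that produced the published numbers. In particular, the timing here includes pattern compilation, which may or may not be true of the original test environment.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <time.h>

/* Counts non-overlapping matches of `pattern` in the NUL-terminated
   `text`. Returns -1 if the pattern fails to compile. */
long count_matches(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m;
    long count = 0;
    const char *p = text;

    assert(pattern != NULL && text != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    /* After the first search, p no longer points at the start of the
       text, so REG_NOTBOL stops '^' from matching at p. */
    while (regexec(&re, p, 1, &m, p == text ? 0 : REG_NOTBOL) == 0) {
        count++;
        if (m.rm_eo > m.rm_so) {
            p += m.rm_eo;               /* continue after the match */
        } else {
            if (p[m.rm_eo] == '\0')     /* empty match at end of text */
                break;
            p += m.rm_eo + 1;           /* step over an empty match */
        }
    }
    regfree(&re);
    return count;
}

/* Runs count_matches once and returns the CPU time it took in
   milliseconds; the match count is stored in *matches. */
double timed_count_ms(const char *pattern, const char *text, long *matches)
{
    clock_t t0 = clock();

    assert(matches != NULL);
    *matches = count_matches(pattern, text);
    return 1000.0 * (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

A real run would load the whole e-book into memory first and call timed_count_ms once per pattern. Because clock() measures CPU time with limited resolution, very fast patterns would need repeated runs to give stable figures.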

Results

x86-64 Intel Celeron 847 1.1 GHz (GCC 6.2.1, GNU/Linux)

Regular expression                                  PCRE                 PCRE-DFA             TRE                  Oniguruma            RE2                  PCRE-JIT             regexp3              regexp4
.|\n                                                5638 ms (20045118)   5271 ms (20045118)   6410 ms (20045118)   13295 ms (20045118)  10510 ms (20045118)  1088 ms (20045118)   1782 ms (20045118)   826 ms (20045118)
  .
\w                                                  2808 ms (14751878)   3081 ms (14751878)   4567 ms (14751878)   10382 ms (14751878)  7800 ms (14751878)   937 ms (14751878)    1884 ms (14750958)   987 ms (14750958)
  :w
\d                                                  65 ms (27084)        67 ms (27084)        1031 ms (27084)      131 ms (27084)       141 ms (27084)       57 ms (27084)        1766 ms (27084)      608 ms (27084)
  :d
\S                                                  2905 ms (15451664)   3181 ms (15451664)   4562 ms (15451664)   10281 ms (15451664)  8169 ms (15451664)   908 ms (15451664)    1894 ms (15451664)   968 ms (15451664)
  :S
\S+                                                 882 ms (3414592)     1582 ms (3414592)    2467 ms (3414592)    3115 ms (3414592)    2140 ms (3414592)    317 ms (3414592)     1065 ms (3414592)    697 ms (3414592)
  :S+
[a-zA-Z]+                                           976 ms (3495761)     1560 ms (3495761)    2326 ms (3495761)    3090 ms (3495761)    2212 ms (3495761)    331 ms (3495761)     3026 ms (3495761)    1090 ms (3495761)
  [a-zA-Z]+
[.\s]+                                              927 ms (3430783)     1057 ms (3430783)    1866 ms (991813)     2641 ms (3430783)    2192 ms (3430783)    374 ms (3430783)     4065 ms (3430783)    1469 ms (3430783)
  [:.:s]+
([^\n]+)                                            313 ms (314387)      1175 ms (314387)     1547 ms (314387)     823 ms (314387)      468 ms (314387)      88 ms (314387)       1530 ms (314387)     534 ms (314387)
  <[^\n]+>
e                                                   349 ms (1781425)     429 ms (1781425)     487 ms (1781425)     1388 ms (1781425)    1006 ms (1781425)    133 ms (1781425)     1784 ms (1781425)    710 ms (1781425)
  e
(((((e)))))                                         1217 ms (1781425)    1083 ms (1781425)    487 ms (1781425)     1972 ms (1781425)    1010 ms (1781425)    203 ms (1781425)     26001 ms (1781425)   3387 ms (1781425)
  <<<<<e>>>>>
((((((((((e))))))))))                               1926 ms (1781425)    1670 ms (1781425)    487 ms (1781425)     2140 ms (1781425)    995 ms (1781425)     299 ms (1781425)     83247 ms (1781425)   4975 ms (1781425)
  <<<<<<<<<<e>>>>>>>>>>
Twain                                               10 ms (2388)         47 ms (2388)         991 ms (2388)        53 ms (2388)         8 ms (2388)          50 ms (2388)         2361 ms (2388)       628 ms (2388)
  Twain
(Twain)                                             14 ms (2388)         48 ms (2388)         987 ms (2388)        53 ms (2388)         8 ms (2388)          50 ms (2388)         6999 ms (2388)       995 ms (2388)
  <Twain>
(?i)Twain                                           196 ms (2657)        286 ms (2657)        1283 ms (2657)       337 ms (2657)        196 ms (2657)        52 ms (2657)         2478 ms (2657)       710 ms (2657)
  #*Twain
((T|t)([wW])(a|A)i?I?([nN]))                        584 ms (2658)        579 ms (2658)        1802 ms (2658)       353 ms (2658)        174 ms (2658)        77 ms (2658)         24140 ms (2658)      2454 ms (2658)
  <<T|t><[wW]><a|A>i?I?<[nN]>>
(T+([w]?(a{1}(i+(n*))))){1}                         25 ms (2419)         58 ms (2419)         1172 ms (2419)       179 ms (2419)        8 ms (2419)          7 ms (2419)          20947 ms (2419)      1002 ms (2419)
  <T+<[w]?<a{1}<i+<n*>>>>>{1}
(?:T+(?:[w]?(?:a{1}(?:i+(?:n*))))){1}               20 ms (2419)         58 ms (2419)         1151 ms (2419)       177 ms (2419)        8 ms (2419)          7 ms (2419)          20989 ms (2419)      856 ms (2419)
  (T+([w]?(a{1}(i+(n*))))){1}
[a-z]shing                                          1495 ms (1877)       2300 ms (1877)       1547 ms (1877)       50 ms (1877)         286 ms (1877)        48 ms (1877)         5072 ms (1877)       1472 ms (1877)
  [a-z]shing
Huck[a-zA-Z]+|Saw[a-zA-Z]+                          71 ms (396)          75 ms (396)          1533 ms (396)        121 ms (396)         133 ms (396)         8 ms (396)           6046 ms (396)        2718 ms (396)
  Huck[a-zA-Z]+|Saw[a-zA-Z]+
[a-q][^u-z]{13}x                                    1752 ms (4929)       6401 ms (4929)       4421 ms (4929)       175 ms (4929)        566 ms (4929)        5 ms (4929)          11942 ms (4929)      4177 ms (4929)
  [a-q][^u-z]{13}x
Tom|Sawyer|Huckleberry|Finn                         96 ms (3015)         99 ms (3015)         2677 ms (3015)       143 ms (3015)        139 ms (3015)        84 ms (3015)         11021 ms (3015)      3905 ms (3015)
  Tom|Sawyer|Huckleberry|Finn
(Tom|Sawyer|Huckleberry|Finn)                       101 ms (3015)        102 ms (3015)        2663 ms (3015)       144 ms (3015)        137 ms (3015)        82 ms (3015)         26380 ms (3015)      3148 ms (3015)
  <Tom|Sawyer|Huckleberry|Finn>
[hHeELlOo][hHeELlOo][hHeELlOo][hHeELlOo][hHeELlOo]  626 ms (534)         880 ms (534)         2746 ms (534)        704 ms (534)         268 ms (534)         241 ms (534)         9229 ms (534)        1530 ms (534)
  [hHeELlOo][hHeELlOo][hHeELlOo][hHeELlOo][hHeELlOo]
Tom.{10,25}river|river.{10,25}Tom                   206 ms (2)           243 ms (2)           1758 ms (2)          237 ms (2)           155 ms (2)           45 ms (2)            13223 ms (2)         2744 ms (2)
  Tom([^(river|\n)]){10,25}river|river([^(Tom|\n)]){10,25}Tom
  Tom(river|\n){10,25}#!river|river(Tom|\n){10,25}#!Tom
ing[^a-zA-Z]                                        136 ms (85956)       236 ms (85956)       1110 ms (85956)      139 ms (85956)       109 ms (85956)       54 ms (85956)        2727 ms (85956)      667 ms (85956)
  ing[^a-zA-Z]
[a-zA-Z]ing[^a-zA-Z]                                1538 ms (85823)      2367 ms (85823)      1816 ms (85823)      142 ms (85823)       324 ms (85823)       57 ms (85823)        6145 ms (85823)      1574 ms (85823)
  [a-zA-Z]ing[^a-zA-Z]
([a-zA-Z]+ing)                                      4111 ms (95863)      5474 ms (95863)      2064 ms (95863)      3152 ms (95863)      335 ms (95863)       223 ms (95863)       53539 ms (95863)     6868 ms (95863)
  <([^(ing|:A)])+ing(([^(ing|:A)])*ing)*>
  <(ing|:A)+#!ing(((ing|:A)*#!ing)*>

Just download, type make, and run runtest.
