hey-red / Markdown

Open source C# implementation of Markdown processor, as featured on Stack Overflow.

Make All Static Regex's Readonly and Compiled

GoogleCodeExporter opened this issue

I am submitting this patch, which makes every static Regex field readonly 
and compiled.

I changed all

private static Regex foo = new Regex("...", ...);

to

private static readonly Regex foo = new Regex("...", ... | RegexOptions.Compiled);
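
As a minimal sketch of what this change looks like on a single field: the class name, field names, and pattern below are hypothetical and are not taken from the Markdown source. They only illustrate the before and after declarations.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical example class; names and pattern are illustrative only.
public static class RegexFieldSketch
{
    // Before: mutable static field, regex is interpreted.
    private static Regex _blankLinesBefore =
        new Regex(@"^[ \t]+$", RegexOptions.Multiline);

    // After: readonly static field, regex is compiled to IL instead of interpreted.
    private static readonly Regex _blankLinesAfter =
        new Regex(@"^[ \t]+$", RegexOptions.Multiline | RegexOptions.Compiled);

    public static string StripBefore(string text) => _blankLinesBefore.Replace(text, "");

    public static string StripAfter(string text) => _blankLinesAfter.Replace(text, "");

    public static void Main()
    {
        string input = "a\n   \nb";
        // Both variants should produce the same output; only the cost profile differs.
        Console.WriteLine(StripBefore(input) == StripAfter(input));
    }
}
```

The `readonly` modifier only prevents the field from being reassigned. Any speed difference comes from `RegexOptions.Compiled`, which trades a higher one-time construction cost for faster matching.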

The numbers on the project home page didn't match the numbers on my machine, 
which I expected, since they were measured on a different machine.

475 string length
I observed a 30% increase in time to complete the task when compiled. I 
think this has to do with the time it takes to compile the regexes on the 
first hit of the Markdown class. Needs more testing.

2356 string length
I observed a 10% decrease in the time to complete the task when compiled.

27737 string length
I observed a 10.5% decrease in the time to complete the task when 
compiled.

I ran these benchmarks many times, and the numbers varied only slightly.

Original issue reported on code.google.com by nberardi on 26 Dec 2009 at 8:31

As you noticed, it's a bad idea to blindly compile all the regexes.

I already made a pass with ANTS Performance profiler and compiled just the
regexes that the profiler told me were expensive. This provides the best blend
of short and long performance with almost no compromises.

That's the proper solution IMO -- just compile the ones that actually take the
longest. If you compile them all, short performance suffers too much. And short
performance is important; for example, the average Stack Overflow post size is
just over 1000 characters.
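
A rough sketch of that selective approach, assuming a profiler has already identified which patterns are hot. The helper `OptionsFor`, the field names, and both patterns are hypothetical and do not come from the Markdown source.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sketch: only patterns flagged as expensive get RegexOptions.Compiled.
public static class SelectiveCompileSketch
{
    // Adds Compiled only when the caller marks the pattern as hot.
    public static RegexOptions OptionsFor(RegexOptions baseOptions, bool isHot) =>
        isHot ? baseOptions | RegexOptions.Compiled : baseOptions;

    // Assumed to be expensive according to profiling, so it is compiled.
    private static readonly Regex _hot =
        new Regex(@"\*\*(.+?)\*\*", OptionsFor(RegexOptions.Singleline, true));

    // Assumed to be cheap or rarely used, so it stays interpreted.
    private static readonly Regex _cold =
        new Regex(@"^-{3,}$", OptionsFor(RegexOptions.Multiline, false));

    public static bool HotIsCompiled => _hot.Options.HasFlag(RegexOptions.Compiled);

    public static bool ColdIsCompiled => _cold.Options.HasFlag(RegexOptions.Compiled);

    public static void Main()
    {
        Console.WriteLine("hot compiled: {0}", HotIsCompiled);
        Console.WriteLine("cold compiled: {0}", ColdIsCompiled);
    }
}
```

Keeping the decision in one place makes it easy to flip a single pattern after a new profiling run without touching the rest.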

Original comment by wump...@gmail.com on 26 Dec 2009 at 9:30

  • Changed state: WontFix

Seems odd it would just be short strings. Let me test tonight and see if the 
Regex compile is similar to the ASP.NET load time, where it only occurs on 
first hit. Seems like a logical jump since it is first hit in the tests. 
Unless you have tried this before. 

Original comment by nberardi on 26 Dec 2009 at 10:21

well, you could increase the # of short benchmark iterations to something like
20k or 50k -- that would simulate lots of calls with initial compilation.
Though, I suspect the results will be similar. The time taken does go down
about 0.10 ms per call when I increase the short benchmark run size from 1k to
20k --

{{{
input string length: 475
performed 20000 iterations in 17125 (0.85625 ms per iteration)
input string length: 2356
performed 500 iterations in 1978 (3.956 ms per iteration)
input string length: 27737
performed 100 iterations in 5171 (51.71 ms per iteration)
}}}

Original comment by wump...@gmail.com on 28 Dec 2009 at 12:22

some comparisons of short performance, varying # of iterations...

{{{
input string length: 475
performed 1000 iterations in 977    (0.977 ms per iteration)

input string length: 475
performed 2000 iterations in 1886   (0.943 ms per iteration)

input string length: 475
performed 4000 iterations in 3578   (0.895 ms per iteration)

input string length: 475
performed 8000 iterations in 7058   (0.882 ms per iteration)

input string length: 475
performed 16000 iterations in 13899 (0.869 ms per iteration)

input string length: 475
performed 32000 iterations in 27995 (0.875 ms per iteration)
}}}

Original comment by wump...@gmail.com on 28 Dec 2009 at 12:28
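
An iteration sweep like the one above could be reproduced with a harness along these lines. This is only a sketch: `Transform` is a hypothetical stand-in for the real Markdown call, the input string is synthetic, and the thread does not say whether the original benchmark ran each iteration count in a fresh process.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Hypothetical harness; the real benchmark calls the Markdown transform instead.
public static class IterationSweepSketch
{
    private static readonly Regex _emphasis =
        new Regex(@"\*(.+?)\*", RegexOptions.Compiled);

    // Stand-in workload: wraps *text* in <em> tags.
    public static string Transform(string text) => _emphasis.Replace(text, "<em>$1</em>");

    // Returns the average wall-clock milliseconds per call.
    public static double MsPerIteration(string input, int iterations)
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            Transform(input);
        }
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds / iterations;
    }

    public static void Main()
    {
        // 470 filler characters plus 5 more gives a 475-character input.
        string input = new string('x', 470) + " *hi*";
        foreach (int n in new[] { 1000, 2000, 4000, 8000, 16000, 32000 })
        {
            double ms = MsPerIteration(input, n);
            Console.WriteLine("input string length: {0}", input.Length);
            Console.WriteLine("performed {0} iterations in {1:F0} ({2:F3} ms per iteration)",
                n, ms * n, ms);
        }
    }
}
```

Because all sweeps here run in one process, only the first one pays the one-time regex construction cost. Running each count in a separate process would charge that cost to every sweep and show how it is amortized as the count grows.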

[deleted comment]

I updated from the latest SVN changes and here are the numbers I am seeing on 
my machine now with the following patch:

patched:
{{{
input string length: 475
performed 1000 iterations in 1047 (1.047 ms per iteration)
input string length: 2356
performed 500 iterations in 2325 (4.65 ms per iteration)
input string length: 27737
performed 100 iterations in 6583 (65.83 ms per iteration)
}}}

un-patched:
{{{
input string length: 475
performed 1000 iterations in 1202 (1.202 ms per iteration)
input string length: 2356
performed 500 iterations in 2733 (5.466 ms per iteration)
input string length: 27737
performed 100 iterations in 7411 (74.11 ms per iteration)
}}}

I seem to be getting better performance numbers with the refactored testing 
framework that you included.
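
For reference, the relative improvement implied by the un-patched and patched runs above can be worked out with a small helper. The class and method names are hypothetical; the timings are copied from the two blocks above.

```csharp
using System;

// Hypothetical helper for turning before/after timings into a percentage.
public static class PercentChangeSketch
{
    // Percentage reduction in time going from `before` to `after`.
    public static double PercentReduction(double before, double after) =>
        (before - after) / before * 100.0;

    public static void Main()
    {
        // ms per iteration, copied from the un-patched and patched runs above.
        int[] lengths = { 475, 2356, 27737 };
        double[] unpatched = { 1.202, 5.466, 74.11 };
        double[] patched = { 1.047, 4.65, 65.83 };

        for (int i = 0; i < lengths.Length; i++)
        {
            Console.WriteLine("length {0}: {1:F1}% less time per iteration",
                lengths[i], PercentReduction(unpatched[i], patched[i]));
        }
    }
}
```

With these inputs the reductions come out to roughly 13%, 15%, and 11% for the three input sizes.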

Original comment by nberardi on 28 Dec 2009 at 3:50

agreed, I can verify it's faster for me too now!

Must have something to do with the bugfixes in the previous revisions. I'll
accept this change since it is faster in all 3 benchmark test cases now.

Original comment by wump...@gmail.com on 28 Dec 2009 at 6:03

  • Changed state: Accepted

oh and here are my before and after benchmarks, too:

before:
{{{
input string length: 475
performed 1000 iterations in 847 (0.847 ms per iteration)
input string length: 2356
performed 500 iterations in 1956 (3.912 ms per iteration)
input string length: 27737
performed 100 iterations in 5184 (51.84 ms per iteration)
}}}

after:
{{{
input string length: 475
performed 1000 iterations in 682 (0.682 ms per iteration)
input string length: 2356
performed 500 iterations in 1617 (3.234 ms per iteration)
input string length: 27737
performed 100 iterations in 4428 (44.28 ms per iteration)
}}}

Original comment by wump...@gmail.com on 28 Dec 2009 at 6:04

Original comment by wump...@gmail.com on 28 Dec 2009 at 6:39

  • Changed state: Fixed