hey-red / Markdown

Open source C# implementation of Markdown processor, as featured on Stack Overflow.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Provide means of setting options programmatically

GoogleCodeExporter opened this issue · comments

I would love to see something like this:

var m = new Markdown(
    LinkEmails = true, 
    StrictBoldItalic = true
    (etc)
);

string s = m.Transform("blah blah");

This should be pretty easy to implement.  I suspect adding / organizing 
unit test coverage for this will be more work than actually implementing 
the functionality.

Assuming there aren't any overly negative reactions to this, I might put a 
patch together this week.

Original issue reported on code.google.com by kevin.ba...@gmail.com on 4 Jan 2010 at 2:41

I wanted this too and created a patch for it. The problems I saw were:

1. Constants used for options => No way to use this as a library without 
compiling
from source.

2. Needed to be able to set options from configuration file.

So what this patch does is the following:

1. New class, MarkdownOptions, which contains all the options that were 
previously
constants. The class is immutable, the options can be set explicitly at 
construction
time, or it can use the defaults that were used before, or use the defaults +
overrides from configuration files. The config options are placed in 
appSettings and
take the form Markdown.PropertyName, e.g. Markdown.AutoHyperlink. (Maybe later 
this
should use its own config section but that seemed overkill for now).

Having the options per instance creates some other changes as well. The regular
expressions that use the options are now per instance, not static anymore. The 
ones
that don't use any options are still static. 

2. Realistically for most cases you only need a single instance, and probably 
most of
the time you'd want defaults + config overrides. So there is a static property,
Default, on the Markdown class so you can just go Markdown.Default.Transform(s) 
if
that's all you want.

3. The Markdown class is immutable so if you want to change options later you 
must
create a new instance. This avoids having to keep track of option changes to
recompile regular expressions. Immutable == good :)

And that's about it. Added bin/obj to the ignore list as well.

I'll probably look at the failing unit tests later this week if I have a chance.

Original comment by hviturha...@gmail.com on 4 Jan 2010 at 11:26

Attachments:

Whoops, accidentally left some debugging code in there. Here's a clean version 
of the
patch. Sorry about that.

Original comment by hviturha...@gmail.com on 4 Jan 2010 at 12:19

Attachments:

I appreciate the contribution, but after applying the patch locally I am seeing 
a 15%
increase in benchmark times.

I assume this is because the affected regular expressions are not static?

I had always intended this to be used as a library, compiled from source code 
.. what
specific scenario did you have where you cannot do that?

Original comment by wump...@gmail.com on 5 Jan 2010 at 1:14

[deleted comment]
ouch -- on top of the 15% overall performance decline, the single call 
performance
got 10 times worse:

before patch:

input string length: 11075
1 iteration in 16 ms
input string length: 88607
1 iteration in 196 ms
input string length: 354431
1 iteration in 1635 ms


after patch:

1 iteration in 174 ms
input string length: 88607
1 iteration in 345 ms
input string length: 354431
1 iteration in 2000 ms

Original comment by wump...@gmail.com on 5 Jan 2010 at 1:27

Performance aside, I think this is valuable because it would allow people to 
use the 
library without needing to compile it.  We could conceivably distribute a .dl 
users link 
into their project instead.  This is a pretty standard model for distributing 
library 
code.

Second, and this is what actually triggered me to log this bug, is that we 
could 
actually test the functionality of the options flags.  Right now it's 
impossible to test 
whether they work without recompiling the code.  I would hate to have to 
recompile 
the code a bunch of times as part of an automated test suite.

All that said, a 15% performance degration is pretty serious.  But I think some 
of that 
may be an artifact of how the tests are executed (do the tests create many 
instances 
of the markdown class? do we expect users to?).

Original comment by kevin.ba...@gmail.com on 5 Jan 2010 at 3:10

I am not sure a dinky little markdown.cs justifies an entire DLL.. does it?

I understand the desire for configuration, but a 15% (and up to 10x, depending 
on how
you call it) performance penalty seems like an extremely steep cost for
configurability not everyone will need.

Original comment by wump...@gmail.com on 5 Jan 2010 at 3:31

Yes, I agree.

We should not do this unless we find a way to do it without sacrificing 
performance.

The .dll bit I'm not overly concerned about, but in order to write unit tests 
for the 
options functionality, we need this.

Original comment by kevin.ba...@gmail.com on 5 Jan 2010 at 3:38

The performance issue is weird, since the regular expressions themselves didn't
change at all, they only went from being static to instance variables. And they 
are
compiled in the constructor which is executed outside the benchmark itself. 

All in all I'm seeing a slightly worse performance on most tests (0.0x ms per
iteration) except for the really weard single call case. I'll look at that and 
see if
I can see what the issue is.

As for whether this makes sense or not, well, personally I like to pull in
dependencies for my projects as .dll's, not source code. But yes, maybe it's 
overkill
for a single Markdown.cs.

Original comment by hviturha...@gmail.com on 5 Jan 2010 at 8:08

Hmmm, after running some tests it seems that the single call performance is only
significantly better in the static case when the other benchmarks have been run
before. If I comment out the first three benchmarks (that run thousands of 
times) and
only run the last three (single iteration ones) I get these results:

WITH OPTIONS:
input string length: 11075
1 iteration in 134 ms

ORIGINAL:
input string length: 11075
1 iteration in 132 ms
DIFFERENCE: 1,52%
-----------------------------------------------

WITH OPTIONS:
input string length: 88607
1 iteration in 445 ms

ORIGINAL:
input string length: 88607
1 iteration in 399 ms
DIFFERENCE: 11,53%
-----------------------------------------------

WITH OPTIONS:
input string length: 354431
1 iteration in 4134 ms

ORIGINAL:
input string length: 354431
1 iteration in 4028 ms
DIFFERENCE: 2,63%
-----------------------------------------------

As far as I can tell this has something to do with how the Regex implementation
caches commonly used patterns, perhaps even compiling them behind the scenes? 
Most of
the regexes used are compiled but not all. 

Anyway, I can accept that the change is not worth it from a performance 
standpoint.
But I would still argue that if the class only has static members then the class
should be static as well. Creating many instances of a class that has no 
instance
data seems useless. And perhaps config options could be read in the static
initialization?



Original comment by hviturha...@gmail.com on 5 Jan 2010 at 12:06

[deleted comment]
let's try this again.. I made the entire class static, can we tackle the
configuration once more? new patch?

Original comment by wump...@gmail.com on 6 Jan 2010 at 1:21

ok, I reverted the all-static change -- as you pointed out, this isn't 
thread-safe

Original comment by wump...@gmail.com on 6 Jan 2010 at 11:19

Well, here is a new patch for this. The difference is that the configuration is
static, that is, you configure the options with a static method, the regular
expressions are all static so essentially every instance you create of the 
Markdown
class will use the same options. It still gives you the ability to read the 
options
from a config file, and reconfigure it later through code 
(Markdown.Configure(new
MarkdownOptions(...)), which will just recompile all the static regular 
expressions.

As for performance, it seems comparable with the older code. I ran the 
benchmarks a
few times, sometimes the newer is a bit slower, sometimes a bit faster. It's a 
couple
of milliseconds here and there.

Original comment by hviturha...@gmail.com on 7 Jan 2010 at 8:08

Attachments:

[deleted comment]
[deleted comment]
speed wise it looks OK (excellent!), but I'm not crazy about the way the 
regexes are
moved so far away from the code they are relevant to. 

It kind of utterly destroys the physical ordering and logical locality of the 
code..
and makes it a PITA to work on. Because when you're troubleshooting, say,
DoAutoLinks() .. and you want to check _autolinkBare, it's wayyyyyy up at the 
top of
the file. In a big jumble of code that has no relationship to each other.

It kind of stops being enjoyable to work on, for me, at that point.

Original comment by wump...@gmail.com on 7 Jan 2010 at 8:44

[deleted comment]
[deleted comment]
[deleted comment]
ok, I made changes in r88 that should move us closer -- we just need a way to 
rebuild
all the regular expressions when those properties change .. perhaps reflection 
or
something?

we think only .TabWidth will be problematic. (I made .NestingDepth private, as 
to me
that is 100% an implementation detail and should never REALLY need to be 
changed.. we
might even be able to use the proprietary .NET regex stack balancing extensions 
to
remove it entirely, eventually..)

The rest of the options, with the sole exception of .TabWidth, should work, as 
they
don't rely on being compiled into the regular expressions .. take a look and 
see what
you think?

Original comment by wump...@gmail.com on 7 Jan 2010 at 10:18

Well I would say the easiest way to get around that would be to just have a 
single
CompileXXX method for each of the regexes that need the options. Just have the 
method
be where the static initialization used to be, and then have one main
CompileRegularExpressions that calls all the others. Then you can just as 
easily look
at the regex when you need to.

Original comment by hviturha...@gmail.com on 7 Jan 2010 at 11:07

Ok, I just totally missed your point that everything except _tabwidth and 
_nestDepth
doesn't rely on the regexes being compiled. Then I'd just say leave those two as
constants and have the rest as properties, that's fine. _tabWidth is arguably 
more
appropriate as a constant anyway.

Original comment by hviturha...@gmail.com on 7 Jan 2010 at 11:28

yes, I'd say go ahead and adapt your cool load-config-from-file for this 
version -- I
think all the config properties should work with the exception of TabWidth

Original comment by wump...@gmail.com on 7 Jan 2010 at 12:49

Ok, I've created a new patch. Actually, since removing nestDepth and tabWidth 
from
the equation the whole thing just became a lot simpler. All the other config 
things
were only used in instance methods, so I could just make them normal properties 
not
static ones. So, you can create a new object, change its properties and do what 
you
want without affecting anything else.

Also added a constructor that takes in a bool loadOptionsFromConfigFile, the 
default
is still not to do it so no changes have to be made to existing tests or code.

And finally added unit tests to make sure the configuration is read from 
appsettings
and that changing the properties on an instance has the desired effect. 

No performance problems compared to a clean copy from SVN.

Original comment by hviturha...@gmail.com on 8 Jan 2010 at 8:46

Attachments:

[deleted comment]
ok checked in as r95

Original comment by wump...@gmail.com on 11 Jan 2010 at 2:10

  • Changed state: Fixed