jgm / commonmark-hs

Pure Haskell commonmark parsing library, designed to be flexible and extensible

[fuzz result] unindented lines after footnote def are silently eaten

notriddle opened this issue

Consider this markdown:

[^foo]:bar
baz

[^foo]

It's debatable whether baz should be part of the footnote or not, but it should certainly appear in the output somewhere. Right now, commonmark-extensions discards it.

In GitHub, it renders like this:

1

Footnotes

  1. bar
    baz

Yes, that should count as a lazy continuation.

I don't understand why the lazy line isn't being handled properly. Look at blockQuoteSpec for an analogy. This one should behave the same!

The disanalogy is actually in blockFinalize.
I wonder if that's the clue.

Evidence for this: if you change blockFinalize for footnote to defaultFinalizer, then the lazy line appears in a regular paragraph.

OK, I think I understand now what is happening.

In processLine, we close unmatched blocks
https://github.com/jgm/commonmark-hs/blob/master/commonmark/src/Commonmark/Blocks.hs#L138-L147
and in this case that would mean we close the footnote when we hit the unindented line. At this point the note's paragraph just contains "bar".

Laziness is handled at
https://github.com/jgm/commonmark-hs/blob/master/commonmark/src/Commonmark/Blocks.hs#L154-L159
Essentially, when we hit a lazy continuation, we simply add the closed blocks back to the node stack and proceed as before.

Normally, this works fine! But in the case of footnoteBlockSpec, it doesn't, because in this case blockFinalize has a side effect -- it updates the reference map with the note contents.

In fact, the reference map will be updated again after the re-added footnote is closed again (this time with "baz" included). So at this point there will be two entries with key "foo". But lookupReference is set up to prefer the first entry in case of duplicates, so we get the defective one.
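To make the failure mode concrete, here is a minimal sketch of a first-entry-wins lookup. It is a toy model, not the library's code: RefMap, addRef and lookupFirst are hypothetical names, the association list only stands in for the real reference map, and the second note body "bar baz" is an assumption about what the re-closed footnote contains.

```haskell
-- Toy stand-in for the reference map: an association list in insertion order.
-- (Hypothetical model; the real map in commonmark-hs is a different type.)
type RefMap = [(String, String)]

-- Record a definition without replacing earlier ones for the same key.
addRef :: String -> String -> RefMap -> RefMap
addRef key val refs = refs ++ [(key, val)]

-- Prelude's lookup returns the first matching entry, so the earliest
-- definition for a key shadows every later one.
lookupFirst :: String -> RefMap -> Maybe String
lookupFirst = lookup

-- The footnote is finalized once at the unindented line (body "bar"),
-- then again after the lazy line has been added (assumed body "bar baz").
afterTwoFinalizes :: RefMap
afterTwoFinalizes = addRef "foo" "bar baz" (addRef "foo" "bar" [])

-- What a later [^foo] reference would resolve to in this model.
resolved :: Maybe String
resolved = lookupFirst "foo" afterTwoFinalizes
```

In this model resolved is the entry recorded by the first, premature finalize, which is why the lazy line never shows up in the output.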

Solution? I'm not sure. It does seem non-ideal that in our treatment of lazy lines, we end up calling blockFinalize twice (or even more) for the same block, even though in most cases this is harmless. So, perhaps, instead of closing the unmatched blocks at line 138 of Blocks.hs, we should wait until we actually detect a new block, and close them at that point. If no new block is detected and the line is not lazy, we would close the unmatched blocks then. If the line is lazy, the unmatched blocks would still be open and we wouldn't need to re-add them. This might even improve performance a bit when there are lots of lazy lines.
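A rough sketch of the difference between the two strategies, counting how often the finalizer would run for a single open block. This is a toy model and not a patch: Line, stepEager, stepDeferred and countFinalizes are hypothetical names, each line is assumed to be pre-classified, and blank lines are left out.

```haskell
-- How an incoming line relates to the one open block we are tracking.
data Line
  = Matched   -- the block's continuation matches this line
  | Lazy      -- unmatched, but a lazy paragraph continuation
  | NewBlock  -- unmatched, and a new block starts here

-- State: is the tracked block still open, and how many times has its
-- finalizer run so far?
type St = (Bool, Int)

-- Current behavior as described above: close on every unmatched line,
-- then re-add the block if the line turns out to be lazy.
stepEager :: St -> Line -> St
stepEager (False, n) _       = (False, n)
stepEager (True, n) Matched  = (True, n)
stepEager (True, n) Lazy     = (True, n + 1)   -- finalized, then reopened
stepEager (True, n) NewBlock = (False, n + 1)

-- Proposed behavior: only close once a new block is actually detected.
stepDeferred :: St -> Line -> St
stepDeferred (False, n) _       = (False, n)
stepDeferred (True, n) Matched  = (True, n)
stepDeferred (True, n) Lazy     = (True, n)     -- still open, nothing to re-add
stepDeferred (True, n) NewBlock = (False, n + 1)

-- Run a strategy over the lines; a block still open at end of input is
-- finalized once more.
countFinalizes :: (St -> Line -> St) -> [Line] -> Int
countFinalizes step ls =
  let (open, n) = foldl step (True, 0) ls
  in if open then n + 1 else n

-- The example from this issue, after the footnote has been opened:
-- "baz" is lazy, then "[^foo]" starts a new paragraph.
issueLines :: [Line]
issueLines = [Lazy, NewBlock]
```

Under these assumptions, countFinalizes stepEager issueLines counts two finalizer runs and countFinalizes stepDeferred issueLines counts one, so a finalizer with side effects would only fire once per block.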