dart-lang / markdown

A Dart markdown library

Home Page:https://pub.dev/packages/markdown

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

General cleanup / bug prevention

jonasfj opened this issue · comments

Following #531 (infinite loop) which caused some hard to debug issues. We've been wondering if we could reduce the risk of infinite loops, since these are hard to recover from gracefully (and doing so eats a lot of CPU that affects a server negatively).

But in general, reducing the risk of bugs would be nice. Perhaps, we can use this topic to discuss possible cleanups / tweaks to prevent/reduce bugs.

Some ideas include:

  • #533
  • Remove BlockParser.canParse() in favor of BlockParser.parse() returning null
  • Return null from BlockParser.parse() whenever it doesn't advance _pos
  • Review if other while (!isDone) { loops could benefit from a limit on iterations?
  • Determine if we can avoid BlockParser.retreat and BlockParser.retreatBy
  • Other ideas...?

I'm not really sure how everything in this package works, maybe there is a need to retreat, and we can't just peek.

I think we can avoid retread. We can parse each syntax in a isolated context and never run the same syntax at the same position if the parser returns null.

We can parse each syntax in a isolated context and never run the same syntax at the same position if the parser returns null.

Wouldn't it be a lot safer to do this. I mean, we probably don't even have to create an isolate context for each run, we can probably just remove the calls to retreat and use some sort of peek mechanism instead (or simply do advance() later).

For example: TableSyntax uses parser.retreat() after calling TableSyntax._parseRow which calls parser.advance(). But we could probably just choose to not call advance() inside _parseRow, and just call advance() when we've decided not to return null.
Then there is never a need to retreat.


To get rid of canParse we'd probably also want to get rid of BlockSyntax.interruptedBy, but from what I can see, it's only used by ParagraphSyntax. So maybe it's possible to just do a hack like if (parser.blockSyntaxes.any((s) => s is SetextHeaderSyntax)) { ... check...

And maybe we'd have to do something about BlockSyntax.isAtBlockEnd which also uses canParse. I don't know if something smarter can be done, or if calling canParse for every syntax is a good idea.


@chenzhiguang do you have other ideas for things that might be worth while doing?

@jonasfj Yes, like you said, we can also do not advance the position in the TableSyntax._parseRow, it means we isolate a segment of the source text and parse it independently of the main parser.

I have created a TextParser to simplify the task, you can see how it is implemented in the ListSyntax : https://github.com/dart-lang/markdown/blob/master/lib/src/block_syntaxes/list_syntax.dart#L156.

A TextParser instance takes a portion of the source text and runs it separately without changing the main reading position.

Ah, so TextParser is a thing for reading a single line. Makes sense.

We could also look at reusing package string_scanner (or extending it), it already has support for source_span..

note. both source_span and string_scanner are in the Dart SDK.

Ah, so TextParser is a thing for reading a single line. Makes sense.

The source to be parsed does not necessarily have to be a single line; it can be a string that contains multiple lines.

We could also look at reusing package string_scanner (or extending it), it already has support for source_span..

I have taken a quick look at this package, but I'm not sure about its performance. Normally, the Markdown parser simply walks through the text character by character, avoiding the use of patterns intentionally. Additionally, if we have implemented source_span, we will convert the source into SourceSpan from the very beginning. However, in each syntax, there are cases where the source to be parsed might not be a single SourceSpan instance; it could be a list of SourceSpans which locations are disjoint. For example, for this string # _a\_b_, for the inline syntaxes it hits the EscapeSyntax first. Then, the input for the EmphasisSyntax will be two disjoint SourceSpans containing _a and _b_, the string_scanner might not meet our requirements in this case in a simple way.

The TextParser is a simplified version of the SourceParser, the SourceParser works for a list of SourceSpans as the source input. You can find the SourceParser implementation here: https://github.com/tagnote-app/dart_markdown/blob/master/lib/src/parsers/source_parser.dart.