General cleanup / bug prevention

Question

General cleanup / bug prevention

jonasfj opened this issue a year ago · comments

Jonas Finnemann Jensen commented a year ago

Following #531 (infinite loop) which caused some hard to debug issues. We've been wondering if we could reduce the risk of infinite loops, since these are hard to recover from gracefully (and doing so eats a lot of CPU that affects a server negatively).

But in general, reducing the risk of bugs would be nice. Perhaps, we can use this topic to discuss possible cleanups / tweaks to prevent/reduce bugs.

Some ideas include:

#533
Remove BlockParser.canParse() in favor of BlockParser.parse() returning null
Return null from BlockParser.parse() whenever it doesn't advance _pos
Review if other while (!isDone) { loops could benefit from a limit on iterations?
Determine if we can avoid BlockParser.retreat and BlockParser.retreatBy
Other ideas...?

I'm not really sure how everything in this package works, maybe there is a need to retreat, and we can't just peek.

Zhiguang Chen · Answer 1 · Tue Apr 25 2023 21:34:22 GMT+0800 (China Standard Time)

I think we can avoid retread. We can parse each syntax in a isolated context and never run the same syntax at the same position if the parser returns null.

Jonas Finnemann Jensen · Answer 2 · Tue Apr 25 2023 22:23:17 GMT+0800 (China Standard Time)

We can parse each syntax in a isolated context and never run the same syntax at the same position if the parser returns null.

Wouldn't it be a lot safer to do this. I mean, we probably don't even have to create an isolate context for each run, we can probably just remove the calls to retreat and use some sort of peek mechanism instead (or simply do advance() later).

For example: TableSyntax uses parser.retreat() after calling TableSyntax._parseRow which calls parser.advance(). But we could probably just choose to not call advance() inside _parseRow, and just call advance() when we've decided not to return null.
Then there is never a need to retreat.

To get rid of canParse we'd probably also want to get rid of BlockSyntax.interruptedBy, but from what I can see, it's only used by ParagraphSyntax. So maybe it's possible to just do a hack like if (parser.blockSyntaxes.any((s) => s is SetextHeaderSyntax)) { ... check...

And maybe we'd have to do something about BlockSyntax.isAtBlockEnd which also uses canParse. I don't know if something smarter can be done, or if calling canParse for every syntax is a good idea.

@chenzhiguang do you have other ideas for things that might be worth while doing?

Zhiguang Chen · Answer 3 · Tue Apr 25 2023 23:02:00 GMT+0800 (China Standard Time)

@jonasfj Yes, like you said, we can also do not advance the position in the TableSyntax._parseRow, it means we isolate a segment of the source text and parse it independently of the main parser.

I have created a TextParser to simplify the task, you can see how it is implemented in the ListSyntax : https://github.com/dart-lang/markdown/blob/master/lib/src/block_syntaxes/list_syntax.dart#L156.

A TextParser instance takes a portion of the source text and runs it separately without changing the main reading position.

Jonas Finnemann Jensen · Answer 4 · Wed Apr 26 2023 16:01:45 GMT+0800 (China Standard Time)

Ah, so TextParser is a thing for reading a single line. Makes sense.

We could also look at reusing package string_scanner (or extending it), it already has support for source_span..

note. both source_span and string_scanner are in the Dart SDK.

Zhiguang Chen · Answer 5 · Wed Apr 26 2023 17:18:33 GMT+0800 (China Standard Time)

Ah, so TextParser is a thing for reading a single line. Makes sense.

The source to be parsed does not necessarily have to be a single line; it can be a string that contains multiple lines.

We could also look at reusing package string_scanner (or extending it), it already has support for source_span..

I have taken a quick look at this package, but I'm not sure about its performance. Normally, the Markdown parser simply walks through the text character by character, avoiding the use of patterns intentionally. Additionally, if we have implemented source_span, we will convert the source into SourceSpan from the very beginning. However, in each syntax, there are cases where the source to be parsed might not be a single SourceSpan instance; it could be a list of SourceSpans which locations are disjoint. For example, for this string # _a\_b_, for the inline syntaxes it hits the EscapeSyntax first. Then, the input for the EmphasisSyntax will be two disjoint SourceSpans containing _a and _b_, the string_scanner might not meet our requirements in this case in a simple way.

The TextParser is a simplified version of the SourceParser, the SourceParser works for a list of SourceSpans as the source input. You can find the SourceParser implementation here: https://github.com/tagnote-app/dart_markdown/blob/master/lib/src/parsers/source_parser.dart.