wooorm / markdown-rs

CommonMark compliant markdown parser in Rust with ASTs and extensions

Home Page: https://docs.rs/markdown/1.0.0-alpha.17/markdown/



Internal panic with nested links

squili opened this issue

commented

The following strings cause panics when passed to to_mdast:
[![]()]()
![![]()]()

I'm surprised this didn't get detected by fuzzing; you might want to double-check the fuzzer.

Thanks @squili!
Appreciate you taking some time to dig into this. 🙇
The panic appears to be:

thread 'test' panicked at 'internal error: entered unreachable code: expected footnote reference, image, or link on stack', src/to_mdast.rs:1275:14

_ => unreachable!("expected footnote refereence, image, or link on stack"),

Is this something you'd be interested in digging into further and exploring a fix for, @squili?


> I'm surprised this didn't get detected by fuzzing; you might want to double-check the fuzzer.

Ideas on how to improve the fuzzer are welcome!
Currently it finds one of #22, #23, #26, or #31, then gets stuck finding more variations on the same issue.

commented

I think the proper behavior here would be to return an error instead of panicking. Maybe some Results could be threaded further through to_mdast.rs so that these internal parsing errors surface as ordinary errors. It could also be good to go through each panic and add a comment explaining why it's actually unreachable!(), similar to the safety comments conventionally written on unsafe blocks.
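A minimal sketch of the Result-threading idea, assuming a simplified stack of open nodes. The enum `Open`, the struct `InternalError`, and the function `close_label` are hypothetical names for illustration, not markdown-rs's actual internals. The unexpected stack state becomes an `Err` the caller can propagate with `?`, and the one remaining `expect` carries a comment explaining why it cannot fail.

```rust
use std::fmt;

// Hypothetical stand-ins for the builder's open-node stack; these are not
// markdown-rs's real types.
#[derive(Debug, Clone, PartialEq)]
pub enum Open {
    Paragraph,
    Link { url: String },
    Image { url: String },
    FootnoteReference { id: String },
}

/// Returned instead of panicking when the stack is in an unexpected state.
#[derive(Debug, Clone, PartialEq)]
pub struct InternalError {
    pub message: String,
}

impl fmt::Display for InternalError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "internal error: {}", self.message)
    }
}

impl std::error::Error for InternalError {}

/// Pop the node that a closing label belongs to (hypothetical helper).
pub fn close_label(stack: &mut Vec<Open>) -> Result<Open, InternalError> {
    let closable = matches!(
        stack.last(),
        Some(Open::Link { .. } | Open::Image { .. } | Open::FootnoteReference { .. })
    );
    if !closable {
        // Where the code previously hit `unreachable!()`: report the
        // unexpected state instead and leave the stack untouched.
        return Err(InternalError {
            message: format!(
                "expected footnote reference, image, or link on stack, found {:?}",
                stack.last()
            ),
        });
    }
    // Cannot fail: `closable` is only true when `last()` returned `Some`.
    Ok(stack.pop().expect("stack is non-empty: checked by `closable`"))
}
```

A caller would then write `close_label(&mut stack)?`, and the `// Cannot fail:` comment plays the role of a `// SAFETY:` comment on an `unsafe` block.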

On fuzzing, I think it might be good to have an automatic generator of random strings instead of coverage-based fuzzing. While coverage-based fuzzing is great for efficiency, the cases you linked are pretty short and the markdown parser itself is super fast, so a large number of random token orderings would probably pick up more issues.
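A sketch of the random-string idea, assuming a hand-picked token list (`TOKENS` is a hypothetical selection that leans on the brackets, `!`, and parentheses from the reported inputs) and a small xorshift64 generator so the sketch needs no external crates. Each generated string would then be handed to the parser, for example `markdown::to_mdast` with default `ParseOptions`, and any panic recorded.

```rust
/// Hypothetical token set; brackets, `!`, and parentheses dominate because
/// the reported panics came from nested link/image syntax.
pub const TOKENS: &[&str] = &[
    "[", "]", "(", ")", "!", "![", "](", "*", "_", "`", "\\", "<", ">", "a", " ", "\n",
];

/// Tiny xorshift64 PRNG (Marsaglia's 13/7/17 variant). Not cryptographic;
/// it only needs to be fast and reproducible from a seed.
pub struct XorShift64(u64);

impl XorShift64 {
    pub fn new(seed: u64) -> Self {
        // xorshift stays at zero forever, so replace a zero seed.
        XorShift64(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Value in `0..n` (slightly biased, which is fine for fuzzing).
    /// `n` must be at least 1.
    pub fn below(&mut self, n: usize) -> usize {
        (self.next_u64() % n as u64) as usize
    }
}

/// Concatenate between 1 and `max_tokens` random tokens into one input.
/// `max_tokens` must be at least 1.
pub fn random_input(rng: &mut XorShift64, max_tokens: usize) -> String {
    let count = 1 + rng.below(max_tokens);
    let mut out = String::new();
    for _ in 0..count {
        out.push_str(TOKENS[rng.below(TOKENS.len())]);
    }
    out
}
```

Because the generator is seeded, a failing input can be regenerated from its seed. The design gives up coverage guidance in exchange for raw speed, which is the trade-off under discussion in this thread.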

> I think the proper behavior here would be to return an error instead of panicking.

I'm not sure an error would make sense here.
Looking at the CommonMark behavior (https://spec.commonmark.org/dingus/?text=%5B!%5B%5D()%5D()), this input should produce a valid AST. The solution would be to trace the code path leading to the unreachable!() and adjust or implement the nesting case.
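To illustrate what a valid tree for these inputs could look like, here is a sketch over a hypothetical mini inline AST. `Inline`, `plain_text`, `close_image`, and `close_link` are illustrative names, not the `mdast` types markdown-rs uses and not necessarily how the eventual fix works. CommonMark allows an image inside a link, and an image description may itself contain images and links; the spec recommends rendering only the description's plain-text content as alt text, so `![foo ![bar](/url)](/url2)` gets the alt text `foo bar`. Under that reading, `[![]()]()` becomes a link containing an empty image, and `![![]()]()` collapses to a single image with empty alt text.

```rust
// Hypothetical mini inline AST for illustration; not markdown-rs's `mdast` types.
#[derive(Debug, Clone, PartialEq)]
pub enum Inline {
    Text(String),
    Link { url: String, children: Vec<Inline> },
    // Images keep only plain-text alt, so nested label content is flattened.
    Image { url: String, alt: String },
}

/// Plain-text content of a label: text as-is, links via their children,
/// nested images via their own alt text.
pub fn plain_text(nodes: &[Inline]) -> String {
    let mut out = String::new();
    for node in nodes {
        match node {
            Inline::Text(value) => out.push_str(value),
            Inline::Link { children, .. } => out.push_str(&plain_text(children)),
            Inline::Image { alt, .. } => out.push_str(alt),
        }
    }
    out
}

/// Close an image whose label may contain any inline content,
/// including another image (the `![![]()]()` case).
pub fn close_image(url: &str, label: &[Inline]) -> Inline {
    Inline::Image { url: url.to_string(), alt: plain_text(label) }
}

/// Close a link; its children are kept as nodes, so an image inside
/// a link (the `[![]()]()` case) stays a child.
pub fn close_link(url: &str, label: Vec<Inline>) -> Inline {
    Inline::Link { url: url.to_string(), children: label }
}
```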

> On fuzzing, I think it might be good to have an automatic generator of random strings instead of coverage-based fuzzing. While coverage-based fuzzing is great for efficiency, the cases you linked are pretty short and the markdown parser itself is super fast, so a large number of random token orderings would probably pick up more issues.

Efficiency would be key here.
Fuzzing currently gets run by maintainers on a laptop before releases to spot-check for errors.
This means it runs on commodity hardware for a bounded, fixed amount of time, not on a supercomputing cluster with unbounded time.

Yes, unguided fuzzing can theoretically find more panics given enough time and compute, but it uses A LOT of time and compute to do so.
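A sketch of a wall-clock-bounded harness that fits a laptop-before-release workflow, assuming the parser call and the input generator are passed in as closures (`fuzz_for`, `next_input`, and `parse` are hypothetical names). `std::panic::catch_unwind` turns each panic into an `Err`, so one run collects every failing input found within a fixed time budget instead of stopping at the first.

```rust
use std::panic::{self, AssertUnwindSafe};
use std::time::{Duration, Instant};

/// Feed generated inputs to `parse` until `budget` elapses and return every
/// input that panicked. Hypothetical helper: `parse` would wrap the real
/// parser and `next_input` a random generator.
/// Requires the default `panic = "unwind"` strategy; with `abort`, the first
/// panic ends the process. The default panic hook still prints each panic,
/// so install a quiet one with `std::panic::set_hook` to reduce noise.
pub fn fuzz_for<P, G>(budget: Duration, mut next_input: G, parse: P) -> Vec<String>
where
    P: Fn(&str),
    G: FnMut() -> String,
{
    let deadline = Instant::now() + budget;
    let mut failures = Vec::new();
    while Instant::now() < deadline {
        let input = next_input();
        // A panic inside `parse` becomes `Err` here instead of ending the run.
        let outcome = panic::catch_unwind(AssertUnwindSafe(|| parse(&input)));
        if outcome.is_err() {
            failures.push(input);
        }
    }
    failures
}
```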

Do you have a suggestion for where to find that level of time and compute?
I'm aware of https://google.github.io/oss-fuzz/, but even they use guided fuzzing.

Or do you have another idea which works with commodity compute and bounded time?

commented

Fixed, thanks to @sheremetyev!