Internal panic with nested links
squili opened this issue
The strings
[![]()]()
![![]()]()
cause panics when passed to to_mdast.
I'm surprised this didn't get detected by fuzzing - you might want to double check the fuzzer
Thanks @squili!
Appreciate you taking some time to dig into this. 🙇
The panic appears to be:
thread 'test' panicked at 'internal error: entered unreachable code: expected footnote reference, image, or link on stack', src/to_mdast.rs:1275:14
Line 1275 in c139008
Is this something you'd be interested in digging into more and exploring a fix for @squili?
> I'm surprised this didn't get detected by fuzzing - you might want to double check the fuzzer
Ideas on how to improve the fuzzer are welcome!
Currently it finds one of #22, #23, #26, or #31 then gets stuck finding more variations on the same issue.
I think the proper behavior here would be to return an error instead of panicking. Maybe some Results could be threaded further throughout to_mdast.rs so that these internal parsing errors just cause normal errors. It could also be good to go through each panic and add a comment explaining why it's actually unreachable!(), sort of like what is normally done with unsafe blocks.
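A rough sketch of what threading a Result through could look like. Everything here is hypothetical: `StackNode`, `CompileError`, and `set_destination` are made-up stand-ins, not the actual types or functions in to_mdast.rs. The idea is only that the arm which currently hits unreachable!() returns an `Err` the caller can propagate with `?`.

```rust
use std::fmt;

/// Hypothetical stand-in for the nodes kept on the compile stack.
#[derive(Debug, Clone, PartialEq)]
enum StackNode {
    Paragraph,
    Link { url: String },
    Image { url: String },
}

/// Hypothetical error for a broken internal invariant.
#[derive(Debug, PartialEq)]
struct CompileError {
    message: String,
}

impl fmt::Display for CompileError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "internal error: {}", self.message)
    }
}

impl std::error::Error for CompileError {}

/// Sets the destination on the node at the top of the stack.
/// Where a panicking version would call `unreachable!()`, this returns `Err`.
fn set_destination(stack: &mut [StackNode], destination: &str) -> Result<(), CompileError> {
    match stack.last_mut() {
        Some(StackNode::Link { url }) | Some(StackNode::Image { url }) => {
            *url = destination.to_string();
            Ok(())
        }
        // Invariant violated: report it instead of panicking.
        other => Err(CompileError {
            message: format!("expected image or link on stack, found {:?}", other),
        }),
    }
}
```

Callers would then write `set_destination(&mut stack, "...")?` and the top-level function would surface the error to the user rather than aborting the thread.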
On fuzzing, I think it might be good to have an automatic generator for random strings instead of coverage-based. While coverage-based is great for efficiency, the detected cases you linked are pretty short and the markdown parser itself is super fast, so a bunch of random ordering of tokens would probably pick some more issues up.
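To make the random-token idea concrete, here is a minimal sketch using only the standard library. The token list, the `XorShift64` generator, and the `fuzz_tokens` function are all assumptions for illustration, not part of the existing fuzzer. The `parse` argument stands in for whatever entry point is under test (for example a small wrapper around to_mdast), and the run is capped by both a time budget and a case count.

```rust
use std::panic::{self, AssertUnwindSafe};
use std::time::{Duration, Instant};

/// Markdown-significant tokens to combine at random (an illustrative selection).
const TOKENS: &[&str] = &[
    "[", "]", "(", ")", "!", "*", "_", "`", "<", ">", "#", "\n", " ", "a",
];

/// Small xorshift64 generator so the sketch needs no external crates.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Builds one input made of `len` randomly chosen tokens.
fn random_input(rng: &mut XorShift64, len: usize) -> String {
    (0..len)
        .map(|_| TOKENS[(rng.next_u64() % TOKENS.len() as u64) as usize])
        .collect()
}

/// Feeds random token sequences to `parse` until `budget` elapses or
/// `max_cases` inputs have been tried, and returns the inputs that panicked.
fn fuzz_tokens(parse: fn(&str), seed: u64, budget: Duration, max_cases: usize) -> Vec<String> {
    let mut rng = XorShift64(seed.max(1)); // xorshift state must be non-zero
    let start = Instant::now();
    let mut failures = Vec::new();
    for _ in 0..max_cases {
        if start.elapsed() >= budget {
            break;
        }
        let len = 1 + (rng.next_u64() % 12) as usize;
        let input = random_input(&mut rng, len);
        // A panic inside `parse` is caught and recorded instead of ending the run.
        if panic::catch_unwind(AssertUnwindSafe(|| parse(&input))).is_err() {
            failures.push(input);
        }
    }
    failures
}
```

Since inputs stay short, each case is cheap, and a fixed seed makes any failure reproducible. This does not replace coverage guidance; it only shuffles a small alphabet.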
> I think the proper behavior here would be to return an error instead of panicking.
I'm not sure an error would make sense here.
Looking at CommonMark behavior (https://spec.commonmark.org/dingus/?text=%5B!%5B%5D()%5D()), this should produce a valid AST. The solution would be to trace the code path leading to the unreachable!() and adjust/implement the nesting case.
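For reference, a sketch of the tree shapes a fix would presumably need to produce, going by how CommonMark renders these inputs: `[![]()]()` is a link wrapping an image, and `![![]()]()` is a single image whose alt text is the plain-text content of the inner image. The `Node` enum below is a simplified, hypothetical stand-in (no positions or titles), not the crate's real mdast types.

```rust
/// Simplified, hypothetical stand-in for mdast nodes (positions and titles omitted).
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Paragraph { children: Vec<Node> },
    Link { url: String, children: Vec<Node> },
    Image { url: String, alt: String },
}

/// Expected shape for `[![]()]()`: a link whose only child is an image.
fn expected_link_around_image() -> Node {
    Node::Paragraph {
        children: vec![Node::Link {
            url: String::new(),
            children: vec![Node::Image { url: String::new(), alt: String::new() }],
        }],
    }
}

/// Expected shape for `![![]()]()`: one image. The inner image only
/// contributes its (empty) plain-text content to the outer alt text.
fn expected_image_in_image() -> Node {
    Node::Paragraph {
        children: vec![Node::Image { url: String::new(), alt: String::new() }],
    }
}

/// Renders the sketch tree as HTML for comparison with the dingus output.
/// No escaping is done; this is only meant for these two fixtures.
fn to_html(node: &Node) -> String {
    match node {
        Node::Paragraph { children } => {
            format!("<p>{}</p>", children.iter().map(to_html).collect::<String>())
        }
        Node::Link { url, children } => format!(
            "<a href=\"{}\">{}</a>",
            url,
            children.iter().map(to_html).collect::<String>()
        ),
        Node::Image { url, alt } => format!("<img src=\"{}\" alt=\"{}\" />", url, alt),
    }
}
```

Fixtures like these could serve as regression tests once the nesting case is handled.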
> On fuzzing, I think it might be good to have an automatic generator for random strings instead of coverage-based. While coverage-based is great for efficiency, the detected cases you linked are pretty short and the markdown parser itself is super fast, so a bunch of random ordering of tokens would probably pick some more issues up.
Efficiency would be key here.
Fuzzing currently gets run by maintainers on a laptop before releases to spot check for errors.
Meaning it runs on commodity hardware and with a bounded/fixed amount of time, not a super compute cluster with unbounded time.
Yes, unguided fuzzing can theoretically find more panics given enough time and compute, but it uses A LOT of time and compute to do so.
Do you have a suggestion of where to find that level of time and compute?
I'm aware of https://google.github.io/oss-fuzz/, but even they use guided fuzzing.
Or do you have another idea which works with commodity compute and bounded time?
Fixed, thanks to @sheremetyev!