Crash in a malformed `<table>`, with "Bad state: No element"
gnprice opened this issue
The following HTML causes the parser to crash:
\t<TABLE><<!>;<!><<!>.<lec><th>i><a><mat\x00\x01<mi\x00a><math>><th><mI>chardeta\xff\xff\xff\xff<><th><mI><||||||||A<select><>qu?\xbemath><th><mie>qu
Here's a stack trace (with `package:html` 0.15.4):
$ dart tmp.dart
Unhandled exception:
Bad state: No element
#0 ListBase.removeLast (dart:collection/list.dart:313:7)
#1 TreeBuilder.clearActiveFormattingElements (package:html/src/treebuilder.dart:217:42)
#2 InCellPhase.endTagTableCell (package:html/parser.dart:3172:12)
#3 InCellPhase.closeCell (package:html/parser.dart:3128:7)
#4 InCellPhase.startTagTableOther (package:html/parser.dart:3147:7)
#5 InCellPhase.processStartTag (package:html/parser.dart:3092:16)
#6 HtmlParser.mainLoop (package:html/parser.dart:310:37)
#7 HtmlParser._parse (package:html/parser.dart:191:9)
#8 HtmlParser.parseFragment (package:html/parser.dart:182:5)
From the stack, it looks like it's related to that `table` element and/or the `th` elements.
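To make frames #0 and #1 concrete, here is a minimal sketch of how an unguarded `removeLast()` on an empty `ListBase`-derived list produces exactly this message. `FormattingList`, `marker`, and `clearToLastMarker` are hypothetical names made up for illustration; they are not the package's code, and the guarded loop is only one possible shape for a fix, assuming the list can legitimately be empty when a cell is closed.

```dart
import 'dart:collection';

/// Hypothetical stand-in for a list built on ListBase, so that removeLast()
/// is the ListBase implementation seen in frame #0 of the trace.
class FormattingList<E> extends ListBase<E> {
  final List<E> _items = <E>[];

  @override
  int get length => _items.length;

  @override
  set length(int newLength) => _items.length = newLength;

  @override
  E operator [](int index) => _items[index];

  @override
  void operator []=(int index, E value) => _items[index] = value;

  @override
  void add(E element) => _items.add(element);
}

/// Hypothetical marker value; the real parser uses its own sentinel.
const marker = 'MARKER';

/// Pops entries until a marker has been removed. If the list runs out
/// first, it returns quietly instead of throwing.
void clearToLastMarker(FormattingList<String> entries) {
  while (entries.isNotEmpty) {
    if (entries.removeLast() == marker) return;
  }
}

void main() {
  final entries = FormattingList<String>();
  try {
    entries.removeLast(); // unguarded pop on an empty list
  } on StateError catch (e) {
    print(e); // prints "Bad state: No element"
  }

  entries
    ..add('b')
    ..add(marker)
    ..add('i');
  clearToLastMarker(entries);
  print(entries); // prints "[b]"

  clearToLastMarker(FormattingList<String>()); // empty list: no exception
}
```

Whether silently returning on an empty list is the right behavior for the parser is a separate question; the sketch only shows where the exception comes from.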
Here's a second HTML string causing the same crash (with the same stack trace):
y<framesetboheadrb$al>t<table><><t><th><math><th>u<\x0ch><mi><thx><TR>ind><<meta><i<isind<i\xff\xff\xff\xffex><select><<tr>i=ut\x00\x007>
I've included a self-contained test program at the end.
I obtained both of these test cases from html5lib/html5lib-python#568, a bug report on the html5lib Python library, which this library is described as a port of. They originate from Google's oss-fuzz project, as applied to the BeautifulSoup library (which uses `html5lib`).
There are several other fuzzer-produced test cases in that html5lib bug report, but I tried each of them against `package:html`, and these two are the only ones that crashed. The rest produced reasonable-looking output instead.
Test program:
import 'package:html/parser.dart';

void main() {
  final html = '\t<TABLE><<!>;<!><<!>.<lec><th>i><a><mat\x00\x01<mi\x00a><math>><th><mI>chardeta\xff\xff\xff\xff<><th><mI><||||||||A<select><>qu?\xbemath><th><mie>qu';
  // or: final html = r'y<framesetboheadrb$al>t<table><><t><th><math><th>u<\x0ch><mi><thx><TR>ind><<meta><i<isind<i\xff\xff\xff\xffex><select><<tr>i=ut\x00\x007>';
  final fragment = HtmlParser(html, parseMeta: false).parseFragment();
  print(fragment.nodes);
}
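
For checking several fuzzer inputs in one run, a small harness along these lines may be handy. This is only a sketch: `findCrashes` and `fakeParse` are hypothetical names, and the stand-in parser exists just so the block runs without any dependency.

```dart
/// Runs [parse] on each input and returns the inputs that threw,
/// mapped to whatever was thrown.
Map<String, Object> findCrashes(
    Iterable<String> inputs, void Function(String) parse) {
  final crashes = <String, Object>{};
  for (final input in inputs) {
    try {
      parse(input);
    } catch (error) {
      // Catches both Exception and Error subclasses such as StateError.
      crashes[input] = error;
    }
  }
  return crashes;
}

void main() {
  // Stand-in parser for this sketch: it pretends that any input
  // containing "<th>" crashes. Replace it with a real parser call.
  void fakeParse(String html) {
    if (html.contains('<th>')) throw StateError('No element');
  }

  final crashes = findCrashes(['<p>ok</p>', '<table><th>'], fakeParse);
  crashes.forEach((input, error) => print('$input -> $error'));
}
```

To run it against the real parser, pass `(html) => HtmlParser(html, parseMeta: false).parseFragment()` as the second argument, with the same import as in the test program above.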