dart-lang / html

Dart port of html5lib. For parsing HTML/HTML5 with Dart. Works in the client and on the server.

Home Page: https://pub.dev/packages/html


Crash in a malformed `<table>`, with "Bad state: No element"

gnprice opened this issue

The following HTML causes the parser to crash:
\t<TABLE><<!>;<!><<!>.<lec><th>i><a><mat\x00\x01<mi\x00a><math>><th><mI>chardeta\xff\xff\xff\xff<><th><mI><||||||||A<select><>qu?\xbemath><th><mie>qu

Here's a stack trace (with package:html 0.15.4):

$ dart tmp.dart
Unhandled exception:
Bad state: No element
#0      ListBase.removeLast (dart:collection/list.dart:313:7)
#1      TreeBuilder.clearActiveFormattingElements (package:html/src/treebuilder.dart:217:42)
#2      InCellPhase.endTagTableCell (package:html/parser.dart:3172:12)
#3      InCellPhase.closeCell (package:html/parser.dart:3128:7)
#4      InCellPhase.startTagTableOther (package:html/parser.dart:3147:7)
#5      InCellPhase.processStartTag (package:html/parser.dart:3092:16)
#6      HtmlParser.mainLoop (package:html/parser.dart:310:37)
#7      HtmlParser._parse (package:html/parser.dart:191:9)
#8      HtmlParser.parseFragment (package:html/parser.dart:182:5)

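From the stack, the crash looks related to that table element and/or the th elements: the exception is thrown while the parser is closing a table cell (InCellPhase.closeCell), when TreeBuilder.clearActiveFormattingElements calls removeLast.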
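The top frame of the trace is ListBase.removeLast, which suggests the list being cleared was already empty at that point. As a minimal sketch of that failure mode (DemoList is a hypothetical stand-in written for illustration, not code from package:html), calling removeLast on an empty ListBase subclass raises a StateError:

```dart
import 'dart:collection';

// Hypothetical stand-in for a parser-internal list; NOT package:html code.
// It only wraps a plain List so that removeLast comes from ListBase.
class DemoList<E> extends ListBase<E> {
  final List<E> _items = [];

  @override
  int get length => _items.length;

  @override
  set length(int newLength) {
    _items.length = newLength;
  }

  @override
  E operator [](int index) => _items[index];

  @override
  void operator []=(int index, E value) {
    _items[index] = value;
  }

  @override
  void add(E element) {
    _items.add(element);
  }
}

void main() {
  final list = DemoList<String>();
  try {
    // The list is empty, so ListBase.removeLast has nothing to remove.
    list.removeLast();
  } on StateError catch (e) {
    print(e);
  }
}
```

If that reading of the trace is right, a fix would probably involve checking for an empty list before popping, but I haven't confirmed where the list gets emptied.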

Here's a second HTML string causing the same crash (with the same stack trace):
y<framesetboheadrb$al>t<table><><t><th><math><th>u<\x0ch><mi><thx><TR>ind><<meta><i<isind<i\xff\xff\xff\xffex><select><<tr>i=ut\x00\x007>

I've included a self-contained test program at the end.

I obtained both of these test cases from html5lib/html5lib-python#568, a bug report on the html5lib Python library, of which this library is described as a port. They originate from Google's oss-fuzz project, as applied to the BeautifulSoup library (which uses html5lib).

There are several other fuzzer-produced test cases in that html5lib bug report, but I tried each of them against package:html, and these two are the only ones that crashed. The rest produced reasonable-looking output instead.

Test program:

import 'package:html/parser.dart';

void main() {
  final html = '\t<TABLE><<!>;<!><<!>.<lec><th>i><a><mat\x00\x01<mi\x00a><math>><th><mI>chardeta\xff\xff\xff\xff<><th><mI><||||||||A<select><>qu?\xbemath><th><mie>qu';
  // or: final html = r'y<framesetboheadrb$al>t<table><><t><th><math><th>u<\x0ch><mi><thx><TR>ind><<meta><i<isind<i\xff\xff\xff\xffex><select><<tr>i=ut\x00\x007>';
  final fragment = HtmlParser(html, parseMeta: false).parseFragment();
  print(fragment.nodes);
}
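For checking several inputs in one run, a variant of the test program can catch the error per input instead of stopping at the first crash. This is only a sketch: tryParseFragment is a hypothetical helper name, and the inputs list holds a benign placeholder that should be replaced with the strings above or other fuzzer-produced cases.

```dart
import 'package:html/parser.dart';

/// Hypothetical helper: returns the StateError message if parsing [html]
/// as a fragment throws one, or null if parsing completes.
String? tryParseFragment(String html) {
  try {
    HtmlParser(html, parseMeta: false).parseFragment();
    return null;
  } on StateError catch (e) {
    return e.message;
  }
}

void main() {
  // Placeholder input; substitute the crashing strings from this report.
  final inputs = <String>[
    '<table><tr><th>ok</th></tr></table>',
  ];

  for (var i = 0; i < inputs.length; i++) {
    final error = tryParseFragment(inputs[i]);
    print(error == null ? 'input $i: parsed' : 'input $i: crashed ($error)');
  }
}
```

Catching only StateError keeps the check narrow, so any other kind of exception would still surface as an unhandled error.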