Make parser generic over sink
y21 opened this issue
It would be nice if the parser were generic over a "sink": a user-supplied handler that the parser calls each time it visits a tag (a streaming parser). The sink could then decide what to do with each tag it receives.
Sometimes one does not need to parse an entire HTML document, or to do the other things that tl does by default.
We could provide default implementations, for example a sink that records ids and classes in a map so that ID lookups run in constant time. This already exists and can be enabled through ParserOptions::track_ids(), but a sink could be nicer.
AFAICT parsers like html5ever seem to do this.
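To make the idea concrete, here is a rough sketch of what a sink-generic parser could look like. Everything in it (the `Event` enum, the `Sink` trait, the toy `parse` function and `IdSink`) is hypothetical and is not tl's actual API; the tokenizer only handles well-formed `<tag attrs>`, `</tag>` and text, and the attribute handling is deliberately simplistic.

```rust
use std::collections::HashMap;

/// Hypothetical event type handed to the sink; not part of tl.
#[derive(Debug, PartialEq)]
pub enum Event<'a> {
    /// An opening tag with its name and raw attribute text.
    Open { name: &'a str, attrs: &'a str },
    /// A closing tag.
    Close { name: &'a str },
    /// Raw text between tags.
    Raw(&'a str),
}

/// Hypothetical sink trait: the parser calls `visit` once per event.
pub trait Sink<'a> {
    fn visit(&mut self, event: Event<'a>);
}

/// Toy streaming tokenizer that is generic over the sink.
/// It builds no tree; it only forwards events.
pub fn parse<'a, S: Sink<'a>>(input: &'a str, sink: &mut S) {
    let mut rest = input;
    while !rest.is_empty() {
        if let Some(after_lt) = rest.strip_prefix('<') {
            let end = match after_lt.find('>') {
                Some(end) => end,
                None => {
                    // Unterminated tag: hand the remainder over as raw text.
                    sink.visit(Event::Raw(rest));
                    return;
                }
            };
            let tag = &after_lt[..end];
            rest = &after_lt[end + 1..];
            if let Some(name) = tag.strip_prefix('/') {
                sink.visit(Event::Close { name: name.trim() });
            } else {
                // Split "name attrs" at the first whitespace character.
                let (name, attrs) = match tag.find(char::is_whitespace) {
                    Some(i) => (&tag[..i], tag[i..].trim()),
                    None => (tag, ""),
                };
                sink.visit(Event::Open { name, attrs });
            }
        } else {
            // Text runs until the next '<' or the end of input.
            let end = rest.find('<').unwrap_or(rest.len());
            sink.visit(Event::Raw(&rest[..end]));
            rest = &rest[end..];
        }
    }
}

/// Example default sink: remembers ids so lookups are a single map access.
#[derive(Default)]
pub struct IdSink<'a> {
    /// Maps an id to the index of the opening tag that declared it.
    pub ids: HashMap<&'a str, usize>,
    seen_tags: usize,
}

impl<'a> Sink<'a> for IdSink<'a> {
    fn visit(&mut self, event: Event<'a>) {
        if let Event::Open { attrs, .. } = event {
            // Simplification: assumes attributes are whitespace-separated
            // and that id values are double-quoted without spaces.
            for attr in attrs.split_whitespace() {
                if let Some(value) = attr.strip_prefix("id=") {
                    self.ids.insert(value.trim_matches('"'), self.seen_tags);
                }
            }
            self.seen_tags += 1;
        }
    }
}
```

A caller would create `IdSink::default()`, pass it to `parse`, and then look ids up in `sink.ids`. A sink that only needs the first few tags could simply ignore later events, although stopping the parser early would need some extra return value from `visit` that this sketch leaves out.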
What do you think of adding a ParserOptions::skip_whitespace()? While parsing I noticed quite a few Raw(Bytes("\n\n")) nodes that I'd like to ignore.
Or should I rather create a sink in that case?
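For comparison, if the parser were sink-based, skipping whitespace could be an adapter that wraps any other sink rather than a parser option. The sketch below is only an illustration: `Node`, `NodeSink`, `Collect` and `SkipWhitespace` are made-up names, and `Node::Raw` merely stands in for the Raw(Bytes(...)) nodes mentioned above.

```rust
/// Hypothetical node type; `Raw` stands in for tl's raw text nodes.
#[derive(Debug, PartialEq)]
pub enum Node<'a> {
    Tag(&'a str),
    Raw(&'a str),
}

/// Hypothetical sink trait: called once per node the parser produces.
pub trait NodeSink<'a> {
    fn visit(&mut self, node: Node<'a>);
}

/// Simple sink that collects every node it receives.
#[derive(Default)]
pub struct Collect<'a>(pub Vec<Node<'a>>);

impl<'a> NodeSink<'a> for Collect<'a> {
    fn visit(&mut self, node: Node<'a>) {
        self.0.push(node);
    }
}

/// Adapter sink: forwards everything to the inner sink except
/// raw text that consists only of whitespace.
pub struct SkipWhitespace<S>(pub S);

impl<'a, S: NodeSink<'a>> NodeSink<'a> for SkipWhitespace<S> {
    fn visit(&mut self, node: Node<'a>) {
        match node {
            // Drop whitespace-only text such as "\n\n".
            Node::Raw(text) if text.trim().is_empty() => {}
            other => self.0.visit(other),
        }
    }
}
```

Wrapping a sink as `SkipWhitespace(Collect::default())` filters the nodes before they reach the inner sink, so the same filter composes with any other sink. A dedicated option would still be the simpler choice for users who keep the default tree-building behaviour.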