y21 / tl

Fast, zero-copy HTML Parser written in Rust

Home Page: https://docs.rs/tl


Make parser generic over sink

y21 opened this issue


It would be nice if the parser were generic over a "sink": a user-supplied type with a function that the parser calls whenever it visits a tag (a streaming parser). The sink could then decide what to do with each tag it receives.
Sometimes one doesn't need to parse an entire HTML document, or doesn't need the other things that tl does by default.
We could provide default implementations, for example a sink that keeps track of ids and classes in a map so that ID lookups run in constant time. This already exists and can be enabled through ParserOptions::track_ids(), but a sink could be nicer.
AFAICT parsers like html5ever seem to do this.
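
To make the idea concrete, here is a rough sketch of what such a sink could look like. None of this is tl's actual API: Event, Sink, IdTracker and drive are all hypothetical names made up for illustration, and drive only stands in for a parser by replaying a pre-built list of events.

```rust
use std::collections::HashMap;

/// Hypothetical event type for this sketch; tl's real node types differ.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Event<'a> {
    /// An opening tag with its name and optional id attribute.
    Tag { name: &'a str, id: Option<&'a str> },
    /// Raw text between tags.
    Raw(&'a str),
}

/// Hypothetical sink trait: a parser would call `visit` once per event.
pub trait Sink<'a> {
    fn visit(&mut self, event: Event<'a>);
}

/// Example sink that remembers where each id was seen, so that
/// lookups by id are a single hash map access.
#[derive(Default)]
pub struct IdTracker<'a> {
    ids: HashMap<&'a str, usize>,
    seen: usize,
}

impl<'a> IdTracker<'a> {
    /// Returns the index of the event that carried `id`, if any.
    pub fn position(&self, id: &str) -> Option<usize> {
        self.ids.get(id).copied()
    }
}

impl<'a> Sink<'a> for IdTracker<'a> {
    fn visit(&mut self, event: Event<'a>) {
        // Only tags with an id are recorded; everything else is just counted.
        if let Event::Tag { id: Some(id), .. } = event {
            self.ids.insert(id, self.seen);
        }
        self.seen += 1;
    }
}

/// Stand-in for a parser (an assumption of this sketch): it forwards
/// already-built events to the sink in order.
pub fn drive<'a, S: Sink<'a>>(events: &[Event<'a>], sink: &mut S) {
    for &event in events {
        sink.visit(event);
    }
}
```

A parser written this way would take `S: Sink` as a type parameter, so users who only need ids (or only need the first matching tag) would not pay for building a full tree.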

What do you think of adding a ParserOptions::skip_whitespace()? When parsing, I noticed quite a few Raw(Bytes("\n\n")) nodes that I'd like to ignore.

Or should I create a sink for that instead?
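
For reference, the filtering itself is small whichever place it ends up living in. The sketch below is not tl's API: Node is a hypothetical stand-in for the Raw(Bytes("\n\n")) values mentioned above, and skip_whitespace here is a free function written for illustration, not the proposed ParserOptions method.

```rust
/// Hypothetical node type for this sketch; it only mirrors the shape of
/// the Raw(Bytes(..)) values from the comment, not tl's real types.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Node<'a> {
    Tag(&'a str),
    Raw(&'a [u8]),
}

/// True if every byte is ASCII whitespace (space, tab, newline,
/// form feed or carriage return). An empty slice also counts.
pub fn is_whitespace_only(bytes: &[u8]) -> bool {
    bytes.iter().all(|b| b.is_ascii_whitespace())
}

/// Drops whitespace-only raw nodes and keeps everything else in order.
/// A sink or a parser option could apply the same check per node
/// instead of filtering afterwards.
pub fn skip_whitespace<'a>(nodes: impl IntoIterator<Item = Node<'a>>) -> Vec<Node<'a>> {
    nodes
        .into_iter()
        .filter(|node| match node {
            Node::Raw(bytes) => !is_whitespace_only(bytes),
            Node::Tag(_) => true,
        })
        .collect()
}
```

Doing the check while parsing (in a sink or behind an option) would avoid allocating the whitespace nodes in the first place, which a post-pass like this cannot do.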