y21 / tl

Fast, zero-copy HTML Parser written in Rust

Home Page: https://docs.rs/tl

Feature Request: HTMLTag#children_mut

kelko opened this issue

One can already change many parts of an HTMLTag, but not (yet) its children. For a project of mine I would like to be able to exclude children from an HTML document, and for that I would need such a feature.

I tried it out and a hacky version works:

impl<'a> HTMLTag<'a> {
// ...
    /// Returns a wrapper around the children of this HTML tag
    #[inline]
    pub fn children_mut(&mut self) -> ChildrenMut<'a, '_> {
        ChildrenMut(self)
    }
// ...
}

/// A thin wrapper around the children of [`HTMLTag`]
#[derive(Debug)]
pub struct ChildrenMut<'a, 'b>(&'b mut HTMLTag<'a>);

impl<'a, 'b> ChildrenMut<'a, 'b> {
    /// Returns the topmost, direct children of this tag as mutable reference.
    #[inline]
    pub fn top_mut(&mut self) -> &mut RawChildren {
        &mut self.0._children
    }
}

I'm not sure if there is a cleaner, nicer way to provide children & children_mut without the additional struct. If there is a known / preferred pattern, I'm happy to try to implement it and make a PR. Just point me in the right direction.
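
To make the intended use concrete, here is a minimal, self-contained sketch of the wrapper pattern from the snippet above. It assumes simplified stand-in types (`Tag`, a `Vec`-backed `RawChildren`, a plain `usize` handle) rather than tl's real ones, and `exclude_children` is a hypothetical helper that exists only for illustration.

```rust
// Simplified stand-ins for illustration only; these are NOT tl's real types.
type NodeHandle = usize;
type RawChildren = Vec<NodeHandle>;

#[derive(Debug, Default)]
pub struct Tag {
    pub children: RawChildren,
}

/// Thin wrapper that holds a mutable borrow of the tag for lifetime `'b`.
#[derive(Debug)]
pub struct ChildrenMut<'b>(&'b mut Tag);

impl Tag {
    /// Returns a wrapper around the children of this tag.
    pub fn children_mut(&mut self) -> ChildrenMut<'_> {
        ChildrenMut(self)
    }
}

impl<'b> ChildrenMut<'b> {
    /// Returns the direct children as a mutable reference.
    pub fn top_mut(&mut self) -> &mut RawChildren {
        &mut self.0.children
    }
}

/// Hypothetical helper: drops every direct child whose handle is in `excluded`.
pub fn exclude_children(tag: &mut Tag, excluded: &[NodeHandle]) {
    tag.children_mut()
        .top_mut()
        .retain(|handle| !excluded.contains(handle));
}
```

Because `top_mut` hands out the stored collection itself, removing a handle from it is all that is needed to detach a direct child in this simplified model.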

commented

Yes, I think that looks good. The only problem I can see is with implementing ChildrenMut::all_mut, if that's even needed. It would need a &mut Parser as well as &self, which I think would be impossible to use. The &HTMLTag in ChildrenMut ultimately comes from &Parser and keeps an active immutable borrow, so you couldn't pass a &mut Parser to the function at the same time.

For all_mut I would argue that it doesn't really make sense. top() returns the vec of direct children, as it is stored in the Node. all(), on the other hand, dynamically generates an array / slice and returns that. So mutating that dynamically created slice wouldn't have the desired effect anyway.
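
The distinction can be sketched with a toy tree. This is only an illustration of the argument as stated here, under the assumption that the "all" view is computed on demand; `Tree` and its methods are hypothetical stand-ins, not tl's API.

```rust
// Toy model, NOT tl's types: children[i] holds the direct children of node i.
pub struct Tree {
    pub children: Vec<Vec<usize>>,
}

impl Tree {
    /// Like `top()`: a view into storage that the tree itself owns.
    pub fn top_mut(&mut self, node: usize) -> &mut Vec<usize> {
        &mut self.children[node]
    }

    /// Like `all()` as described above: a fresh list of every descendant,
    /// built on demand in depth-first order.
    pub fn all(&self, node: usize) -> Vec<usize> {
        let mut out = Vec::new();
        let mut stack: Vec<usize> = self.children[node].iter().rev().copied().collect();
        while let Some(n) = stack.pop() {
            out.push(n);
            stack.extend(self.children[n].iter().rev().copied());
        }
        out
    }
}
```

Editing the `Vec` returned by `all` only changes that temporary copy, while editing through `top_mut` changes what later calls to `all` return.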

Nevertheless, just to deepen my understanding of Rust, I would like to ask whether this is a case where interior mutability would be worth considering instead of the current "inherited mutability". If I understand correctly, even immutable borrows can provide (certain) mutating behaviour. In this case: inside the Parser?

Or maybe even Rc<Cell> and reduce the amount of borrowing? (I understand of course this would reduce performance and increase memory usage.)
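
For reference, a minimal sketch of what interior mutability could look like for a node pool, using `std::cell::RefCell`. The `Pool` and `Node` types are hypothetical and only illustrate the idea; nothing here reflects how tl is implemented.

```rust
use std::cell::RefCell;

// Hypothetical types for illustration; not part of tl.
#[derive(Debug, Clone, PartialEq)]
pub struct Node {
    pub name: String,
    pub children: Vec<usize>,
}

pub struct Pool {
    // The RefCell moves the borrow check for the node list to runtime.
    nodes: RefCell<Vec<Node>>,
}

impl Pool {
    pub fn new(nodes: Vec<Node>) -> Self {
        Pool { nodes: RefCell::new(nodes) }
    }

    /// Removes `child` from `parent`'s direct children through `&self`.
    /// Returns false if `parent` does not exist.
    pub fn remove_child(&self, parent: usize, child: usize) -> bool {
        // borrow_mut() panics if the cell is already borrowed elsewhere.
        let mut nodes = self.nodes.borrow_mut();
        match nodes.get_mut(parent) {
            Some(node) => {
                node.children.retain(|&c| c != child);
                true
            }
            None => false,
        }
    }

    /// Returns a copy of `parent`'s direct children (empty if it does not exist).
    pub fn children_of(&self, parent: usize) -> Vec<usize> {
        self.nodes
            .borrow()
            .get(parent)
            .map(|n| n.children.clone())
            .unwrap_or_default()
    }
}
```

The trade-off is that conflicting borrows are no longer rejected at compile time; they surface as panics at runtime instead, on top of the bookkeeping cost of the cell.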

commented

> Or maybe even Rc<Cell> and reduce the amount of borrowing? (I understand of course this would reduce performance and increase memory usage.)

This crate used to use Rc<Node> for the nodes some time ago, but profiling and benchmarking showed that it added a lot of overhead, so I decided to experiment with the idea of a "global pool" (Vec) of nodes stored in the parser, with nodes storing an index (usize) into this Vec instead of a reference or Rc<Node>.
This worked fine and reduced allocations by a lot: before, every creation of an Rc<Node> was a heap allocation, whereas now pushing a Node into a Vec only reallocates when its capacity is exceeded. Vec grows its capacity geometrically (currently by doubling) when it reallocates, so there really aren't many heap allocations during parsing anymore.
It does mean, however, that you need a Parser to "resolve" the usize to a Node again. On the other hand, the NodeHandle type (which is just a usize) can be freely cloned (much like Rc<Node>s), which helps a lot with borrow patterns that would otherwise be impossible to express with the borrow checker.
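
A minimal sketch of this index-based pool idea, assuming simplified types: `Arena`, `Node`, and this `NodeHandle` are illustrative stand-ins rather than tl's actual definitions, and `build_example` is a hypothetical helper.

```rust
// Illustrative stand-ins; not tl's actual definitions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct NodeHandle(usize);

#[derive(Debug)]
pub struct Node {
    pub name: &'static str,
    pub children: Vec<NodeHandle>,
}

/// Plays the role of the parser's "global pool" of nodes.
#[derive(Debug, Default)]
pub struct Arena {
    nodes: Vec<Node>,
}

impl Arena {
    /// Stores a node and returns its index as a handle.
    pub fn push(&mut self, name: &'static str) -> NodeHandle {
        self.nodes.push(Node { name, children: Vec::new() });
        NodeHandle(self.nodes.len() - 1)
    }

    /// Resolves a handle back to a node.
    pub fn get(&self, handle: NodeHandle) -> Option<&Node> {
        self.nodes.get(handle.0)
    }

    /// Resolves a handle to a mutable node.
    pub fn get_mut(&mut self, handle: NodeHandle) -> Option<&mut Node> {
        self.nodes.get_mut(handle.0)
    }
}

/// Hypothetical helper: builds `div > [p, span]`, then detaches `span`.
pub fn build_example() -> (Arena, NodeHandle) {
    let mut arena = Arena::default();
    let root = arena.push("div");
    let p = arena.push("p");
    let span = arena.push("span");
    // Handles are plain copies, so no borrow of the arena is held between calls.
    arena.get_mut(root).unwrap().children.extend([p, span]);
    arena.get_mut(root).unwrap().children.retain(|&h| h != span);
    (arena, root)
}
```

Since a handle carries no borrow, it can be stored or copied while the pool is mutated elsewhere; the cost is that every access has to go through the pool to resolve it.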

I submitted a PR that adds ChildrenMut pretty much like you showed (along with a few other changes that I've been meaning to make for some time and a fix for the innerHTML/outerHTML "bug") here: #44

Thanks, works perfectly for me 👍