How to (code-)efficiently traverse a DOM?

Question

therealprof opened this issue 7 years ago

I have a rather simple DOM that I'd like to traverse, but the regular DOM implementation makes it tedious to navigate the parsed tree. To get to the root element of the document I'm currently using this unsightly code (there may or may not be a comment before the root element, so I have to filter for the element):

    let root = doc.root()
        .children()
        .into_iter()
        .find(|&x|
            if let dom::ChildOfRoot::Element(_) = x {
                true
            } else {
                false
            }
        )
        .unwrap()
        .element()
        .unwrap();

The next level of interest is a <model> element, which I get at like this:

    let model = root.children()
        .into_iter()
        .find(|&x| {
            if let Some(name) = x.element() {
                name.name().local_part() == "model"
            } else {
                false
            }
        })
        .unwrap()
        .element()
        .unwrap();

and so on and so on.

It seems tinydom would provide more convenient access to the DOM but that looks unfinished and under-documented at the moment.

Is there a more elegant way to traverse the DOM, like a direct iterator over all children as elements directly so I can skip all the naughty element-ification and unwrapping?
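For illustration, the kind of helper being asked for can be sketched with `filter_map`, which drops non-element children and unwraps the rest in one step. The `Node` and `Element` types below, along with `child_elements` and `child_named`, are hypothetical stand-ins so that the example compiles on its own; they are not the library's types. The same shape should carry over to any DOM whose children expose an `element()`-style accessor returning an `Option`.

```rust
// Hypothetical stand-in types for illustration; not the library's API.
#[allow(dead_code)] // comment and text payloads are unused in this sketch
enum Node {
    Element(Element),
    Comment(String),
    Text(String),
}

struct Element {
    name: String,
    children: Vec<Node>,
}

impl Node {
    // Mirrors an `element()`-style accessor: Some for elements, None otherwise.
    fn element(&self) -> Option<&Element> {
        match self {
            Node::Element(e) => Some(e),
            _ => None,
        }
    }
}

impl Element {
    fn new(name: &str, children: Vec<Node>) -> Element {
        Element { name: name.to_string(), children }
    }

    // Iterate over child elements only, skipping comments and text.
    fn child_elements(&self) -> impl Iterator<Item = &Element> + '_ {
        self.children.iter().filter_map(Node::element)
    }

    // First child element with the given name, if any.
    fn child_named(&self, name: &str) -> Option<&Element> {
        self.child_elements().find(|e| e.name == name)
    }
}

// A small tree: a comment, two elements, and some text under the root.
fn sample() -> Element {
    Element::new(
        "root",
        vec![
            Node::Comment("generated".to_string()),
            Node::Element(Element::new("meta", vec![])),
            Node::Text("\n".to_string()),
            Node::Element(Element::new("model", vec![])),
        ],
    )
}

fn main() {
    let root = sample();
    match root.child_named("model") {
        Some(model) => println!("found <{}>", model.name),
        None => println!("no <model> child"),
    }
}
```

Returning `Option` from `child_named` leaves the decision to unwrap or handle a missing element to the caller, so the chain of `find`, `unwrap`, `element`, `unwrap` collapses into a single call per level.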

Jake Goulding · Answer 1 · Tue Feb 28 2017 04:31:46 GMT+0800 (China Standard Time)

It sounds like you want an XPath:

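    extern crate sxd_document;
    extern crate sxd_xpath;

    use sxd_document::parser;
    use sxd_xpath::{evaluate_xpath, Value};

    fn main() {
        // Assumption: a small inline document stands in for your parsed file.
        let package = parser::parse("<root><model/></root>").expect("failed to parse XML");
        let doc = package.as_document();

        // Select every <model> child of the root element.
        let value = evaluate_xpath(&doc, "/*/model").expect("XPath evaluation failed");
        if let Value::Nodeset(nodes) = value {
            // do something with the nodes.
        }
    }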

> tinydom would provide more convenient access to the DOM

That's actually an experimental interface that should have the same capabilities as the traditional DOM interface but provide different compile-time tradeoffs. It's "underdocumented" in the sense that it's deliberately hidden from the docs 😉

Jake Goulding · Answer 2 · Tue Feb 28 2017 04:33:21 GMT+0800 (China Standard Time)

You'll also note that / and /* correspond to the Root type and the single Element child of the Root type that we are discussing in #40, which is part of the reason they are different types.

Daniel Egger · Answer 3 · Tue Feb 28 2017 15:49:29 GMT+0800 (China Standard Time)

I'm not sure using XPath is a key to success here. While it can be used to get to the subdocuments I need, the parsing of those subdocuments is going to be as onerous as it is getting to them right now. I see XPath more as a tool to search for or extract partial information from a document but I really need to consume all of it.
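Consuming a whole tree, as opposed to querying parts of it, usually comes down to a recursive walk that matches on each child. A minimal sketch of that idea follows, again on hypothetical stand-in types (`Node`, `Element`, and the `walk` function are illustrative and not part of the library); with the real DOM the `match` would be over the library's child enum instead.

```rust
// Hypothetical stand-in types for illustration; not the library's API.
#[allow(dead_code)] // the comment payload is unused in this sketch
enum Node {
    Element(Element),
    Text(String),
    Comment(String),
}

struct Element {
    name: String,
    children: Vec<Node>,
}

// Visit every node under `element`, depth first, appending one line per
// element or text node to `out`. Comments are skipped.
fn walk(element: &Element, depth: usize, out: &mut Vec<String>) {
    let indent = "  ".repeat(depth);
    out.push(format!("{}<{}>", indent, element.name));
    for child in &element.children {
        match child {
            Node::Element(e) => walk(e, depth + 1, out),
            Node::Text(t) => out.push(format!("{}  {:?}", indent, t)),
            Node::Comment(_) => {}
        }
    }
}

// A small tree: a comment and a <model> element holding one text node.
fn sample() -> Element {
    Element {
        name: "root".to_string(),
        children: vec![
            Node::Comment("header".to_string()),
            Node::Element(Element {
                name: "model".to_string(),
                children: vec![Node::Text("abc".to_string())],
            }),
        ],
    }
}

fn main() {
    let mut lines = Vec::new();
    walk(&sample(), 0, &mut lines);
    for line in &lines {
        println!("{}", line);
    }
}
```

Because the `match` is exhaustive, adding a new node kind to the enum forces every walker to decide how to handle it, which suits the "consume all of it" case better than a query that silently ignores what it does not select.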