XMLParse()

Question

jocubeit opened this issue 5 years ago

Any chance you'll be including XMLParse() functionality?

Budleigh Salterton · Answer 1 · Mon Dec 02 2019 21:52:30 GMT+0800 (China Standard Time)

I did think about this on a couple occasions - any proposals regarding a good streaming xml parser module that wouldn't include dependency weight that would incur relativistic effects? ;)

Dom Jocubeit · Answer 2 · Mon Dec 02 2019 22:38:34 GMT+0800 (China Standard Time)

fast-xml-parser seems to be the favourite, but it's not a streaming parser.

xml-flow looks ok - I've never used it though. Uses the sax library. Also has a BSD license - not sure if this is a concern for you.

xml-streamer is based on node-expat and it implements the Node.js stream.Transform API. Not sure you want to add node-expat and lodash as dependencies though.

muxml is a Node.js transform stream, is written with ES6, has a dependency on sax and is MIT licensed. I'm liking the look of this one.

So maybe the last one? ;-)

Budleigh Salterton · Answer 3 · Mon Dec 02 2019 23:19:28 GMT+0800 (China Standard Time)

Looking at the options I'd just go and use sax as a dependency, as this would carry less weight...

Ok, the last time I inspected this I thought the best way would be to use xpath or queryselector as the argument for the data, and I don't see anything that could solve this for us - which would mean sadly a new project (cool!). My idea would then be:

1. create a sax-derived module that would do readable -> sax -> xpath -> readable stream
2. integrate that into scramjet.

Are you willing to help?

Dom Jocubeit · Answer 4 · Tue Dec 03 2019 05:30:17 GMT+0800 (China Standard Time)

I'm willing to help, not sure how much help I can be though... I come from the C# and ruby worlds, node and javascript are still on my learning curve. I'm using typescript, node and dart on a daily basis though, and I'm a fast learner. Just let me know what you would like me to tackle and I'll take a stab.

I'm thinking xpath will be challenging. We could avoid building a dom but still support querySelector by converting css queries to xpath. I've had a brief look at some xpath modules, but need to do more of a deep dive I think. Don't we have to parse the entire document to filter for an xpath?
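
A rough sketch of the css-to-xpath idea, assuming we only support plain tag names joined by the descendant (space) and child (">") combinators. `cssToXPath` is a hypothetical helper made up for illustration; it is not part of any of the modules mentioned in this thread, and classes, ids and attribute selectors are deliberately left out.

```javascript
// Hypothetical helper: translate a small subset of CSS selectors to XPath.
// Only tag names, the descendant combinator (" ") and the child
// combinator (">") are handled; anything else throws.
function cssToXPath(selector) {
    // Normalise "a>b" and "a >  b" to "a > b", then split on whitespace.
    const tokens = selector.trim().replace(/\s*>\s*/g, " > ").split(/\s+/);
    let xpath = "";
    let axis = "//"; // the first step may match anywhere in the document

    for (const token of tokens) {
        if (token === ">") {
            axis = "/"; // the next step must be a direct child
            continue;
        }
        if (!/^[A-Za-z_][\w.-]*$/.test(token)) {
            throw new Error(`Unsupported selector part: ${token}`);
        }
        xpath += axis + token;
        axis = "//"; // fall back to "any descendant" for the next step
    }
    return xpath;
}

console.log(cssToXPath("catalog > book title")); // prints //catalog/book//title
```

Since the output is just a path of element names, a streaming matcher would only need to compare it against the stack of currently open elements, so no DOM is required for this subset.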

I'm thinking we can use an xpath-like filter for matching nodes. I'm assuming each match will then be considered a "row" and be surfaced as an object by the .consume() method. Sorry if this is a little incoherent, I'm typing as I think.
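
To make the "each match is a row" idea a bit more concrete, here is a minimal sketch. It assumes the parser hands us open/text/close events (the event shape below is my own invention, loosely in the spirit of what a SAX parser emits), and `matchRows` is a hypothetical name. The path is a plain absolute list of element names, not real xpath.

```javascript
// Assumed event shape (hypothetical, not the API of any specific parser):
//   { type: "open", name, attributes }, { type: "text", text }, { type: "close", name }
// Yields one plain object per element whose absolute path equals `path`.
function* matchRows(events, path) {
    const wanted = path.split("/").filter(Boolean); // e.g. ["catalog", "book"]
    const stack = []; // names of the currently open elements
    let row = null;   // the row being built while inside a matching element

    for (const ev of events) {
        if (ev.type === "open") {
            stack.push(ev.name);
            if (stack.length === wanted.length && wanted.every((n, i) => n === stack[i])) {
                row = { ...ev.attributes }; // attributes become the first fields
            } else if (row && stack.length === wanted.length + 1) {
                row[ev.name] = ""; // direct child element becomes a text field
            }
        } else if (ev.type === "text") {
            // Only text of direct children is collected; deeper nesting is ignored.
            if (row && stack.length === wanted.length + 1) {
                row[stack[stack.length - 1]] += ev.text;
            }
        } else if (ev.type === "close") {
            if (row && stack.length === wanted.length) {
                yield row; // the matching element is complete: surface it as a row
                row = null;
            }
            stack.pop();
        }
    }
}

const events = [
    { type: "open", name: "catalog", attributes: {} },
    { type: "open", name: "book", attributes: { id: "b1" } },
    { type: "open", name: "title", attributes: {} },
    { type: "text", text: "Dune" },
    { type: "close", name: "title" },
    { type: "close", name: "book" },
    { type: "open", name: "note", attributes: {} },
    { type: "text", text: "ignored" },
    { type: "close", name: "note" },
    { type: "open", name: "book", attributes: { id: "b2" } },
    { type: "open", name: "title", attributes: {} },
    { type: "text", text: "Emma" },
    { type: "close", name: "title" },
    { type: "close", name: "book" },
    { type: "close", name: "catalog" },
];

const rows = [...matchRows(events, "catalog/book")];
console.log(rows);
```

Only the open-element stack and the current row are held in memory, so this also answers my own question above for simple paths: we don't need the whole document to filter, as long as the filter never looks backwards or at siblings.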

Budleigh Salterton · Answer 5 · Thu Dec 05 2019 16:20:47 GMT+0800 (China Standard Time)

Good, so:

No, I think XMLParse should not be a plugin, especially since CSVParse isn't. I'd start a separate repo just for the sax-based streamed parser and use it as a dependency (so that's where your help would be appreciated).
XPath and Selectors could both be supported - we'd just need to see which ones are supported by sax and other tools we could set up. I'm in the process of setting up Slack for Scramjet - I'll invite you as soon as we get going.
Yes, "xpath-like filter", that's the idea I have, a bit like the JSONStream module that I use for the JSONParse method.

Dom Jocubeit · Answer 6 · Sun Dec 08 2019 19:46:25 GMT+0800 (China Standard Time)

OK, let me know when you have Slack set up; in the meantime I'll familiarise myself with Scramjet, JSONParse etc.

stale · Answer 7 · Thu Feb 06 2020 20:14:16 GMT+0800 (China Standard Time)

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Budleigh Salterton · Answer 8 · Tue Feb 25 2020 20:40:26 GMT+0800 (China Standard Time)

There's a discussion on the just-opened Scramjet Slack about this...

I think this could be a good way to work with this:

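const fs = require("fs");
const flow = require("xml-flow");
const {DataStream} = require("scramjet");

DataStream.from(function() {
    // xml-flow expects a readable stream rather than a file path
    const xml = flow(fs.createReadStream("./path/to.xml"));
    const out = new DataStream();
    // every <mytag> element is written to the output as one chunk
    xml.on("tag:mytag", data => out.write(data));
    // end the output once the document is fully parsed
    xml.on("end", () => out.end());
    // pause and resume the parser together with the output stream
    out.on("pause", () => xml.pause());
    out.on("resume", () => xml.resume());
    return out;
})
    .each(console.log)
    .catch(e => {
        console.error(e.stack);
        process.exit(1);
    });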