shepmaster / sxd-document

An XML library in Rust

Parsing an external DTD fails

onelson opened this issue · comments

Sorry for the lengthy blob of XML, but I'm at a loss as to why this is failing:

#[cfg(test)]
mod tests {
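    // Declare the crate inside this module so that `self::sxd_document` resolves.
    extern crate sxd_document;
    use self::sxd_document::parser;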

    #[test]
    fn test_parse_sample() {
        let xml = r##"<?xml version="1.0"?>
<!DOCTYPE cXML SYSTEM "http://xml.cxml.org/schemas/cXML/1.2.014/cXML.dtd">
<cXML xml:lang="en-US"
      payloadID="933695160894"
      timestamp="2002-08-15T08:47:00-07:00">
    <Header>
        <From>
            <Credential domain="DUNS">
                <Identity>83528721</Identity>
            </Credential>
        </From>
        <To>
            <Credential domain="DUNS">
                <Identity>65652314</Identity>
            </Credential>
        </To>
        <Sender>
            <Credential domain="workchairs.com">
                <Identity>website 1</Identity>
            </Credential>
            <UserAgent>Workchairs cXML Application</UserAgent>
        </Sender>
    </Header>
    <Message>
        <PunchOutOrderMessage>
            <BuyerCookie>1CX3L4843PPZO</BuyerCookie>
            <PunchOutOrderMessageHeader operationAllowed="edit">
                <Total>
                    <Money currency="USD">763.20</Money>
                </Total>
            </PunchOutOrderMessageHeader>
            <ItemIn quantity="3">
                <ItemID>
                    <SupplierPartID>5555</SupplierPartID>
                    <SupplierPartAuxiliaryID>E000028901</SupplierPartAuxiliaryID>
                </ItemID>
                <ItemDetail>
                    <UnitPrice>
                        <Money currency="USD">763.20</Money>
                    </UnitPrice>
                    <Description xml:lang="en">
                        <ShortName>Excelsior Desk Chair</ShortName>
                        Leather Reclining Desk Chair with Padded Arms
                    </Description>
                    <UnitOfMeasure>EA</UnitOfMeasure>
                    <Classification domain="UNSPSC">5136030000</Classification>
                    <LeadTime>12</LeadTime>
                </ItemDetail>
            </ItemIn>
            <ItemIn quantity="1">
                <ItemID>
                    <SupplierPartID>AM2692</SupplierPartID>
                    <SupplierPartAuxiliaryID>A_B:5008937A_B:</SupplierPartAuxiliaryID>
                </ItemID>
                <ItemDetail>
                    <UnitPrice>
                        <Money currency="USD">250.00</Money>
                    </UnitPrice>
                    <Description xml:lang="en-US">ANTI-RNase (15-30 U/ul)</Description>
                    <UnitOfMeasure>EA</UnitOfMeasure>
                    <Classification domain="UNSPSC">41106104</Classification>
                    <ManufacturerName/>
                    <LeadTime>0</LeadTime>
                </ItemDetail>
            </ItemIn>
        </PunchOutOrderMessage>
    </Message>
</cXML>
"##;
        parser::parse(xml).unwrap();

    }
}

The parse call fails with

panicked at 'called `Result::unwrap()` on an `Err` value: (50, [ExpectedClosingQuote("\"")])', /checkout/src/libcore/result.rs:916:5

which by my math is somewhere inside the DTD URI. It parses successfully if I remove the doctype declaration entirely, but I can't see why the quotes would be tripping it up.
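
To check that math, here is a minimal sketch that prints the text around offset 50. It assumes the first element of the error tuple is a byte offset into the input, which the error message does not confirm. `context_at` and `SAMPLE` are hypothetical names used only for illustration and are not part of sxd-document.

```rust
/// First two lines of the failing document, followed by an empty root element.
const SAMPLE: &str = "<?xml version=\"1.0\"?>\n<!DOCTYPE cXML SYSTEM \"http://xml.cxml.org/schemas/cXML/1.2.014/cXML.dtd\">\n<cXML />\n";

/// Returns the slice of `input` within `radius` bytes of `offset`,
/// widened outwards to the nearest UTF-8 character boundaries.
fn context_at(input: &str, offset: usize, radius: usize) -> &str {
    let mut start = offset.saturating_sub(radius).min(input.len());
    while !input.is_char_boundary(start) {
        start -= 1;
    }
    let mut end = (offset + radius).min(input.len());
    while !input.is_char_boundary(end) {
        end += 1;
    }
    &input[start..end]
}

fn main() {
    // Assumption: 50 is a byte offset taken from the error tuple `(50, [...])`.
    println!("{:?}", context_at(SAMPLE, 50, 5));
}
```

Under that assumption the window falls inside the quoted system identifier, just past `http:`, which matches the guess that the failure happens inside the DTD URI rather than at either quote.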

Yes, we don't currently support DTDs at all. This smaller example fails the same way:

extern crate sxd_document;

use sxd_document::parser;

fn main() {
    let xml = r##"<?xml version="1.0"?>
<!DOCTYPE cXML SYSTEM "http://xml.cxml.org/schemas/cXML/1.2.014/cXML.dtd">
<cXML />
"##;
    parser::parse(xml).unwrap();
}

Do you need access to the DTD? I believe it would be easy enough to parse it and simply throw it away.

Are you interested in contributing via pull requests at all?

(Duplicate of #50)

We don't currently need access to the DTD, and we are dropping the head of the response stream before we pass it to the parser (our hacky workaround).
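
A workaround along those lines might look like the sketch below. It is not the code actually used here: `strip_doctype` is a hypothetical helper, and it assumes the declaration has no internal subset (`[...]`), contains no `>` inside its quoted literals, and that `<!DOCTYPE` does not appear earlier inside a comment.

```rust
/// Removes the first `<!DOCTYPE ...>` declaration from `xml`, if there is one.
/// Assumption: the declaration ends at the first `>` after `<!DOCTYPE`.
fn strip_doctype(xml: &str) -> String {
    let start = match xml.find("<!DOCTYPE") {
        Some(i) => i,
        None => return xml.to_string(),
    };
    let end = match xml[start..].find('>') {
        Some(i) => start + i + 1,
        // Unterminated declaration: leave the input untouched.
        None => return xml.to_string(),
    };
    let mut out = String::with_capacity(xml.len() - (end - start));
    out.push_str(&xml[..start]);
    out.push_str(&xml[end..]);
    out
}

fn main() {
    let xml = "<?xml version=\"1.0\"?>\n<!DOCTYPE cXML SYSTEM \"http://xml.cxml.org/schemas/cXML/1.2.014/cXML.dtd\">\n<cXML />\n";
    // The cleaned string is what would then be handed to `parser::parse`.
    print!("{}", strip_doctype(xml));
}
```

Everything outside the declaration is preserved, including the XML declaration on the first line, so only the part the parser rejects is dropped.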

It's funny because the parser has strong opinions about what attributes are present inside the doctype tag, so I thought this would be expected to work.

I'd be glad to send a PR if I knew what I was doing ;)
I can take a look and see what I can figure out.

I'm going to go ahead and move the discussion into the linked issue (#50).