XPath not working as expected with findall
anandijain opened this issue · comments
I cannot get findall to work as intended. According to https://www.w3schools.com/xml/xpath_syntax.asp, the XPath //ci should select the compartment, k1, and S1 elements, but findall returns an empty array of nodes.
<math xmlns="http://www.w3.org/1998/Math/MathML">
<apply>
<times/>
<ci> compartment
</ci>
<ci> k1
</ci>
<ci> S1
</ci>
</apply>
</math>
julia> findall("//ci", xml)
EzXML.Node[]
Your document has a namespace. See here
In particular:
There is a caveat on the combination of XPath and namespaces: if a document contains elements with a default namespace, you need to specify a prefix for it when calling the find* functions. In the following example, the root element and its descendants have the default namespace "http://www.foobar.org", but that namespace has no prefix of its own. In this case, you need to assign a prefix to the namespace when finding elements in it:
julia> doc = parsexml("""
       <parent xmlns="http://www.foobar.org">
           <child/>
       </parent>
       """)
EzXML.Document(EzXML.Node(<DOCUMENT_NODE@0x00007fdc67710030>))
julia> findall("/parent/child", doc.root)  # nothing will be found
0-element Array{EzXML.Node,1}
julia> namespaces(doc.root)  # the default namespace has an empty prefix
1-element Array{Pair{String,String},1}:
"" => "http://www.foobar.org"
julia> ns = namespace(doc.root)  # get the namespace
"http://www.foobar.org"
julia> findall("/x:parent/x:child", doc.root, ["x"=>ns])  # specify its prefix as "x"
1-element Array{EzXML.Node,1}:
EzXML.Node(<ELEMENT_NODE[child]@0x00007fdc6774c990>)
So for yours:
julia> str = """
<math xmlns="http://www.w3.org/1998/Math/MathML">
<apply>
<times/>
<ci> compartment
</ci>
<ci> k1
</ci>
<ci> S1
</ci>
</apply>
</math>
"""
"<math xmlns=\"http://www.w3.org/1998/Math/MathML\">\n <apply>\n <times/>\n <ci> compartment\n </ci>\n <ci> k1\n </ci>\n <ci> S1\n </ci>\n </apply>\n</math>\n"
julia> xml = parsexml(str)
EzXML.Document(EzXML.Node(<DOCUMENT_NODE@0x0000558771efacd0>))
julia> ns = namespace(xml.root)
"http://www.w3.org/1998/Math/MathML"
julia> findall("//x:ci", xml.root, ["x"=>ns])
3-element Vector{EzXML.Node}:
EzXML.Node(<ELEMENT_NODE[ci]@0x00005587718755e0>)
EzXML.Node(<ELEMENT_NODE[ci]@0x000055877175d460>)
EzXML.Node(<ELEMENT_NODE[ci]@0x0000558772b529d0>)
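To get the actual names rather than the nodes, the matches can be passed through nodecontent. The following is a minimal sketch under the assumption that the surrounding whitespace inside each ci element is not significant; the variable names (names, by_local_name) are only illustrative. It also shows a namespace-agnostic alternative that uses the standard XPath local-name() function, which avoids registering a prefix but matches ci elements in any namespace.

```julia
using EzXML

doc = parsexml("""
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <apply>
    <times/>
    <ci> compartment
    </ci>
    <ci> k1
    </ci>
    <ci> S1
    </ci>
  </apply>
</math>
""")

# Bind the default namespace to an arbitrary prefix ("x") for the query.
ns = namespace(root(doc))
nodes = findall("//x:ci", root(doc), ["x" => ns])

# Extract the text of each match and trim the surrounding whitespace.
names = String.(strip.(nodecontent.(nodes)))

# Alternative: ignore namespaces entirely by comparing local names.
by_local_name = findall("//*[local-name()='ci']", root(doc))
```

The prefixed query is stricter, since it only matches ci elements in the MathML namespace; the local-name() form is convenient when the namespace URI varies between documents.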
Thanks!
@kescobo I have another case here that I think would be better not lost to Slackhole.
julia> str = """<cn type="e-notation" cellml:units="molar_per_minute">5 <sep/>-2</cn>"""
"<cn type=\"e-notation\" cellml:units=\"molar_per_minute\">5 <sep/>-2</cn>"
julia> parsexml(str)
EzXML.Document(EzXML.Node(<DOCUMENT_NODE@0x0000000002880610>))
julia> parsexml(str).root
ERROR: AssertionError: isempty(XML_GLOBAL_ERROR_STACK)
I'm wondering if I can ignore the undefined namespace (basically remove all cellml: attributes from a Document), or what the standard workaround here is.
I suppose I could do findall("//*[@cellml:*]", node) and then just delete! the attributes.
This is sort of an annoying hack, as other formats might have other namespaces.
If you have any thoughts, I'd really appreciate it!
Alas, I have no idea. The only reason I knew how to answer the previous question is because I'd run into the same issue before and someone else helped me. I'm no expert! Good luck :-/
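For the undeclared cellml: prefix, one possible workaround is to drop prefixed attributes from the text before it ever reaches the parser. The sketch below is an assumption-laden illustration, not an EzXML feature: strip_prefixed_attributes is a hypothetical helper, and its regular expression is a crude heuristic that only handles double-quoted attribute values and would also rewrite matching text inside comments or CDATA sections. It leaves xmlns: declarations and xml: attributes alone.

```julia
using EzXML

# Hypothetical helper: remove attributes of the form prefix:name="value",
# keeping xmlns:* declarations and xml:* attributes untouched.
function strip_prefixed_attributes(xml::AbstractString)
    pattern = r"\s+(?!xmlns:|xml:)[A-Za-z_][\w.-]*:[\w.-]+=\"[^\"]*\""
    return replace(xml, pattern => "")
end

str = """<cn type="e-notation" cellml:units="molar_per_minute">5 <sep/>-2</cn>"""

# The cleaned text no longer mentions the undeclared prefix.
cleaned = strip_prefixed_attributes(str)

# Parse the cleaned fragment as usual.
doc = parsexml(cleaned)
cn = root(doc)
```

Because this works on the raw string, it is independent of which prefixes a given format uses. If the units information matters, declaring the prefix on a wrapper element instead of deleting the attributes may be the better route.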