Error when parsing tag attributes starting with @
jvkerckh opened this issue
Greetings,
for my current project I need to be able to parse attributes that can start with @. However, parsehtml emits a warning and doesn't parse the attribute at all, while parsexml outright throws an error.
Example:
julia> htmlsnip = "<p @foo=\"bar\">content</p>"
"<p @foo=\"bar\">content</p>"
Using parsehtml:
julia> htmlsnip |> parsehtml
┌ Warning: XMLError: error parsing attribute name from HTML parser (code: 68, line: 1)
└ @ EzXML ~/.julia/packages/EzXML/ZNwhK/src/error.jl:95
EzXML.Document(EzXML.Node(<HTML_DOCUMENT_NODE@0x0000000005cb8ad0>))
Printing the result shows the attribute is not parsed:
julia> htmlsnip |> parsehtml |> prettyprint
┌ Warning: XMLError: error parsing attribute name from HTML parser (code: 68, line: 1)
└ @ EzXML ~/.julia/packages/EzXML/ZNwhK/src/error.jl:95
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<body>
<p>content</p>
</body>
</html>
Using parsexml:
julia> htmlsnip |> parsexml
┌ Warning: caught 4 errors; showing the first one
└ @ EzXML ~/.julia/packages/EzXML/ZNwhK/src/error.jl:79
ERROR: XMLError: error parsing attribute name from XML parser (code: 68, line: 1)
Stacktrace:
[1] throw_xml_error()
@ EzXML ~/.julia/packages/EzXML/ZNwhK/src/error.jl:87
[2] macro expansion
@ ~/.julia/packages/EzXML/ZNwhK/src/error.jl:52 [inlined]
[3] parsexml(xmlstring::String)
@ EzXML ~/.julia/packages/EzXML/ZNwhK/src/document.jl:80
[4] |>(x::String, f::typeof(parsexml))
@ Base ./operators.jl:911
[5] top-level scope
@ REPL[77]:1
I'm using Julia v1.8.0 and EzXML v1.1.0, with no other packages in the environment.
I had the same problem today. Since I always traverse the whole document anyway, I could mask the '@' character before parsing and restore it afterwards. Depending on your goal, you could do something similar.
using EzXML, Logging

function parse_vue_html(html)
    # mask '@' so that the attribute names can be parsed
    doc_string = replace(html, "@" => "__vue-on__")
    empty!(EzXML.XML_GLOBAL_ERROR_STACK)
    # only log messages at Error level or above while parsing
    doc = Logging.with_logger(Logging.SimpleLogger(stdout, Logging.Error)) do
        EzXML.parsehtml(doc_string).root
    end
    # remove the html -> body levels
    replace(parse_elem(first(eachelement(first(eachelement(doc))))), "__vue-on__" => "@")
end
Note that the parser parse_elem() replaces the instances of __vue-on__ that occur as attribute names.
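The masking step on its own can be sketched without EzXML. This is only a minimal illustration of the round trip, not the code used above: the names mask_at and unmask_at and the placeholder __at__ are hypothetical, and the sketch assumes the placeholder never occurs in the input.

```julia
# Hypothetical placeholder; any string that is valid at the start of an
# attribute name and does not occur in the input would do.
const AT_PLACEHOLDER = "__at__"

# Replace every '@' so the parser only sees ordinary name characters.
function mask_at(html::AbstractString)
    # Refuse input that already contains the placeholder, since unmasking
    # would otherwise turn it into '@' by mistake.
    occursin(AT_PLACEHOLDER, html) &&
        throw(ArgumentError("input already contains $AT_PLACEHOLDER"))
    return replace(html, "@" => AT_PLACEHOLDER)
end

# Undo the masking on whatever string is produced after parsing.
unmask_at(s::AbstractString) = replace(s, AT_PLACEHOLDER => "@")

htmlsnip = "<p @foo=\"bar\">content</p>"
masked = mask_at(htmlsnip)
println(masked)             # the string that would be handed to the parser
println(unmask_at(masked))  # the original snippet again
```

Because every '@' is masked, not just those in attribute names, text content such as e-mail addresses is rewritten too and only restored by the final unmask_at call on the serialized output.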