[bug] SAX `start_element` behavior changed in libxml v2.12.0
flavorjones opened this issue · comments
Please describe the bug
Originally reported at searls/eiwa#10
#! /usr/bin/env ruby

require "bundler/inline"

gemfile do
  source "https://rubygems.org"
  gem "nokogiri", "~>1.15.0"
end

class Document < Nokogiri::XML::SAX::Document
  def start_element(name, attrs)
    puts "#{__FILE__}:#{__LINE__}:#{__method__}: name=#{name.inspect}, attrs=#{attrs.inspect}"
  end
end

fixture = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <!DOCTYPE root [
  <!ATTLIST foo xml:lang CDATA "eng">
  ]>
  <root>
    <foo xml:lang="ger">Ja</foo>
  </root>
XML

parser = Nokogiri::XML::SAX::Parser.new(Document.new)
parser.parse(fixture)

# with nokogiri < 1.16.0:
# ./10-sax-issue.rb:12:start_element: name="root", attrs=[]
# ./10-sax-issue.rb:12:start_element: name="foo", attrs=[["xml:lang", "ger"]]
#
# with nokogiri >= 1.16.0:
# ./10-sax-issue.rb:12:start_element: name="root", attrs=[]
# ./10-sax-issue.rb:12:start_element: name="foo", attrs=[["xml:lang", "ger"], ["xml:lang", "eng"]]
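
For anyone stuck on an affected version, one possible stopgap is to drop the duplicate inside the handler. This is a minimal sketch, not a tested fix: `dedupe_attrs` is a hypothetical helper (not part of Nokogiri), and it assumes the explicitly specified value is always reported before the duplicated DTD default, as in the output above.

```ruby
# Hypothetical helper (not part of Nokogiri): keep only the first
# [name, value] pair for each attribute name.
# Assumption: the value written on the element is reported before the
# duplicated DTD default, as in the output shown in this issue.
def dedupe_attrs(attrs)
  attrs.uniq { |name, _value| name }
end

# Attributes as reported for <foo> on the affected versions:
affected = [["xml:lang", "ger"], ["xml:lang", "eng"]]
p dedupe_attrs(affected) # => [["xml:lang", "ger"]]
```

In a SAX handler this could be applied as the first statement of `start_element`, e.g. `attrs = dedupe_attrs(attrs)`.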
Just confirming that this seems to be an upstream issue. I can reproduce it using xmllint
and am going to git bisect.
The upstream commit is https://gitlab.gnome.org/GNOME/libxml2/-/commit/e0dd330b, which first appeared in libxml2 v2.12.0:
commit e0dd330b (HEAD)
Author: Nick Wellnhofer <wellnhofer@aevum.de>
Date:   2023-09-29 00:18:44 +0200

    parser: Use hash tables to avoid quadratic behavior

    Use a hash table to lookup namespaces by prefix. The hash table stores
    an index into the namespace table. Auxiliary data for namespaces is
    stored in a separate array along the main namespace table.

    Use a hash table to verify attribute uniqueness. The hash table stores
    an index into the attribute table.

    Reuse hash value from the dictionary to avoid computing them twice.

    See #346.
Linked issue is https://gitlab.gnome.org/GNOME/libxml2/-/issues/346
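
To make the expected behavior concrete, here is a rough Ruby sketch of what "use a hash table to verify attribute uniqueness" means when DTD defaults are involved. It is illustrative only and is not libxml2's implementation (which is C); `merge_attrs` and its arguments are hypothetical names. The point is that a default should be added only when the element did not specify that attribute itself.

```ruby
# Illustrative only: this is not libxml2's code.
# Merge explicitly specified attributes with DTD defaults, using a Hash
# as the uniqueness index so each name lookup is O(1) on average instead
# of a linear scan over the attributes collected so far.
def merge_attrs(specified, defaults)
  index = {}  # attribute name => position in `merged`
  merged = []

  specified.each do |name, value|
    # A repeated attribute on one element is a well-formedness error in XML.
    raise ArgumentError, "attribute #{name} redefined" if index.key?(name)
    index[name] = merged.length
    merged << [name, value]
  end

  # A DTD default applies only when the element did not specify the attribute.
  defaults.each do |name, value|
    next if index.key?(name)
    index[name] = merged.length
    merged << [name, value]
  end

  merged
end

p merge_attrs([["xml:lang", "ger"]], [["xml:lang", "eng"]])
# => [["xml:lang", "ger"]]
p merge_attrs([], [["xml:lang", "eng"]])
# => [["xml:lang", "eng"]]
```

The regression reported here behaves as if the `next if index.key?(name)` check did not match for `xml:lang`, so both the specified value and the default were passed to `start_element`.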
I've created an issue upstream: https://gitlab.gnome.org/GNOME/libxml2/-/issues/704
Fixed upstream in https://gitlab.gnome.org/GNOME/libxml2/-/commit/186562a182d2e27f90631d1a1f63ad5079fe62fb
Not sure whether Nick will make a release soon, but if not I can patch this fix into the vendored version in a bugfix release.
The fix was released upstream in v2.12.6. I'm working on a release that includes it (unrelated blockers exist, so it may take a day or two).
Release imminent, please follow #3151
v1.16.3 has been released which fixes this: https://github.com/sparklemotion/nokogiri/releases/tag/v1.16.3
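
For anyone using a system libxml2 rather than the vendored one, a quick way to tell whether a given libxml2 version falls in the affected range. This is a minimal sketch based only on the versions mentioned in this thread (regression first appeared in 2.12.0, fix released in 2.12.6); `affected_libxml2?` is a hypothetical helper, and it cannot account for distributions that backport the fix.

```ruby
require "rubygems" # for Gem::Version (already loaded in most Ruby setups)

# Version bounds taken from this thread.
FIRST_AFFECTED = Gem::Version.new("2.12.0")
FIRST_FIXED = Gem::Version.new("2.12.6")

# Hypothetical helper: true if the given libxml2 version string falls in
# the range 2.12.0 <= version < 2.12.6.
def affected_libxml2?(version)
  v = Gem::Version.new(version)
  v >= FIRST_AFFECTED && v < FIRST_FIXED
end

p affected_libxml2?("2.11.7") # => false
p affected_libxml2?("2.12.3") # => true
p affected_libxml2?("2.12.6") # => false
```

With Nokogiri loaded, the version string to pass in should be available as `Nokogiri::VERSION_INFO["libxml"]["loaded"]`.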