mozilla / readability

A standalone version of the readability lib

Problem handling invalid HTML attributes

ZachSaucier opened this issue · comments

I recently started using Readability.js for a reader view that I work on. One user reported an error with this website (note: it's paywalled so only a limited number of people can view it). The error shown in the console is:

Error handling response: Error: Failed to execute 'setAttribute' on 'Element': '@click' is not a valid attribute name.
    at Readability._simplifyNestedElements

This points to this part of Readability.js:

} else if (
  this._hasSingleTagInsideElement(node, "DIV") ||
  this._hasSingleTagInsideElement(node, "SECTION")
) {
  var child = node.children[0];
  for (var i = 0; i < node.attributes.length; i++) {
    child.setAttribute(
      node.attributes[i].name,
      node.attributes[i].value
    );
  }

I think the @click attribute comes from Vue.js, where it is shorthand for v-on:click.

Regardless, can Readability.js be updated to handle invalid HTML attributes (probably by just ignoring them) instead of throwing? I can see two ways forward: validating attribute names before copying them, or wrapping the call in try/catch and skipping the attributes that throw.
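
For illustration, the try/catch option could look roughly like the sketch below. copyAttributesSafely is a hypothetical helper, not something that exists in Readability.js. It assumes only that the two arguments expose attributes and setAttribute the way DOM elements do, and that setAttribute throws on names it rejects.

```javascript
// Hypothetical helper (not part of Readability.js): copy every attribute
// from `from` onto `to`, skipping any name that setAttribute rejects.
// Returns the list of skipped names so a caller could log them.
function copyAttributesSafely(from, to) {
  var skipped = [];
  for (var i = 0; i < from.attributes.length; i++) {
    var attr = from.attributes[i];
    try {
      to.setAttribute(attr.name, attr.value);
    } catch (e) {
      // Browsers throw here for names such as "@click".
      skipped.push(attr.name);
    }
  }
  return skipped;
}
```

The trade-off is that the skipped attributes are silently lost from the collapsed element, which is probably acceptable for a reader view but does change the document.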

commented

The annoying thing is that we're just trying to collapse two element nodes together, so those "invalid" attributes are already present in the document. But it turns out that what the HTML parser allows as attribute names (almost anything) is quite different from what setAttribute allows.

I asked around a bit and it seems the solution might have to be using setAttributeNode to move the actual existing attribute nodes across. That would require implementing that API in JSDOMParser and verifying that it works reasonably well in jsdom.
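
A minimal sketch of that idea, assuming an environment that implements the standard DOM methods removeAttributeNode and setAttributeNode (per the comment above, JSDOMParser did not at the time). moveAttributeNodes is a hypothetical name used only for illustration. The attribute is detached from its old owner first because, in the DOM, setAttributeNode rejects an attribute node that still belongs to another element.

```javascript
// Hypothetical helper: move the existing attribute nodes from `from`
// to `to` instead of re-creating them by name with setAttribute.
function moveAttributeNodes(from, to) {
  // `attributes` shrinks as nodes are removed, so keep taking the
  // first entry until none are left.
  while (from.attributes.length > 0) {
    var attr = from.removeAttributeNode(from.attributes[0]);
    // Replaces any attribute of the same name already on `to`.
    to.setAttributeNode(attr);
  }
}
```

Because the attribute nodes are reused rather than rebuilt from their names, the name check that setAttribute performs should not come into play, though that would still need to be verified in jsdom as the comment notes.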