sparklemotion / nokogiri

Nokogiri (鋸) makes it easy and painless to work with XML and HTML from Ruby.

Home Page: https://nokogiri.org/

`clone_node()` doesn't duplicate nonstandard tag names

stevecheckoway opened this issue

While investigating #3098, I noticed that Gumbo's `clone_node()` function doesn't make a copy of nonstandard tag names.

I think this is the fix

diff --git a/gumbo-parser/src/parser.c b/gumbo-parser/src/parser.c
index 67812b23..c3e5e038 100644
--- a/gumbo-parser/src/parser.c
+++ b/gumbo-parser/src/parser.c
@@ -1377,6 +1377,9 @@ static GumboNode* clone_node (
   *new_node = *node;
   new_node->parent = NULL;
   new_node->index_within_parent = -1;
+
+  if (node->v.element.tag == GUMBO_TAG_UNKNOWN)
+    new_node->v.element.name = gumbo_strdup(node->v.element.name);
   // Clear the GUMBO_INSERTION_IMPLICIT_END_TAG flag, as the cloned node may
   // have a separate end tag.
   new_node->parse_flags &= ~GUMBO_INSERTION_IMPLICIT_END_TAG;

but I'd like to understand why this hasn't been causing a bunch of memory leaks first.
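
To make the concern concrete, here is a minimal sketch of the aliasing that a plain struct assignment produces. It does not use the real Gumbo types: `ToyNode`, `toy_strdup`, and the other `toy_*` helpers are hypothetical stand-ins, and the sketch assumes each node owns its name buffer and frees it on destruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical stand-in for an element node; not the real GumboNode.
typedef struct {
  bool is_unknown_tag;  // plays the role of tag == GUMBO_TAG_UNKNOWN
  char *name;           // heap-allocated tag name, owned by the node (assumption)
} ToyNode;

// Portable string duplication; assumed to behave like gumbo_strdup.
static char *toy_strdup(const char *s) {
  size_t n = strlen(s) + 1;
  char *copy = malloc(n);
  assert(copy != NULL);
  memcpy(copy, s, n);
  return copy;
}

// Create a node for a nonstandard tag with its own copy of the name.
static ToyNode *toy_new(const char *name) {
  ToyNode *node = malloc(sizeof *node);
  assert(node != NULL);
  node->is_unknown_tag = true;
  node->name = toy_strdup(name);
  return node;
}

// Shallow clone: struct assignment copies the pointer, so the original
// and the clone end up sharing a single name buffer.
static ToyNode *toy_clone_shallow(const ToyNode *node) {
  ToyNode *new_node = malloc(sizeof *new_node);
  assert(new_node != NULL);
  *new_node = *node;
  return new_node;
}

// Deep clone: same as above, then give the clone its own name buffer,
// mirroring the shape of the proposed fix.
static ToyNode *toy_clone_deep(const ToyNode *node) {
  ToyNode *new_node = toy_clone_shallow(node);
  if (node->is_unknown_tag)
    new_node->name = toy_strdup(node->name);
  return new_node;
}

// Destroy a node, releasing the name it owns.
static void toy_destroy(ToyNode *node) {
  if (node->is_unknown_tag)
    free(node->name);
  free(node);
}
```

With the shallow clone, calling `toy_destroy` on both the original and the clone would free the same buffer twice; with the deep clone, each node can be destroyed independently.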

@stevecheckoway It looks like clone_node isn't being called for an unknown tag in the test suite.

Here's the patch I used:

diff --git a/gumbo-parser/src/parser.c b/gumbo-parser/src/parser.c
index 06f096f8..180ee746 100644
--- a/gumbo-parser/src/parser.c
+++ b/gumbo-parser/src/parser.c
@@ -20,6 +20,7 @@
 #include <stdint.h>
 #include <stdlib.h>
 #include <string.h>
+#include <stdio.h>
 
 #include "ascii.h"
 #include "attribute.h"
@@ -1396,6 +1397,11 @@ static GumboNode* clone_node (
   *new_node = *node;
   new_node->parent = NULL;
   new_node->index_within_parent = -1;
+
+  if (node->v.element.tag == GUMBO_TAG_UNKNOWN) {
+    fprintf(stderr, "MIKE: unknown tag %s\n", node->v.element.name);
+  }
+
   // Clear the GUMBO_INSERTION_IMPLICIT_END_TAG flag, as the cloned node may
   // have a separate end tag.
   new_node->parse_flags &= ~GUMBO_INSERTION_IMPLICIT_END_TAG;

and it never prints anything!