Parity: Footnote labels are not encoded
digitalmoksha opened this issue
When a footnote name includes characters such as an emoji, comrak does not percent-encode them in the generated `href` and `id` attributes. cmark-gfm does encode them.
With cmark-gfm (0.29.0.gfm.10), the input

```markdown
second[^😄second]

[^😄second]: two
```

gives

```html
<p>second<sup class="footnote-ref"><a href="#fn-%F0%9F%98%84second" id="fnref-%F0%9F%98%84second" data-footnote-ref>1</a></sup></p>
<section class="footnotes" data-footnotes>
<ol>
<li id="fn-%F0%9F%98%84second">
<p>two <a href="#fnref-%F0%9F%98%84second" class="footnote-backref" data-footnote-backref data-footnote-backref-idx="1" aria-label="Back to reference 1">↩</a></p>
</li>
</ol>
</section>
```
With comrak, the same input gives

```html
<p>second<sup class="footnote-ref"><a href="#fn-😄second" id="fnref-😄second" data-footnote-ref>1</a></sup></p>
<section class="footnotes" data-footnotes>
<ol>
<li id="fn-😄second">
<p>two <a href="#fnref-😄second" class="footnote-backref" data-footnote-backref data-footnote-backref-idx="1" aria-label="Back to reference 1">↩</a></p>
</li>
</ol>
</section>
```
I'm looking at using the `percent_encoding` crate to encode UTF-8 characters.
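A minimal sketch of the encoding step, using only the Rust standard library rather than the `percent_encoding` crate. The function name `encode_footnote_name` is hypothetical, and the set of characters left unescaped (ASCII alphanumerics plus `-`, `_`, `.`, `~`) is an assumption for illustration; cmark-gfm's own safe set is larger, so this is not a byte-for-byte match. Every other byte, including each byte of a multi-byte UTF-8 sequence, becomes `%XX` with uppercase hex digits, which is how 😄 turns into `%F0%9F%98%84` in the cmark-gfm output above.

```rust
/// Hypothetical helper: percent-encode a footnote name for use in `href`/`id`.
/// Assumption: only ASCII alphanumerics and `-_.~` pass through unchanged;
/// this is narrower than cmark-gfm's real safe set.
fn encode_footnote_name(name: &str) -> String {
    const SAFE_PUNCT: &[u8] = b"-_.~";
    let mut out = String::with_capacity(name.len());
    // Work on raw UTF-8 bytes so a 4-byte emoji yields four `%XX` escapes.
    for &b in name.as_bytes() {
        if b.is_ascii_alphanumeric() || SAFE_PUNCT.contains(&b) {
            out.push(b as char);
        } else {
            // Uppercase hex, matching the `%F0%9F%98%84` form cmark-gfm emits.
            out.push_str(&format!("%{:02X}", b));
        }
    }
    out
}

fn main() {
    let encoded = encode_footnote_name("😄second");
    // Build the same anchors cmark-gfm produces for the reference and backref.
    println!("#fn-{}", encoded);
    println!("#fnref-{}", encoded);
}
```

With this sketch, `second[^😄second]` would link to `#fn-%F0%9F%98%84second`, and plain ASCII names such as `Test` pass through unchanged.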
Relatedly, uppercase letters in footnote names are not handled correctly. The input

```markdown
third[^Test]

[^Test]: three
```

in cmark-gfm gives

```html
<p>third<sup class="footnote-ref"><a href="#fn-Test" id="fnref-Test" data-footnote-ref>1</a></sup></p>
<section class="footnotes" data-footnotes>
<ol>
<li id="fn-Test">
<p>three <a href="#fnref-Test" class="footnote-backref" data-footnote-backref data-footnote-backref-idx="1" aria-label="Back to reference 1">↩</a></p>
</li>
</ol>
</section>
```
whereas comrak gives

```html
<p>third[^Test]</p>
```

so the footnote is not recognized at all. We need a different normalization routine for footnote names.
EDIT: it looks like cmark-gfm is case-insensitive but case-preserving: `[^Ab]` and `[^aB]` end up referencing the same footnote, but the casing of the name is preserved in the ids.
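To illustrate the case-insensitive but case-preserving behavior, here is a hypothetical lookup table; `Footnotes`, `define`, and `resolve` are illustrative names, not comrak's API. Definitions are keyed by a normalized name, while the name as written is stored for building ids. Two assumptions are baked in: `str::to_lowercase` stands in for proper Unicode case folding, and the spelling of the first definition is the one that gets preserved.

```rust
use std::collections::HashMap;

/// Hypothetical footnote table: lookups ignore case, ids keep the original casing.
struct Footnotes {
    // normalized key -> name as written in the definition
    defs: HashMap<String, String>,
}

impl Footnotes {
    fn new() -> Self {
        Footnotes { defs: HashMap::new() }
    }

    /// Assumption: lowercasing approximates full Unicode case folding.
    fn normalize(name: &str) -> String {
        name.to_lowercase()
    }

    /// Record a definition; assumption: the first spelling seen wins.
    fn define(&mut self, name: &str) {
        self.defs
            .entry(Self::normalize(name))
            .or_insert_with(|| name.to_string());
    }

    /// Return the case-preserved name to use in `id`/`href`, if one is defined.
    fn resolve(&self, reference: &str) -> Option<&str> {
        self.defs.get(&Self::normalize(reference)).map(String::as_str)
    }
}

fn main() {
    let mut notes = Footnotes::new();
    notes.define("Ab");
    // `[^Ab]` and `[^aB]` both resolve to the same definition...
    let id = notes.resolve("aB").expect("footnote should be defined");
    // ...and the generated id keeps the definition's casing.
    println!("fn-{}", id);
}
```

Separating the lookup key from the stored display name is what lets both behaviors coexist; the id could then be passed through an encoder for non-ASCII characters.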
Hmm, maybe it's `escape_href` in `html.rs` that I need...
PR #308 addresses this