kivikakk / comrak

CommonMark + GFM compatible Markdown parser and renderer

Home Page: https://hrzn.ee/kivikakk/comrak


Parity: Footnote labels are not encoded

digitalmoksha opened this issue

When a footnote name includes non-ASCII characters, such as an emoji, comrak does not percent-encode them in the generated ids and hrefs. cmark-gfm does encode them.

cmark_gfm (cmark-gfm 0.29.0.gfm.10)

second[^😄second]
[^😄second]: two

gives

<p>second<sup class="footnote-ref"><a href="#fn-%F0%9F%98%84second" id="fnref-%F0%9F%98%84second" data-footnote-ref>1</a></sup></p>
<section class="footnotes" data-footnotes>
<ol>
<li id="fn-%F0%9F%98%84second">
<p>two <a href="#fnref-%F0%9F%98%84second" class="footnote-backref" data-footnote-backref data-footnote-backref-idx="1" aria-label="Back to reference 1">↩</a></p>
</li>
</ol>
</section>

comrak

gives

<p>second<sup class="footnote-ref"><a href="#fn-😄second" id="fnref-😄second" data-footnote-ref>1</a></sup></p>
<section class="footnotes" data-footnotes>
<ol>
<li id="fn-😄second">
<p>two <a href="#fnref-😄second" class="footnote-backref" data-footnote-backref data-footnote-backref-idx="1" aria-label="Back to reference 1">↩</a></p>
</li>
</ol>
</section>

I'm looking at using the percent_encoding crate to encode the UTF-8 characters.
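For reference, here is a minimal std-only sketch of the encoding the cmark-gfm output above implies: each byte of the UTF-8 name that is not a plain ASCII letter, digit, or one of `-`, `_`, `.`, `~` becomes `%XX`. The function name `encode_footnote_name` is hypothetical, and the set of bytes left unescaped is an assumption (the RFC 3986 unreserved set); cmark-gfm's actual escape table may differ for other punctuation.

```rust
/// Hypothetical helper, not part of comrak: percent-encode a footnote name
/// byte by byte. Assumption: only RFC 3986 "unreserved" ASCII bytes are
/// passed through unchanged; cmark-gfm may leave more characters unescaped.
fn encode_footnote_name(name: &str) -> String {
    let mut out = String::with_capacity(name.len());
    for &byte in name.as_bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'_' | b'.' | b'~' => {
                // Unreserved ASCII byte: copy as-is.
                out.push(byte as char);
            }
            _ => {
                // Everything else, including each byte of a multi-byte
                // UTF-8 sequence, becomes an uppercase %XX escape.
                out.push_str(&format!("%{:02X}", byte));
            }
        }
    }
    out
}
```

With this sketch, `encode_footnote_name("😄second")` yields `%F0%9F%98%84second`, which matches the `fn-%F0%9F%98%84second` id in the cmark-gfm output, and `Test` passes through unchanged.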

Related: uppercase is stripped away in footnote names.

third[^Test]
[^Test]: three

in cmark-gfm gives

<p>third<sup class="footnote-ref"><a href="#fn-Test" id="fnref-Test" data-footnote-ref>1</a></sup></p>
<section class="footnotes" data-footnotes>
<ol>
<li id="fn-Test">
<p>three <a href="#fnref-Test" class="footnote-backref" data-footnote-backref data-footnote-backref-idx="1" aria-label="Back to reference 1">↩</a></p>
</li>
</ol>
</section>

and comrak gives

<p>third[^Test]</p>

We need a different normalization routine for footnote names

EDIT: it looks like cmark-gfm is case-insensitive but case-preserving. [^Ab] and [^aB] end up referencing the same footnote, but the casing of the name is preserved in the ids.
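One way to get "case-insensitive but case-preserving" behaviour is to key the lookup on a folded copy of the name while storing the original label for output. The sketch below is an illustration only: `FootnoteTable`, `define`, `resolve`, and `fold` are hypothetical names, `to_lowercase` stands in for proper Unicode case folding, and keeping the definition's casing (rather than the reference's) is an assumption, not something confirmed from cmark-gfm.

```rust
use std::collections::HashMap;

/// Hypothetical table: folded name -> label as originally written.
struct FootnoteTable {
    by_key: HashMap<String, String>,
}

/// Assumption: lowercasing is a good enough stand-in for case folding here.
fn fold(label: &str) -> String {
    label.to_lowercase()
}

impl FootnoteTable {
    fn new() -> Self {
        FootnoteTable { by_key: HashMap::new() }
    }

    /// Record a definition such as `[^Ab]: text`. The first definition for
    /// a given folded key wins, and its casing is what gets stored.
    fn define(&mut self, label: &str) {
        self.by_key
            .entry(fold(label))
            .or_insert_with(|| label.to_string());
    }

    /// Look up a reference such as `[^aB]`, ignoring case, and return the
    /// stored label so the id can keep its original casing.
    fn resolve(&self, reference: &str) -> Option<&str> {
        self.by_key.get(&fold(reference)).map(String::as_str)
    }
}
```

After `define("Ab")`, a call to `resolve("aB")` returns `Some("Ab")`, so both spellings point at the same footnote while the stored name keeps its casing.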

Hmm, maybe it's escape_href in html.rs that I need...

PR #308 addresses this