HTML>LaTeX drops the word after a tilde (~), but HTML>Markdown>LaTeX does not
florianm opened this issue
Tested only on Ubuntu 12.04 with pandoc 1.9.1.1 (compiled with citeproc-hs 0.3.4, texmath 0.6.0.3, highlighting-kate 0.5.0.5)
TL;DR: Converting HTML directly to LaTeX drops any alphanumeric text that immediately follows a tilde (with no whitespace in between) and collapses runs of consecutive tildes into a single tilde.
Converting the same HTML document first to Markdown and then to LaTeX preserves both the words and the tildes.
Example input, test.html:

```html
<html><body>
<h1>First chapter</h1>
<p>The word after a tilde ~ will be missing. Example: ~can't ~touch ~this.</p>
<p>One little tilde sat on a wall. ~ Two little tildes had a bad fall. ~~ Three little tildes just wanted a hug. ~~~ Four little tildes show it's a bug. ~~~~</p>
</body></html>
```
Converting to Markdown with

```sh
$ pandoc test.html -o fromhtml.md
```

creates fromhtml.md:
```markdown
First chapter
=============
The word after a tilde \~ will be missing. Example: \~can't \~touch
\~this.
One little tilde sat on a wall. \~ Two little tildes had a bad fall.
\~\~ Three little tildes just wanted a hug. \~\~\~ Four little tildes
show it's a bug. \~\~\~\~
```
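In fromhtml.md every literal tilde has been written as `\~`. In pandoc's Markdown dialect `~` also marks subscripts (`H~2~O`) and strikeout (`~~text~~`), so the backslash is what keeps these tildes literal when the file is read back. The sketch below reproduces only that one escaping rule in Python; the helper names `escape_tildes` and `unescape_tildes` are hypothetical, and this is not pandoc's actual writer, which escapes other characters as well.

```python
def escape_tildes(text: str) -> str:
    """Write every literal '~' as '\\~', the form seen in fromhtml.md.

    Hypothetical helper: it handles only tildes, not the other
    characters pandoc's Markdown writer escapes.
    """
    return text.replace("~", "\\~")


def unescape_tildes(text: str) -> str:
    """Turn each '\\~' back into a literal '~', as a Markdown reader would.

    Assumes the input contains no other backslash escapes.
    """
    return text.replace("\\~", "~")


if __name__ == "__main__":
    line = "Example: ~can't ~touch ~this. ~~ Three little tildes"
    escaped = escape_tildes(line)
    print(escaped)  # Example: \~can't \~touch \~this. \~\~ Three little tildes
    # The round trip restores the original text, tildes and words intact.
    assert unescape_tildes(escaped) == line
```

Because the escaped tildes can no longer open subscript or strikeout markup, the words glued to them survive the second conversion step.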
Converting that to LaTeX with

```sh
$ pandoc fromhtml.md -o frommd.tex
```

creates frommd.tex:
```latex
\section{First chapter}
The word after a tilde \ensuremath{\sim} will be missing. Example:
\ensuremath{\sim}can't \ensuremath{\sim}touch \ensuremath{\sim}this.
One little tilde sat on a wall. \ensuremath{\sim} Two little tildes had
a bad fall. \ensuremath{\sim}\ensuremath{\sim} Three little tildes just
wanted a hug. \ensuremath{\sim}\ensuremath{\sim}\ensuremath{\sim} Four
little tildes show it's a bug.
\ensuremath{\sim}\ensuremath{\sim}\ensuremath{\sim}\ensuremath{\sim}
```
Note that the words following the tildes, as well as the runs of consecutive tildes, are preserved.
Converting the original HTML directly into LaTeX, however, drops the alphanumeric text that directly follows each tilde and collapses consecutive tildes into one:

```sh
$ pandoc test.html -o fromhtml.tex
```

The resulting LaTeX file, fromhtml.tex:
```latex
\section{First chapter}
The word after a tilde \ensuremath{\sim} will be missing. Example:
\ensuremath{\sim}'t \ensuremath{\sim} \ensuremath{\sim}.
One little tilde sat on a wall. \ensuremath{\sim} Two little tildes had
a bad fall. \ensuremath{\sim} Three little tildes just wanted a hug.
\ensuremath{\sim} Four little tildes show it's a bug. \ensuremath{\sim}
```
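To compare the two LaTeX outputs mechanically, the sketch below maps each `\ensuremath{\sim}` back to `~` and then compares, position by position, every run of tildes in the HTML text together with the text glued directly after it. It is a hypothetical helper (`tilde_pairs`, `latex_to_plain` and `missing_after_tildes` are illustrative names, not part of pandoc); it assumes the tilde macro is spelled exactly as in the outputs above and ignores all other LaTeX markup.

```python
import re
from itertools import zip_longest
from typing import List, Optional, Tuple

# Spelling of a literal tilde in the pandoc 1.9.1.1 LaTeX output shown above.
TILDE_MACRO = r"\ensuremath{\sim}"

Pair = Tuple[str, str]  # (run of tildes, non-space text glued directly after it)


def tilde_pairs(text: str) -> List[Pair]:
    """Return (tilde run, following non-space text) for every run of tildes.

    The second element is '' when the run is followed by whitespace.
    """
    return re.findall(r"(~+)(\S*)", text)


def latex_to_plain(latex: str) -> str:
    """Map each tilde macro back to '~' and collapse line breaks and spaces."""
    return " ".join(latex.replace(TILDE_MACRO, "~").split())


def missing_after_tildes(
    source: str, latex: str
) -> List[Tuple[Optional[Pair], Optional[Pair]]]:
    """Compare tilde runs position by position; return (expected, actual) mismatches."""
    expected = tilde_pairs(source)
    actual = tilde_pairs(latex_to_plain(latex))
    return [(e, a) for e, a in zip_longest(expected, actual) if e != a]


if __name__ == "__main__":
    html_text = "Example: ~can't ~touch ~this."
    via_markdown = (
        "Example:\n" + TILDE_MACRO + "can't " + TILDE_MACRO + "touch "
        + TILDE_MACRO + "this."
    )
    direct = "Example:\n" + TILDE_MACRO + "'t " + TILDE_MACRO + " " + TILDE_MACRO + "."
    print(missing_after_tildes(html_text, via_markdown))  # []
    for expected, actual in missing_after_tildes(html_text, direct):
        print("expected", expected, "got", actual)
```

On the first paragraph, frommd.tex yields no mismatches, while fromhtml.tex reports all three words; on the second paragraph the same check flags every collapsed run, such as `~~` becoming a single `~`.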
Why does pandoc produce two different results here? My understanding is that it converts any input format into its own Markdown dialect first, and only then into the requested output format.
I am aware that tildes have a special meaning in Markdown, but since in my example they come from HTML, it looks as if they are not being escaped properly.
This was a bug in pandoc 1.9.1.1. But we are now on 1.11.1, which works fine on your input!
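To tell whether a given installation predates the reported fix, one option is to compare the output of `pandoc --version` against 1.11.1, the release the reply says handles this input. This is a hedged sketch: the helper names are hypothetical, and 1.11.1 is only the version reported to work here, not necessarily the first release that fixed the bug.

```python
import re
import subprocess
from typing import Tuple

# Release the maintainer reports as handling this input; not necessarily the first fixed one.
WORKING_VERSION: Tuple[int, ...] = (1, 11, 1)


def parse_pandoc_version(version_output: str) -> Tuple[int, ...]:
    """Parse the first line of `pandoc --version`, e.g. 'pandoc 1.9.1.1' -> (1, 9, 1, 1)."""
    first_line = version_output.splitlines()[0]
    match = re.match(r"pandoc\S*\s+(\d+(?:\.\d+)*)", first_line)
    if match is None:
        raise ValueError(f"unrecognised version line: {first_line!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def at_least_working_version(version: Tuple[int, ...]) -> bool:
    """Tuple comparison orders versions numerically: (1, 9, 1, 1) < (1, 11, 1)."""
    return version >= WORKING_VERSION


def installed_pandoc_version() -> Tuple[int, ...]:
    """Run the real `pandoc --version` command; requires pandoc on PATH."""
    result = subprocess.run(
        ["pandoc", "--version"], capture_output=True, text=True, check=True
    )
    return parse_pandoc_version(result.stdout)
```

Comparing integer tuples rather than version strings avoids the trap where "1.9" sorts after "1.11" lexically; `at_least_working_version(installed_pandoc_version())` would return False for a 1.9.1.1 build like the one described in this issue.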
Thanks for the fast answer, John!