JohannesKaufmann / html-to-markdown

⚙️ Convert HTML to Markdown. Even works with entire websites and can be extended through rules.

🐛 Bug: Support `<tt>` for inline code alongside `<code>` tags

ljrk0 opened this issue

Describe the bug
Unfortunately, some sites don't use semantic markup (e.g.,
http://math.andrej.com/2007/09/28/seemingly-impossible-functional-programs/)
but instead specify the font directly using <tt>. Since Markdown draws no distinction between code and text that is simply formatted in "typewriter style", <tt> elements should be recognized as well (or, at least, via a plugin).

HTML Input

<tt>Some typewriter text</tt>

Generated Markdown

Some typewriter text

Expected Markdown

`Some typewriter text`
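For reference, producing the expected output means wrapping the element's text in a Markdown code span. The Go sketch below is illustrative only: `toCodeSpan` is a hypothetical helper, not part of html-to-markdown, and it assumes the text has already been extracted from the `<tt>` element. It follows the CommonMark rule that the backtick delimiter must not match any run of backticks inside the content, which matters when typewriter text itself contains backticks.

```go
package main

import (
	"fmt"
	"strings"
)

// toCodeSpan wraps text in a Markdown code span (hypothetical helper).
// Per CommonMark, the delimiter must be a backtick run whose length does
// not occur inside the content, so we use one more backtick than the
// longest run found in the text.
func toCodeSpan(text string) string {
	longest, current := 0, 0
	for _, r := range text {
		if r == '`' {
			current++
			if current > longest {
				longest = current
			}
		} else {
			current = 0
		}
	}
	fence := strings.Repeat("`", longest+1)

	// Pad with spaces when the content starts or ends with a backtick,
	// so the delimiter and the content do not merge into one run.
	if strings.HasPrefix(text, "`") || strings.HasSuffix(text, "`") {
		text = " " + text + " "
	}
	return fence + text + fence
}

func main() {
	fmt.Println(toCodeSpan("Some typewriter text")) // `Some typewriter text`
	fmt.Println(toCodeSpan("a `b` c"))               // ``a `b` c``
}
```

Using one more backtick than the longest run is simply the easiest length that always satisfies the rule; any run length absent from the content would work too.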

Additional context
N/A

Yeah, good catch. It's probably a good idea to add support for other HTML tags as well: in addition to <tt>, for example also <kbd>, <samp> and <var>.

Should be a pretty easy addition to the library... I will also add a test-case with that snippet for <tt> from the website that you shared.

If you find any other tags on other websites, let me know. It's always good to have a wide variety of snippets for the tests...

Yeah, it should mostly be about adding it to the Filter expression AFAICT.
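To illustrate the idea of a tag-name filter, here is a standalone Go sketch. It does not use html-to-markdown's real rule API: `codeTags` and `convertInline` are hypothetical names, tokenizing is done with the standard library's `encoding/xml` decoder in non-strict HTML mode, and it assumes small, reasonably well-formed inline fragments. Under those assumptions, supporting `<tt>`, `<kbd>`, `<samp>` and `<var>` amounts to adding their names to the filter set.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// codeTags is a hypothetical filter: tag names whose text should become
// an inline code span. It mimics a tag-name filter, not the library's API.
var codeTags = map[string]bool{
	"code": true, "tt": true, "kbd": true, "samp": true, "var": true,
}

// convertInline walks a small HTML fragment and wraps the text of any
// element listed in codeTags in single backticks. It does not escape
// backticks that appear inside the content.
func convertInline(fragment string) (string, error) {
	// Wrap in a root element so the fragment parses as one document.
	dec := xml.NewDecoder(strings.NewReader("<root>" + fragment + "</root>"))
	dec.Strict = false                 // tolerate HTML-style markup
	dec.AutoClose = xml.HTMLAutoClose  // e.g. <br> without a closing tag
	dec.Entity = xml.HTMLEntity        // e.g. &nbsp;

	var out strings.Builder
	depth := 0 // number of matching code tags we are currently inside
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			break
		}
		if err != nil {
			return "", err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			if codeTags[strings.ToLower(t.Name.Local)] {
				if depth == 0 {
					out.WriteString("`") // open the span at the outermost tag
				}
				depth++
			}
		case xml.EndElement:
			if codeTags[strings.ToLower(t.Name.Local)] && depth > 0 {
				depth--
				if depth == 0 {
					out.WriteString("`") // close the span at the outermost tag
				}
			}
		case xml.CharData:
			out.Write(t) // copy text through unchanged
		}
	}
	return out.String(), nil
}

func main() {
	md, err := convertInline("<tt>Some typewriter text</tt>")
	if err != nil {
		panic(err)
	}
	fmt.Println(md) // `Some typewriter text`
}
```

Because of the depth counter, nested matching tags such as `<code><var>x</var></code>` collapse into a single span instead of producing adjacent backticks.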

I'm currently using lapwat/papeer to load all longer articles I want to read onto my e-reader, so I'm sure to stumble upon other missing tags :-)

I've gone ahead and opened an issue to track MathJax support, although that one is a bit more complicated:
#50

This should now be fixed in the latest version.

Make sure to update the library by running:

go get -u github.com/JohannesKaufmann/html-to-markdown

If you encounter any other bugs, feel free to open a new issue...