frostming / marko

A markdown parser with high extensibility.

Home Page: https://marko-py.readthedocs.io/en/latest/

Issue with HTML-looking tags

md975 opened this issue

commented

Hi,

I'm trying to process text from StackExchange in Python. It mostly works, except that HTML-looking tags disappear when I run
html = marko.convert(s)
For example,
s = '<p><em>This is an <example></em>. <example2> <a href="[LINK]">test</a></p>'
is displayed as
This is an . test

How can I fix this so that it is displayed as:
This is an <example>. <example2> test

Thanks!

First, the whole text is wrapped in a block-level HTML element, <p>, so Marko treats it as an HTML block and keeps the inner text as-is. Second, you need to escape <example> with a backslash if you want it rendered as the literal text <example>. So, with the outer <p> removed, the following will work:

import marko

# Raw string, so the backslashes reach Marko unchanged.
s = r'<em>This is an \<example></em>. \<example2> <a href="[LINK]">test</a>'
marko.convert(s)

It outputs:

'<p><em>This is an &lt;example&gt;</em>. &lt;example2&gt; <a href="[LINK]">test</a></p>\n'

Which renders as:

This is an <example>. <example2> test

commented

Thanks so much for your response. I guess my next question is not necessarily related to the library, but I'm new to this so I would really appreciate your help!
How can I escape them automatically? The text I'm trying to process is very large.
Is there an argument or a library that would handle this for me before rendering?
Thanks!

How can I escape them automatically?

They are syntactically valid HTML tags, so they are kept in the rendered output unless they are escaped. You can, however, extend Marko with a custom renderer, like so:

import re

import marko


class MyRenderer(marko.HTMLRenderer):
    # Tags whose opening '<' should be escaped; add more names as needed.
    tagfilter = re.compile(
        r"<(title|textarea|style|xmp|iframe|noembed|noframes|script|plaintext)",
        flags=re.I,
    )

    def render_inline_html(self, element):
        # element.children is the raw inline HTML text;
        # escape the left bracket '<' of every matching tag.
        return self.tagfilter.sub(r"&lt;\1", element.children)


markdown = marko.Markdown(renderer=MyRenderer)
markdown.convert(s)
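
If you would rather fix the text before it reaches Marko, another option is to add the backslashes automatically. Below is a minimal sketch of that idea using only the standard library. The `KNOWN_TAGS` allowlist and the `escape_unknown_tags` helper are hypothetical names made up for this example and are not part of Marko. The sketch assumes the outer <p> wrapper has already been removed, and it does not handle Markdown code spans or code blocks, where an added backslash would show up literally.

```python
import re

# Hypothetical allowlist: inline tag names that should stay as real HTML.
# Extend it to match the tags that actually occur in your data.
KNOWN_TAGS = {"a", "b", "br", "code", "em", "i", "strong"}

# Start of an opening or closing tag, e.g. "<em", "</em" or "<example2",
# skipping any '<' that is already preceded by a backslash.
TAG_START = re.compile(r"(?<!\\)<(/?)([A-Za-z][A-Za-z0-9-]*)")


def escape_unknown_tags(text, known_tags=KNOWN_TAGS):
    """Put a backslash before '<' for every tag whose name is not allowed."""

    def repl(match):
        name = match.group(2).lower()
        if name in known_tags:
            return match.group(0)  # real HTML tag: leave untouched
        return "\\" + match.group(0)  # unknown tag: escape the '<'

    return TAG_START.sub(repl, text)


s = '<em>This is an <example></em>. <example2> <a href="[LINK]">test</a>'
escaped = escape_unknown_tags(s)
print(escaped)
```

The result can then be passed to marko.convert(escaped) in the same way as the hand-escaped string earlier in this thread. An allowlist is used here instead of a blocklist because the unknown tag names in scraped text usually cannot be listed in advance.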