Issue with HTML-looking tags

Question

md975 opened this issue 2 years ago

Hi,

I'm trying to process text from StackExchange in Python. It mostly works, except that HTML-looking tags disappear when I run
html = marko.convert(s)
e.g.
s = 'This is an <example>. <example2> <a href="[LINK]">test</a>'
becomes
This is an . test

How can I fix this so that I get:
This is an <example>. <example2> test

Thanks!

Frost Ming · Answer 1 · Tue Feb 08 2022 17:00:50 GMT+0800 (China Standard Time)

First, the whole text is wrapped in a block HTML element, so the inner text is kept as-is. Second, you need to escape <example> if you want it rendered as the literal text <example>, so the following will work:

import marko

# Raw string, so the backslashes reach Marko unchanged.
s = r'<em>This is an \<example></em>. \<example2> <a href="[LINK]">test</a>'
marko.convert(s)

It outputs:

'<p><em>This is an &lt;example&gt;</em>. &lt;example2&gt; <a href="[LINK]">test</a></p>\n'

Which renders as:

This is an <example>. <example2> test
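
As a small illustration of why the escaped output displays as literal tags: `&lt;` and `&gt;` are HTML character references, which a browser decodes back to `<` and `>` when it shows the text. The sketch below uses only Python's standard `html` module (not Marko) to show that round trip; the variable names are just for illustration.

```python
import html

literal = "<example>"
escaped = html.escape(literal)    # the form that ends up in the HTML source
decoded = html.unescape(escaped)  # the form a browser displays to the reader

print(escaped)
print(decoded)
```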

md975 · Answer 2 · Wed Feb 09 2022 00:11:23 GMT+0800 (China Standard Time)

Thanks so much for your response. I guess my next question is not necessarily related to the library, but I'm new to this so I would really appreciate your help!
How can I escape them automatically? I have a very large text that I'm trying to process.
Is there an argument or a library that would handle this for me before rendering?
Thanks!

Frost Ming · Answer 3 · Wed Feb 09 2022 08:38:08 GMT+0800 (China Standard Time)

How can I escape them automatically?

They are valid HTML tags and will be kept in the rendered output unless they are escaped. You can, however, extend Marko with a custom renderer, like so:

import re

import marko


class MyRenderer(marko.HTMLRenderer):
    # Tag names whose opening '<' should be escaped; add more as needed.
    tagfilter = re.compile(
        r"<(title|textarea|style|xmp|iframe|noembed|noframes|script|plaintext)",
        flags=re.I,
    )

    def render_inline_html(self, element):
        # Replace the left bracket '<' of matching tags with '&lt;'.
        return self.tagfilter.sub(r"&lt;\1", element.children)


markdown = marko.Markdown(renderer=MyRenderer)
markdown.convert(s)
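
An alternative to a custom renderer is to add the backslashes before rendering, which is closer to what the question asked for. The following is only a minimal sketch using the standard `re` module: the function name `escape_unknown_tags` and the `ALLOWED_TAGS` allowlist are hypothetical, not part of Marko, and the allowlist would need adjusting to the tags you actually want kept as HTML.

```python
import re

# Hypothetical allowlist: tag names that should stay real HTML.
ALLOWED_TAGS = {"a", "em", "strong", "code", "p"}

# A tag-like token: '<', optional '/', a name, anything but brackets, '>'.
# The lookbehind skips tokens that already have a backslash in front.
TAG_LIKE = re.compile(r"(?<!\\)<(/?)([A-Za-z][A-Za-z0-9-]*)([^<>]*)>")


def escape_unknown_tags(text, allowed=ALLOWED_TAGS):
    """Put a backslash before tag-like tokens whose name is not allowed."""

    def fix(match):
        name = match.group(2).lower()
        if name in allowed:
            return match.group(0)      # keep real HTML untouched
        return "\\" + match.group(0)   # Markdown backslash escape

    return TAG_LIKE.sub(fix, text)


s = 'This is an <example>. <example2> <a href="[LINK]">test</a>'
escaped = escape_unknown_tags(s)
print(escaped)
```

The escaped string can then be passed to `marko.convert`. Note that tag-like text inside code spans or code blocks would also receive a backslash, so this sketch suits plain prose best.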