microcosm-cc / bluemonday

bluemonday: a fast golang HTML sanitizer (inspired by the OWASP Java HTML Sanitizer) to scrub user generated content of XSS

Home Page: https://github.com/microcosm-cc/bluemonday


Outputting link href attribute in brackets after link text


Is there a way with this sanitiser to output the value of an anchor tag's href attribute in brackets after the link text?

The rest of the HTML would be stripped according to the Strict policy.

For example:

<a href="http://mylink.com/test">My Link</a>

would become:

My Link(http://mylink.com/test)

commented

This library does not do this.

But you could implement it yourself using https://pkg.go.dev/golang.org/x/net/html to parse the HTML into an AST, then walk it and write it out in the desired format.

This also strikes me as an "I want to convert HTML to Markdown" problem, for which https://pandoc.org/ is well suited. You could then apply the Strict policy here to remove everything that wasn't just plain text / Markdown (Markdown permits embedded HTML, so anything that can't be expressed in Markdown would be left as HTML).

Thanks for your advice, and sorry for the delay in my response. I am taking HTML content and stripping it down to raw text to make it easier to search, without HTML tags and attributes corrupting the search results. It is not vital for links to be searchable, but I thought I would double-check in case there was an easy solution.