thephpleague / html-to-markdown

Convert HTML to Markdown with PHP

<pre class="language-"><code>

kbitlive opened this issue

Version(s) affected

5.0.2

Description

How to reproduce

HTML:

<pre class="language-"><code>GET /announcements
 </code></pre>

After conversion:

````
<pre class="language-">```
GET /announcements

```
````

$markdown_show = $converter->convert($content);
// strip_tags() removes the leftover <pre> tag (and any other HTML tags) from the output.
$markdown_show = strip_tags($markdown_show);

It depends on whether you need the <pre> tag or not. You can do this for now; I will see if I can send a PR to fix this issue.

If you need to remove the <pre> tag, use a function to strip it out. Note that strip_tags_only is not a built-in PHP function and needs to be written:

$clean_content = str_replace(array("```\n```"), array("```\n"), strip_tags_only($markdown_content, 'pre'));
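Since strip_tags_only has to be written by hand, here is a minimal sketch of what it could look like. The name and the signature (a string plus one tag name or an array of tag names) are assumptions taken from how the helper is called in this thread; it is not part of PHP or of html-to-markdown.

```php
<?php

/**
 * Hypothetical helper: removes only the named tags (opening and closing),
 * keeping their inner content and every other tag untouched.
 *
 * @param string|string[] $tags Tag names without angle brackets, e.g. 'pre'.
 */
function strip_tags_only(string $html, $tags): string
{
    foreach ((array) $tags as $tag) {
        // Match <tag>, <tag attr="...">, and </tag>.
        // The \b keeps 'pre' from matching a longer name such as <present>.
        $pattern = '/<\/?' . preg_quote($tag, '/') . '\b[^>]*>/i';
        $html = preg_replace($pattern, '', $html) ?? $html;
    }

    return $html;
}

// Converter output from the report: a stray <pre> in front of the fence.
$markdown_content = "<pre class=\"language-\">```\nGET /announcements\n\n```";
echo strip_tags_only($markdown_content, 'pre');
```

Unlike the built-in strip_tags(), which removes every tag except an allow-list, this removes only the tags you name, so other inline HTML in the Markdown survives.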

If not, and you want to close the tag instead:

$clean_content = str_replace(array("```\n```", "```\n<pre"), array("```\n</pre>", "\n<pre"), $markdown_content);

This is a workaround for the bug for now.

I have come up with a better solution that I want to share.
If you already have the HTML, you can remove the code tag and then turn each pre tag into pre plus code, so the language class that was on the pre tag ends up on the code tag. This works much better because the output will have backticks with no pre tag and with the proper language assigned.

$html_content = $post['content']['rendered'];
$html_remove_code_tag = strip_tags_only($html_content, array('code'));
$html_pre_to_code_tag = str_replace(array('<pre', '</pre>'), array('<pre><code', '</code></pre>'), $html_remove_code_tag);
$markdown_content = $daextulmap_converter->convert($html_pre_to_code_tag);

echo $markdown_content;

I think it would be better for the library to do this itself, but for now it works.
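The preprocessing described in this comment could be wrapped in a single function that runs before the converter. The sketch below is only an illustration: move_pre_attributes_to_code is a hypothetical name, and it uses regular expressions instead of strip_tags_only so that it runs on its own. It shows what the sample HTML from the report looks like after the rewrite.

```php
<?php

/**
 * Hypothetical preprocessing step: rewrites <pre attrs><code>...</code></pre>
 * into <pre><code attrs>...</code></pre> so the attributes sit on <code>.
 */
function move_pre_attributes_to_code(string $html): string
{
    // 1. Drop every <code ...> and </code> tag, keeping the inner text.
    //    Caveat: this also removes inline <code> tags outside <pre> blocks.
    $html = preg_replace('/<\/?code\b[^>]*>/i', '', $html) ?? $html;

    // 2. Re-wrap: '<pre class=...>' becomes '<pre><code class=...>'.
    $html = preg_replace('/<pre\b/i', '<pre><code', $html) ?? $html;

    // 3. Close the new <code> before each </pre>.
    return str_ireplace('</pre>', '</code></pre>', $html);
}

$html = "<pre class=\"language-\"><code>GET /announcements\n </code></pre>";
echo move_pre_attributes_to_code($html);
```

The returned string would then be passed to the converter's convert() method as in the snippet above. Because step 1 strips all code tags, inline code elsewhere in the document loses its markup, which is a trade-off of this workaround.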

I think this issue comes from this line:

$preContent = \str_replace(['<pre>', '</pre>'], '', $preContent);

It doesn't account for any attributes the pre tag could have, as is already done here for the code tag:
$code = \preg_replace('/<code\b[^>]*>/', '', $code);

I might be able to submit a PR later.
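A possible shape for that fix, mirroring the pattern already used for the code tag, is sketched below. strip_pre_tags is a hypothetical stand-in for the tag-stripping step, not the library's actual code, and the sample string is an assumption based on the HTML in the report.

```php
<?php

/**
 * Hypothetical stand-in for the tag-stripping step; not the library's code.
 */
function strip_pre_tags(string $preContent): string
{
    // Remove the opening tag with or without attributes,
    // using the same pattern shape as the existing <code> handling.
    $preContent = preg_replace('/<pre\b[^>]*>/', '', $preContent) ?? $preContent;

    // The closing tag never carries attributes, so a plain replace is enough.
    return str_replace('</pre>', '', $preContent);
}

$sample = "<pre class=\"language-\">GET /announcements\n </pre>";

// Exact-match replacement: only a bare '<pre>' is removed, so the tag with a class survives.
$before = str_replace(['<pre>', '</pre>'], '', $sample);

// Attribute-aware replacement: the opening tag is removed regardless of attributes.
$after = strip_pre_tags($sample);
```

With the exact-match replacement, `$before` still starts with the opening pre tag, which matches the stray tag seen in the converted output; `$after` contains only the code text.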