thephpleague / html-to-markdown

Convert HTML to Markdown with PHP

Incorrect markdown when text is not in a tag, but between two tags

multiwebinc opened this issue

Version(s) affected

5.0.2

Description

Text that is not wrapped in a tag is converted incorrectly when it sits between two tags: it is merged into the Markdown output of the tag that follows it.

How to reproduce

HTML:

<h1>Heading one</h1>
    	
Some text

<h2>Heading two</h2>

Output:

Heading one
===========

 Some text Heading two
-----------

HTML:

<h1>Heading one</h1>
    	
Some text

<h3>Heading two</h3>

Output:

Heading one
===========

 Some text ### Heading two

However this works correctly:

Some text

<h3>Heading</h3>

Output:

Some text

### Heading

This appears to happen for any <tag></tag> Text <tag></tag> combination that I've tried.
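
For anyone who wants to check this locally, the first case above can be run as a short script. This is a minimal sketch, assuming the package is installed through Composer and the script sits next to the `vendor` directory; the file layout is an assumption, not something from the report.

```php
<?php
// Assumes: composer require league/html-to-markdown
require __DIR__ . '/vendor/autoload.php';

use League\HTMLToMarkdown\HtmlConverter;

$converter = new HtmlConverter();

// Bare text between two block-level tags, as in the first example of the report.
$html = "<h1>Heading one</h1>\n\nSome text\n\n<h2>Heading two</h2>";

// On 5.0.2 the report shows "Some text" ending up on the same line as the second heading.
echo $converter->convert($html), PHP_EOL;
```

Swapping the `<h2>` for an `<h3>` in `$html` gives the second case from the report.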

I believe this is important because someone could be using line breaks instead of paragraphs, since the two look similar when rendered:

<div>
  <h1>Document header</h1>

  Paragraph 1<br><br>

  Paragraph 2<br><br>

  <h2>Another header</h2>
</div>

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

This should probably be reopened.

I get the same results as well.

Turns out a setting needs to be changed to get the header syntax we want:

https://github.com/thephpleague/html-to-markdown?tab=readme-ov-file#style-notes
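
The setting described in that README section is `header_style`. As a minimal sketch, assuming a Composer install, it is passed in the options array when the converter is created. Note that this only changes how H1 and H2 are written (ATX `#` / `##` instead of Setext underlines); nothing in this thread confirms that it stops bare text from being merged into the following tag, and the `<h3>` example above shows the merge happening with an ATX heading too.

```php
<?php
// Assumes: composer require league/html-to-markdown
require __DIR__ . '/vendor/autoload.php';

use League\HTMLToMarkdown\HtmlConverter;

// 'header_style' => 'atx' switches H1/H2 from Setext underlines to "#" / "##".
$converter = new HtmlConverter(['header_style' => 'atx']);

echo $converter->convert('<h1>Heading one</h1><h2>Heading two</h2>'), PHP_EOL;
```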