This library is a refactored and modernized fork of markdownify, supporting Python 3.9 and above.
- The refactored codebase uses a strict functional approach - no classes are involved.
- There is full typing with strict MyPy strict adherence and a py.typed file included.
- The
convert_to_markdown
function allows passing a pre-configured instance ofBeautifulSoup
instead of html. - This library releases follows standard semver. Its version v1.0.0 was branched from markdownify's v0.13.1, at which point versioning is no longer aligned.
pip install html_to_markdown
Convert an string HTML to Markdown:
from html_to_markdown import convert_to_markdown
convert_to_markdown('<b>Yay</b> <a href="http://github.com">GitHub</a>') # > '**Yay** [GitHub](http://github.com)'
Or pass a pre-configured instance of BeautifulSoup
:
from bs4 import BeautifulSoup
from html_to_markdown import convert_to_markdown
soup = BeautifulSoup('<b>Yay</b> <a href="http://github.com">GitHub</a>', 'lxml') # lxml requires an extra dependency.
convert_to_markdown(soup) # > '**Yay** [GitHub](http://github.com)'
The convert_to_markdown
function accepts the following kwargs:
- autolinks (bool): Automatically convert valid URLs into Markdown links. Defaults to True.
- bullets (str): A string of characters to use for bullet points in lists. Defaults to '*+-'.
- code_language (str): Default language identifier for fenced code blocks. Defaults to an empty string.
- code_language_callback (Callable[[Any], str] | None): Function to dynamically determine the language for code blocks.
- convert (Iterable[str] | None): A list of tag names to convert to Markdown. If None, all supported tags are converted.
- default_title (bool): Use the default title when converting certain elements (e.g., links). Defaults to False.
- escape_asterisks (bool): Escape asterisks (*) to prevent unintended Markdown formatting. Defaults to True.
- escape_misc (bool): Escape miscellaneous characters to prevent conflicts in Markdown. Defaults to True.
- escape_underscores (bool): Escape underscores (_) to prevent unintended italic formatting. Defaults to True.
- heading_style (Literal["underlined", "atx", "atx_closed"]): The style to use for Markdown headings. Defaults to " underlined".
- keep_inline_images_in (Iterable[str] | None): Tags in which inline images should be preserved. Defaults to None.
- newline_style (Literal["spaces", "backslash"]): Style for handling newlines in text content. Defaults to "spaces".
- strip (Iterable[str] | None): Tags to strip from the output. Defaults to None.
- strong_em_symbol (Literal["", "_"]): Symbol to use for strong/emphasized text. Defaults to "".
- sub_symbol (str): Custom symbol for subscript text. Defaults to an empty string.
- sup_symbol (str): Custom symbol for superscript text. Defaults to an empty string.
- wrap (bool): Wrap text to the specified width. Defaults to False.
- wrap_width (int): The number of characters at which to wrap text. Defaults to 80.
- convert_as_inline (bool): Treat the content as inline elements (no block elements like paragraphs). Defaults to False.
For compatibility with the original markdownify, a CLI is provided. Use html_to_markdown example.html > example.md
or
pipe input from stdin:
cat example.html | html_to_markdown > example.md
Use html_to_markdown -h
to see all available options. They are the same as listed above and take the same arguments.