manuel-rubio / earmark

Markdown parser for Elixir

Earmark—A Pure Elixir Markdown Processor

N.B.

This README contains the docstrings and doctests from the code by means of Extractly; the following code examples are therefore verified with ExUnit doctests.

Dependency

{ :earmark, "> x.y.z" }

Earmark

Abstract Syntax Tree and Rendering

The AST generation has now been moved out to EarmarkParser which is installed as a dependency.

This brings some changes to this documentation and also deprecates the usage of Earmark.as_ast.

Earmark takes care of rendering the AST to HTML, exposing some AST Transformation Tools and providing a CLI as escript.

Therefore you will not find a detailed description of the supported Markdown here anymore; it is now part of the EarmarkParser documentation.

Earmark.as_ast

WARNING: This is just a proxy towards EarmarkParser.as_ast and is deprecated; it will be removed in version 1.5!

Replace your calls to Earmark.as_ast with EarmarkParser.as_ast as soon as possible.

N.B. If all you use is Earmark.as_ast consider only using EarmarkParser.

Please also refer to the documentation of EarmarkParser.

The function is described below and the other two API functions as_html and as_html! are now based upon the structure of the result of as_ast.

{:ok, ast, []}                   = EarmarkParser.as_ast(markdown)
{:ok, ast, deprecation_messages} = EarmarkParser.as_ast(markdown)
{:error, ast, error_messages}    = EarmarkParser.as_ast(markdown)

Earmark.as_html

{:ok, html_doc, []}                   = Earmark.as_html(markdown)
{:ok, html_doc, deprecation_messages} = Earmark.as_html(markdown)
{:error, html_doc, error_messages}    = Earmark.as_html(markdown)
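
Because as_html always returns a three-element tuple, callers usually pattern match on the status. The following is a minimal sketch of such a match. The module HtmlResult and its functions are hypothetical, and the sketch assumes that each message is a {severity, line_number, text} tuple, which this README does not state.

```elixir
defmodule HtmlResult do
  # Hypothetical helper: reduces the three shapes returned by
  # Earmark.as_html/2 to a result that is easier to pipe.
  # ASSUMPTION: each message is a {severity, line_number, text} tuple.

  # Clean conversion: nothing to report.
  def unwrap({:ok, html, []}), do: {:ok, html}

  # Conversion succeeded, but deprecation messages were emitted.
  def unwrap({:ok, html, messages}) do
    Enum.each(messages, &warn/1)
    {:ok, html}
  end

  # Errors: the HTML is still returned, together with formatted messages.
  def unwrap({:error, html, messages}) do
    {:error, html, Enum.map(messages, &format/1)}
  end

  defp warn(message), do: IO.puts(:stderr, format(message))

  defp format({severity, line, text}), do: "#{severity} (line #{line}): #{text}"
end
```

With Earmark installed, this could be used as markdown |> Earmark.as_html() |> HtmlResult.unwrap().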

Earmark.as_html!

html_doc = Earmark.as_html!(markdown, options)

Formats the error_messages returned by as_html and adds the filename to each. It then prints them to stderr and returns just the html_doc.

Options

Options can be passed into as_html/2 or as_html!/2 according to the documentation. Either a keyword list with legal options (cf. Earmark.Options) or an Earmark.Options struct is accepted.

{status, html_doc, errors} = Earmark.as_html(markdown, options)
html_doc = Earmark.as_html!(markdown, options)
{status, ast, errors} = EarmarkParser.as_ast(markdown, options)

Rendering

All options passed through to EarmarkParser.as_ast are defined there; some options, however, concern only the rendering of the returned AST.

These are:

  • compact_output: defaults to false

Normally Earmark aims to produce Human Readable output.

This will give results like these:

iex(0)> markdown = "# Hello\nWorld"
...(0)> Earmark.as_html!(markdown, compact_output: false)
"<h1>\nHello</h1>\n<p>\nWorld</p>\n"

But sometimes whitespace is not desired:

iex(1)> markdown = "# Hello\nWorld"
...(1)> Earmark.as_html!(markdown, compact_output: true)
"<h1>Hello</h1><p>World</p>"

Be cautious, though, when using this option: lines will become very long.

  • escape: defaults to true

If set, HTML will be properly escaped.

  iex(2)> markdown = "Hello<br />World"
  ...(2)> Earmark.as_html!(markdown)
  "<p>\nHello&lt;br /&gt;World</p>\n"

However, disabling escape: gives you maximum control of the created document, which in some cases (e.g. inside tables) might even be necessary.

  iex(3)> markdown = "Hello<br />World"
  ...(3)> Earmark.as_html!(markdown, escape: false)
  "<p>\nHello<br />World</p>\n"

  • postprocessor: defaults to nil

Before rendering, the AST is transformed by a postprocessor. For details see the description of Earmark.Transform.map_ast below, which accepts the same postprocessor. As a matter of fact, specifying postprocessor: fun is conceptually the same as

      markdown
      |> EarmarkParser.as_ast
      |> Earmark.Transform.map_ast(fun)
      |> Earmark.Transform.transform

with all the necessary bookkeeping for options and messages.
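
As an illustration, here is a sketch of a function that could serve as a postprocessor. The module LeadParagraphs is hypothetical. It follows the return convention described for map_ast in the Transformations section: nodes yield a quadruple whose third element is ignored, and strings yield strings.

```elixir
defmodule LeadParagraphs do
  # Hypothetical postprocessor: adds class="lead" to every "p" node.
  # It does not check whether a class attribute is already present.

  # Paragraph node: prepend the attribute; the third element is nil by convention.
  def call({"p", attrs, _children, meta}), do: {"p", [{"class", "lead"} | attrs], nil, meta}

  # Any other node: tag, attributes and meta are passed through unchanged.
  def call({tag, attrs, _children, meta}), do: {tag, attrs, nil, meta}

  # Text leaf: strings must be answered with strings.
  def call(text) when is_binary(text), do: text
end
```

Assuming Earmark is installed, it would be passed as Earmark.as_html!(markdown, postprocessor: &LeadParagraphs.call/1).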

  • renderer: defaults to Earmark.HtmlRenderer

    The module used to render the final document.

  • smartypants: defaults to true

If set, the following replacements will be made during rendering of inline text:

"---" → "—"
"--" → "–"
"' → "’"
?" → "”"
"..." → "…"
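
The dash and ellipsis rules from the list above can be sketched in a few lines. This is an illustration only and not Earmark's implementation; the module SmartySketch is hypothetical, and the two quote rules are left out because they depend on context.

```elixir
defmodule SmartySketch do
  # Illustration only: NOT Earmark's implementation.
  # Order matters: "---" has to be replaced before "--".
  @rules [
    # three hyphens become an em dash
    {"---", "\u2014"},
    # two hyphens become an en dash
    {"--", "\u2013"},
    # three dots become an ellipsis
    {"...", "\u2026"}
  ]

  # Applies each rule in turn to the whole text.
  def replace(text) do
    Enum.reduce(@rules, text, fn {from, to}, acc -> String.replace(acc, from, to) end)
  end
end
```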

Command line

$ mix escript.build
$ ./earmark file.md

Some options defined in the Earmark.Options struct can be specified as command line switches.

Use

$ ./earmark --help

to find out more, but here is a short example

$ ./earmark --smartypants false --code-class-prefix "a- b-" file.md

will call

Earmark.as_html!( ..., %Earmark.Options{smartypants: false, code_class_prefix: "a- b-"})

Timeouts

By default, that is, if the timeout option is not set, Earmark uses parallel mapping as implemented in Earmark.pmap/2, which uses Task.await with its default timeout of 5000ms.

In rare cases that might not be enough.

By specifying a longer timeout option in milliseconds, Earmark will use parallel mapping as implemented in Earmark.pmap/3, which will pass timeout to Task.await.

In both cases one can override the mapper function with either the mapper option (used if and only if timeout is nil) or the mapper_with_timeout option (used otherwise).

For the escript only the timeout command line argument can be used.
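
The difference between the two mapping variants can be sketched as follows. The module PmapSketch is a hypothetical stand-in; it illustrates the technique (one task per element, awaited with or without an explicit timeout) and is not Earmark's actual source.

```elixir
defmodule PmapSketch do
  # Hypothetical stand-ins for Earmark.pmap/2 and Earmark.pmap/3.

  # No timeout given: Task.await/1 applies its default of 5000 ms.
  def pmap(collection, fun) do
    collection
    |> Enum.map(fn item -> Task.async(fn -> fun.(item) end) end)
    |> Enum.map(&Task.await/1)
  end

  # Explicit timeout in milliseconds, forwarded to Task.await/2.
  def pmap(collection, fun, timeout) do
    collection
    |> Enum.map(fn item -> Task.async(fn -> fun.(item) end) end)
    |> Enum.map(&Task.await(&1, timeout))
  end
end
```

With Earmark itself, the longer timeout would be requested through the option, for example Earmark.as_html!(markdown, timeout: 10_000).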

Security

Please be aware that Markdown is not a secure format. Earmark produces HTML from Markdown and from any HTML embedded in it. It is your job to sanitize and/or filter the output of Earmark.as_html if you cannot trust the input and intend to serve the produced HTML on the Web.

Transformations

Structure Conserving Transformers

To make processing the output of EarmarkParser.as_ast more convenient, we expose two structure conserving mappers.

map_ast

map_ast takes a function that will be called for each node of the AST, where a node is either a quadruple like {"code", [{"class", "inline"}], ["some code"], %{}} or a text leaf like "some code".

The result of the function call must be

  • for nodes → a quadruple whose third element is ignored (this might change in the future) and is therefore conventionally nil. The other elements replace those of the node

  • for strings → strings

A third parameter, ignore_strings, which defaults to false, can be used to avoid invoking the mapper function for text nodes.

As an example, let us transform an AST so that its tags become atoms

  iex(0)> input = [
  ...(0)> {"h1", [], ["Hello"], %{title: true}},
  ...(0)> {"ul", [], [{"li", [], ["alpha"], %{}}, {"li", [], ["beta"], %{}}], %{}}] 
  ...(0)> map_ast(input, fn {t, a, _, m} -> {String.to_atom(t), a, nil, m} end, true)
  [ {:h1, [], ["Hello"], %{title: true}},
    {:ul, [], [{:li, [], ["alpha"], %{}}, {:li, [], ["beta"], %{}}], %{}} ]

N.B. If this return convention is not respected, map_ast might not complain, but the resulting transformation might no longer be suitable for Earmark.Transform.transform. It follows that any function passed in as the value of the postprocessor: option must obey these conventions.
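
To make the convention concrete, here is a sketch of how such a structure conserving mapper could be written. MapAstSketch is a hypothetical module that mirrors the rules described above; it is not Earmark's source, and the real function may differ in details.

```elixir
defmodule MapAstSketch do
  # Illustrative re-implementation of a structure conserving mapper.

  def map_ast(ast, fun, ignore_strings \\ false)

  # A list of nodes is mapped element by element.
  def map_ast(ast, fun, ignore_strings) when is_list(ast),
    do: Enum.map(ast, &map_ast(&1, fun, ignore_strings))

  # Text leaf: handed to `fun` unless ignore_strings is true.
  def map_ast(text, fun, ignore_strings) when is_binary(text),
    do: if(ignore_strings, do: text, else: fun.(text))

  # Node: `fun` supplies tag, attributes and meta. Its third element is
  # discarded and the original children are mapped recursively instead.
  def map_ast({_, _, children, _} = node, fun, ignore_strings) do
    {tag, attrs, _ignored, meta} = fun.(node)
    {tag, attrs, map_ast(children, fun, ignore_strings), meta}
  end
end

# A mapper that handles both nodes and text leaves (ignore_strings left at false).
mapper = fn
  {tag, attrs, _, meta} -> {String.to_atom(tag), attrs, nil, meta}
  text when is_binary(text) -> String.upcase(text)
end

MapAstSketch.map_ast([{"p", [], ["hi"], %{}}], mapper)
```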

map_ast_with

This is like map_ast, but an accumulator can also be threaded through, as in a reducer.

For that reason the function is called with two arguments, the first being the same value as in map_ast and the second the accumulator. The return value must likewise be a tuple of the mapped value and the new accumulator.

A simple example annotates the traversal order in the meta map's :count key. As we are not interested in text nodes, we pass true for the fourth parameter, ignore_strings, which defaults to false

   iex(0)>  input = [
   ...(0)>  {"ul", [], [{"li", [], ["one"], %{}}, {"li", [], ["two"], %{}}], %{}},
   ...(0)>  {"p", [], ["hello"], %{}}]
   ...(0)>  counter = fn {t, a, _, m}, c -> {{t, a, nil, Map.put(m, :count, c)}, c+1} end
   ...(0)>  map_ast_with(input, 0, counter, true) 
   {[ {"ul", [], [{"li", [], ["one"], %{count: 1}}, {"li", [], ["two"], %{count: 2}}], %{count: 0}},
     {"p", [], ["hello"], %{count: 3}}], 4}
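
The counts in the result above follow from a pre-order traversal: a parent is visited before its children. A sketch of how the accumulator could be threaded through such a traversal is given below; MapAstWithSketch is hypothetical and is not Earmark's source.

```elixir
defmodule MapAstWithSketch do
  # Illustrative sketch: pre-order traversal that threads an accumulator
  # through every call of `fun`.

  def map_ast_with(ast, acc, fun, ignore_strings \\ false)

  # A list of nodes: map each element while carrying the accumulator along.
  def map_ast_with(ast, acc, fun, ignore_strings) when is_list(ast) do
    Enum.map_reduce(ast, acc, &map_ast_with(&1, &2, fun, ignore_strings))
  end

  # Text leaf: skipped (accumulator unchanged) when ignore_strings is true.
  def map_ast_with(text, acc, fun, ignore_strings) when is_binary(text) do
    if ignore_strings, do: {text, acc}, else: fun.(text, acc)
  end

  # Node: visited first, then its children receive the updated accumulator.
  def map_ast_with({_, _, children, _} = node, acc, fun, ignore_strings) do
    {{tag, attrs, _ignored, meta}, acc1} = fun.(node, acc)
    {children1, acc2} = map_ast_with(children, acc1, fun, ignore_strings)
    {{tag, attrs, children1, meta}, acc2}
  end
end
```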

Structure Modifying Transformers

For structure modifications a tree traversal is needed and no clear pattern of how to assist this task with tools has emerged yet.
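
Until such tools exist, a hand-written traversal does the job. The sketch below removes nodes, which changes the shape of the tree and therefore cannot be expressed with map_ast. The module DropTags is hypothetical and only assumes the quadruple format shown earlier.

```elixir
defmodule DropTags do
  # Hypothetical helper: removes every node whose tag is listed in `tags`,
  # together with its whole subtree.

  def drop(ast, tags) when is_list(ast), do: Enum.flat_map(ast, &drop_node(&1, tags))

  # Text leaves are always kept.
  defp drop_node(text, _tags) when is_binary(text), do: [text]

  # A matching node contributes nothing; any other node keeps its
  # tag, attributes and meta, and its children are filtered recursively.
  defp drop_node({tag, attrs, children, meta}, tags) do
    if tag in tags, do: [], else: [{tag, attrs, drop(children, tags), meta}]
  end
end
```

The result is still a list of quadruples and strings, so it should remain suitable for Earmark.Transform.transform.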

Contributing

Pull Requests are happily accepted.

Please be aware of one caveat when correcting/improving README.md.

The README.md is generated by Extractly, as mentioned above, so contributors should not modify it directly; edit README.md.eex and the imported docs instead.

Thank you to all who have already helped with Earmark; your names are duly noted in RELEASE.md.

Author

Copyright © 2014,5,6,7,8,9, 2020,1 Dave Thomas, The Pragmatic Programmers & Robert Dober @/+pragdave, dave@pragprog.com

LICENSE

Same as Elixir, which is Apache License v2.0. Please refer to LICENSE for details.

SPDX-License-Identifier: Apache-2.0
