yorickpeterse / inko-markdown

An Inko library for parsing an opinionated dialect of Markdown

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

inko-markdown

An opinionated dialect of Markdown, aimed at making it easy to use and maintain, written in Inko.

This library started as an experiment in parsing Markdown using conventional LL(k) parsing techniques, rather than the usual approach involving distinct block and inline parsing phases. As it turns out, it's certainly doable provided we bend the rules a little bit. For example, parsing becomes easier if we drop support for indented code blocks and use a slightly different syntax for block quotes.

Differences with Markdown

To make parsing easier (read: not a nightmare), inko-markdown only supports a subset of the Markdown syntax, though this should still be good enough for most Markdown documents out there. Apart from syntax specific differences (as mentioned below), one big difference is that inko-markdown is strict: when it encounters invalid syntax, an error is produced instead of silently rendering the document incorrectly. This significantly simplifies the parser, and makes debugging incorrectly formatted documents easier.

Lists

Unordered lists are created using -, and ordered lists using 1.. Other list markers (including 2., 3., etc) aren't supported:

- Unordered list item 1
- Unordered list item 2

1. Ordered list item 1
1. Ordered list item 2

List values are limited to inline values and other lists. Block elements inside lists aren't supported.

Emphasis

For strong emphasis, only **strong** is supported. For regular emphasis, only _emphasis_ is supported. __word__ is parsed as <em></em>word<em></em>.

Headings

Only ATX headings are supported:

# Heading 1
## Heading 2
### Heading 3
#### Heading 4
##### Heading 5
###### Heading 6

The contents of a heading can be wrapped like paragraphs, so this:

# Foo
bar

Turns into this:

<h1>Foo
bar</h1>

Hard breaks

Hard line breaks use a trailing backslash, instead of relying on trailing whitespace:

this is \
a single paragraph

No inline HTML

Inline HTML isn't supported, but custom blocks are supported (see the extensions section for more details).

Code blocks and spans

Indented code blocks aren't supported, instead only fenced code blocks and inline code blocks are supported. Fenced code blocks are always parsed as code blocks, even if they are preceded by text:

This is text```
this is a code block, instead of literal text (per CommonMark)
```

Leading/trailing whitespace inside a code span isn't stripped and instead left as-is.

Block quotes use a syntax similar to fenced code blocks

Markdown's block quote syntax is tricky to parse, and rather painful to work with as a document author. inko-markdown instead uses a block syntax similar to fenced code blocks, inspired by GitLab Flavoured Markdown:

>>>
block quote
>>>

Similar to fenced code blocks, the signs can be repeated to allow for nested block quotes:

>>>
before

>>>>
inner
>>>>

after
>>>

Only one type of thematic break

The only form of thematic breaks that is supported is ---, without any spaces between the hyphens.

Links containing parentheses need to be escaped

When using the syntax [text](link), if link contains parentheses, these must be escaped like so:

[Cookie](https://en.wikipedia.org/wiki/Cookie_\(disambiguation\))

Extensions

Tables

The commonly used pipe table syntax is tricky to parse, and doesn't work well for anything but basic content, as you can't wrap the content across lines. inko-markdown uses a different syntax that makes working with tables easier.

A basic table looks like this:

|-
| Row 1 column 1
| Row 1 column 2
|-
| Row 2 column 1
| Row 2 column 2

Here |- signals the start of a table row, and | the start of a column in that row.

Header rows are created using |= and must come before regular rows:

|=
| Header row 1 column 1
| Header row 1 column 2
|-
| Row 1 column 1
| Row 1 column 2
|-
| Row 2 column 1
| Row 2 column 2

Table footers are created using |+, and must come after regular rows:

|=
| Header row 1 column 1
| Header row 1 column 2
|-
| Row 1 column 1
| Row 1 column 2
|-
| Row 2 column 1
| Row 2 column 2
|+
| Footer row 1 column 1
| Footer row 1 column 2

Cell/row formatting rules, such as text alignment, aren't supported; you should use CSS for that instead.

If you start a table with a footer, or a column (without the row marker), the parser produces an error.

Table cells are limited to inline elements, so you can't put (for example) a list in a table cell. You can however wrap the content across lines, provided the wrapped lines start with at least a single space:

|-
| Row 1 column 1
| Row 1 column 2
  wrapped across a line

Footnotes

Footnotes are referred to using [^footnote], and defined using [^footnote]: value. Footnote values are limited to inline elements, but their values can be wrapped across multiple lines, as long as each line starts with at least a single space:

This is an example of a footnote[^example].

[^example]: foo
  bar
  baz

Custom blocks and spans

Similar to Djot, inko-markdown supports custom blocks/divs with an optional tag:

::: tag-name

:::

Just like fenced code blocks, custom blocks need at least three colons (:::), but more are supported to allow nesting of blocks:

::: outer
:::: inner
foo
::::
:::

This results in:

<div class="outer">
  <div class="inner">
    <p>foo</p>
  </div>
</div>

Inline spans also use the same syntax as Djot:

[This is the body]{highlight}

This results in:

<p><span class="highlight">This is the body</span></p>

Superscript and subscript

Superscript text uses the syntax ^body^, while subscript uses ~body~. The content can be any inline value. Nesting isn't supported, so ^^foo^^ results in <sup></sup>foo<sup></sup>.

Requirements

  • Inko 0.13.1 or newer

Installation

inko pkg add github.com/yorickpeterse/inko-markdown 0.5.0
inko pkg sync

Usage

Hello world using inko-markdown:

import markdown.Document
import std.stdio.STDOUT

class async Main {
  fn async main {
    let input = 'Hello **world**'
    let doc = Document.parse(input).unwrap
    let html = doc.to_html.to_string

    STDOUT.new.print(html)
  }
}

This produces the following HTML:

<p>Hello <strong>world</strong></p>

Filters

Filters are used to transform HTML documents generated from Markdown documents. Filters implement the trait markdown.html.Filter, and modify documents in place. Filters are used as follows:

import markdown.Document
import markdown.html.TableOfContents

# Filters operate on HTML documents, not Markdown documents, so we must generate
# an HTML document first:
let doc = Document.parse('# Foo').unwrap.to_html

TableOfContents.new.run(doc)
doc.to_pretty_string

TableOfContents

The filter markdown.html.TableOfContents adds id attributes to all headers based on their content, and optionally replaces custom blocks with the toc tag with a table of contents. For example, consider the following Markdown document:

::: toc
:::

# Installation
## Linux
### Alpine
### Ubuntu
## Windows
# Manually

When converting this to HTML and applying this filter, the output is transformed into the following:

<ul class="toc">
  <li>
    <a href="#installation">Installation</a>
    <ul>
      <li>
        <a href="#linux">Linux</a>
        <ul>
          <li><a href="#alpine">Alpine</a></li>
          <li><a href="#ubuntu">Ubuntu</a></li>
        </ul>
      </li>
      <li><a href="#windows">Windows</a></li>
    </ul>
  </li>
  <li><a href="#manually">Manually</a></li>
</ul>
<h1 id="installation">Installation</h1>
<h2 id="linux">Linux</h2>
<h3 id="alpine">Alpine</h3>
<h3 id="ubuntu">Ubuntu</h3>
<h2 id="windows">Windows</h2>
<h1 id="manually">Manually</h1>

The following fields can be set to customize the process of generating the table of contents:

Option Default Description
class toc The class value of the container.
maximum 6 The maximum header level to include.

License

All source code in this repository is licensed under the Mozilla Public License version 2.0, unless stated otherwise. A copy of this license can be found in the file "LICENSE".

About

An Inko library for parsing an opinionated dialect of Markdown

License:Mozilla Public License 2.0


Languages

Language:Makefile 100.0%