miyakogi / m2r

Markdown to reStructuredText converter

Incorrect table conversion

ntolia opened this issue · comments

First of all, thank you for creating and supporting m2r! Really appreciate it.

That said, I am trying to convert the following (extracted) markdown file into rst.

| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| foo   | bar |  |  |

<a name="test"/>

The generated table is missing a cell in its last row, which causes errors along the lines of: Error parsing content block for the "list-table" directive: uniform two-level bullet list expected, but row 2 does not contain the same number of items as row 1 (3 vs 4). This happens because the generated output looks like:

.. role:: raw-html-m2r(raw)
   :format: html


.. list-table::
   :header-rows: 1

   * - Field
     - Type
     - Label
     - Description
   * - foo
     - bar
     -


:raw-html-m2r:`<a name="test"/>`

and is missing the fourth cell of the second row. However, if the last line of the markdown is removed, the problem disappears. I am using m2r 0.1.14.

I have a feeling the bug might be in mistune or how it is being used by m2r but haven't been able to narrow it down just yet. Will update this issue if I find out anything else.

Here is what I have found so far. For certain tables whose last row ends in an empty cell and that are followed by an HTML anchor (and possibly other text too), the tool will chew up the last cell and cause errors in the final HTML docs generation. On further debugging, this seems to come from a potential bug in the mistune library: in parse_table(), the matched group for the cells (m.group(3) if you are looking at the code) includes a spurious newline at the end of the text being parsed. This is likely a result of how parse() matches the text against the table rule (see manipulate in mistune):

table = re.compile(
        r'^ *\|(.+)\n *\|( *[-:]+[-| :]*)\n((?: *\|.*(?:\n|$))*)\n*'
    )

The bug goes away if I rstrip() the matched group before it is broken up into cells inside mistune's parse_table(), but I am still not sure what the right fix is in this case or what might be wrong in the above regex.
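The trailing-newline behavior can be reproduced with the standard re module alone. The sketch below is only an illustration, not mistune's actual code path: split_cells is a hypothetical helper that applies the same three regexes that appear in parse_table() as quoted in this thread, and the table rule is copied from the snippet above. It assumes the table body reaches parse_table() exactly as the rule captures it.

```python
import re

# Table rule as quoted in this issue.
TABLE = re.compile(
    r'^ *\|(.+)\n *\|( *[-:]+[-| :]*)\n((?: *\|.*(?:\n|$))*)\n*'
)


def split_cells(body):
    """Hypothetical helper: split a table body into rows of cells
    using the same regexes as parse_table()."""
    # Drops an optional trailing pipe together with a final newline.
    body = re.sub(r'(?: *\| *)?\n$', '', body)
    rows = []
    for line in body.split('\n'):
        # Strips the leading pipe and (again) a trailing pipe.
        line = re.sub(r'^ *\| *| *\| *$', '', line)
        rows.append(re.split(r' *(?<!\\)\| *', line))
    return rows


markdown = (
    "| Field | Type | Label | Description |\n"
    "| ----- | ---- | ----- | ----------- |\n"
    "| foo   | bar |  |  |\n"
    "\n"
    '<a name="test"/>'
)

body = TABLE.match(markdown).group(3)  # ends with '\n'
buggy = split_cells(body)              # [['foo', 'bar', '']]
fixed = split_cells(body.rstrip())     # [['foo', 'bar', '', '']]
print(buggy, fixed)
```

With the newline present, the first substitution removes one trailing pipe and the per-row substitution removes a second one, so an empty last cell is lost. Stripping the newline first leaves only the per-row substitution to remove a trailing pipe.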

This is likely the same thing as lepture/mistune#118

The problem does seem to be with mistune. A quick fix is to add the following method to the RestBlockLexer class:

    def parse_table(self, m):
        #
        # amended version of the mistune.BlockLexer.parse_table method
        #
        item = self._process_table(m)
        cols = len(item['header'])  #added
        cells = re.sub(r'(?: *\| *)?\n$', '', m.group(3))
        cells = cells.split('\n')
        for i, v in enumerate(cells):
            v = re.sub(r'^ *\| *| *\| *$', '', v)
            cells[i] = re.split(r' *(?<!\\)\| *', v)
            # Per the GFM spec, the header row must match the delimiter row
            # in the number of cells, or the table is not recognized. The
            # remaining rows may vary in the number of cells: rows with
            # fewer cells than the header row are padded with empty cells.
            # See https://github.github.com/gfm/#example-203
            while len(cells[i]) < cols:  # added
                cells[i].append('')
            # Rows with more cells than the header row have the excess
            # ignored. See https://github.github.com/gfm/#example-203
            del cells[i][cols:]  # added

        item['cells'] = self._process_cells(cells)
        self.tokens.append(item)
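The padding and truncation step in the method above can also be tried on its own, without m2r or mistune installed. This is a minimal sketch under that assumption; normalize_rows is a hypothetical name used only for illustration and is not part of either library.

```python
def normalize_rows(rows, cols):
    """Hypothetical helper: pad short rows with '' and drop excess
    cells so that every row has exactly `cols` cells."""
    return [(row + [''] * (cols - len(row)))[:cols] for row in rows]


rows = [
    ['foo', 'bar', ''],           # one cell short, as in this issue
    ['a', 'b', 'c', 'd', 'e'],    # one cell too many
]
normalized = normalize_rows(rows, 4)
print(normalized)
# [['foo', 'bar', '', ''], ['a', 'b', 'c', 'd']]
```

Note that this only masks the symptom for rows that come out short. The cell is still dropped upstream, which happens to be harmless here because the lost cell was empty.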