mity / md4c

C Markdown parser. Fast. SAX-like interface. Compliant to CommonMark specification.

Incorrect output from parser

Wil-Ro opened this issue

When running md_html() I can only seem to generate incorrect output. Following the code through, it seems to come from the Markdown parser itself rather than the HTML renderer.
From what I can tell, text_callback is given the entire unparsed remainder of the text whenever it is called. I've spent a while stepping through the code and haven't managed to work out why, so I thought I'd mention it here.

You can reproduce it using this super basic code here:

#include <stdio.h>
#include <string.h>

#include "Libs/md4c-html.h"


void SaveTextToFile(const MD_CHAR* text, MD_SIZE size, void* unneeded)
{
    printf(text);
}


int main()
{
    const char* text = "# hello world\nparagraph below\n";
    return md_html(text, strlen(text), SaveTextToFile, NULL, 0, 1);
}

The output from this code is shown below:

<h1>hello world
paragraph below
</h1>
<p>paragraph below
</p>

This happens with any Markdown file I use, no matter the size, and it happens on both Windows (compiled with MSVC) and Linux (compiled with gcc).

I think you may be misunderstanding the parameters for the callback. The text parameter is generally not null-terminated. You need to use the size parameter to limit the number of bytes read from text. For example:

printf("%.*s", (int) size, text);  /* the precision for %.*s must be an int */

I believe the reasons for this are likely twofold:

  1. Performance. Null-terminating would require MD4C to make a copy of the output string, so the extra null byte can be added at the end. Not null-terminating means the callback can often provide a pointer directly to part of the original input string, no copying necessary.

  2. It allows the original input text to contain null bytes. (However, be aware that if the input contains null bytes, using printf to print the resulting output would likely not work as expected.)
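To make the point about the size parameter concrete, here is a minimal sketch of two callbacks that only ever touch `size` bytes of `text`. It deliberately avoids including the MD4C headers: plain `char` and `unsigned` stand in for `MD_CHAR` and `MD_SIZE` (an assumption about how those typedefs are commonly configured), and `Buffer`, `append_chunk` and `print_chunk` are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-size accumulator; not part of MD4C. */
typedef struct {
    char   data[256];
    size_t len;
} Buffer;

/* Same shape as the output callback in the issue, with plain C types
 * standing in for MD_CHAR and MD_SIZE (assumption). Appends exactly
 * `size` bytes of `text`; never relies on a terminating null byte. */
static void append_chunk(const char* text, unsigned size, void* userdata)
{
    Buffer* out = (Buffer*) userdata;
    assert(out != NULL);

    /* Keep one byte free for our own terminator; drop what does not fit. */
    size_t room = sizeof(out->data) - 1 - out->len;
    if (size > room)
        size = (unsigned) room;

    memcpy(out->data + out->len, text, size);
    out->len += size;
    out->data[out->len] = '\0';
}

/* Alternative: stream the chunk straight to stdout. fwrite() writes
 * exactly `size` bytes, so embedded null bytes are passed through too. */
static void print_chunk(const char* text, unsigned size, void* userdata)
{
    (void) userdata;
    fwrite(text, 1, size, stdout);
}
```

With the real library, a function like `print_chunk` (declared with `MD_CHAR` and `MD_SIZE`) would take the place of `SaveTextToFile` in the `md_html()` call from the original report.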

Adding to what's already said

printf(text);

This is something that should never be done. printf accepts a format string and thus it's not safe to feed it arbitrary strings which might contain % for example.

Aside from causing bugs, this can also create security vulnerabilities: https://en.wikipedia.org/wiki/Uncontrolled_format_string
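As a small illustration of the difference, the sketch below keeps the format string a literal and passes the untrusted text purely as data. `format_chunk` is a hypothetical helper written for this example, not something MD4C provides; it uses `snprintf` rather than `printf` only so the result can be inspected.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copies at most `size` bytes of `text` into `dst`
 * (capacity `cap`), returning what snprintf() returns. */
static int format_chunk(char* dst, size_t cap, const char* text, unsigned size)
{
    assert(dst != NULL && text != NULL);

    /* The format string is a fixed literal, so any '%' inside `text`
     * is copied as an ordinary character and never interpreted.
     * The precision argument of %.*s must be an int, hence the cast. */
    return snprintf(dst, cap, "%.*s", (int) size, text);
}
```

Note that `%.*s` still stops early at a null byte inside the chunk, so `fwrite` is the safer choice when the input may contain null bytes.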

Ah! You're both amazing!!! Thank you! That solved the issue.
That's a really interesting output to get from that. I guess it was reading memory past the end of each chunk until it hit the null terminator of my original input, or something similar.