commonmark / commonmark-java

Java library for parsing and rendering CommonMark (Markdown)

YAML front matter parser does not strip single or double quotes on scalars

JakeWharton opened this issue · comments

Steps to reproduce the problem (provide example Markdown if applicable):

---
title: 'Sixteen corners'
layout: post
---

Last year

Expected behavior:

frontMatter["title"].single() == "Sixteen corners"

Actual behavior:

[Screenshot: Screen Shot 2022-05-18 at 10 56 24 PM]

Note the retained single quotes (') surrounding the value. This is also a problem with double quotes (").

YAML spec dictates behavior on unquoted, single-quoted, and double-quoted scalars: https://yaml.org/spec/1.2.2/#73-flow-scalar-styles

Yeah, you're right, that's broken. The reason is that the front matter extension currently does some manual (and very limited) YAML parsing; see YamlFrontMatterBlockParser. We should probably never have done that, and should have just depended on a real YAML parser in that extension.

So two ways to fix this:

  1. Extend our manual parser to handle quoting
  2. Depend on a YAML library (which one?) instead, to be able to parse YAML 1.1 or 1.2 (which one?)

Do you have any opinions on those @JakeWharton? If we do 1, we can also consider adding support for retrieving the raw YAML source (as a single String) to YamlFrontMatterVisitor, for people with exotic YAML that want to parse it themselves.
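To make option 1 concrete, here is a minimal sketch of what quote handling for single-line scalars could look like. It is not the code in YamlFrontMatterBlockParser; the class `ScalarUnquoter` and its `unquote` method are hypothetical names, and only a small subset of the double-quoted escapes from the YAML spec is handled.

```java
// Hypothetical helper, not part of commonmark-java.
final class ScalarUnquoter {
    private ScalarUnquoter() {}

    /** Strips YAML-style quotes from a single-line scalar; plain scalars are returned trimmed. */
    static String unquote(String raw) {
        String s = raw.trim();
        if (s.length() >= 2 && s.startsWith("'") && s.endsWith("'")) {
            // Single-quoted style: the only escape is '' for a literal single quote.
            return s.substring(1, s.length() - 1).replace("''", "'");
        }
        if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
            return unescapeDoubleQuoted(s.substring(1, s.length() - 1));
        }
        return s;
    }

    // Assumption: handles only \n, \t, \" and \\; the YAML spec defines many more escapes.
    private static String unescapeDoubleQuoted(String inner) {
        StringBuilder out = new StringBuilder(inner.length());
        for (int i = 0; i < inner.length(); i++) {
            char c = inner.charAt(i);
            if (c != '\\' || i + 1 == inner.length()) {
                out.append(c);
                continue;
            }
            char next = inner.charAt(++i);
            switch (next) {
                case 'n': out.append('\n'); break;
                case 't': out.append('\t'); break;
                case '"': out.append('"'); break;
                case '\\': out.append('\\'); break;
                default: out.append('\\').append(next); // unknown escape: keep as written
            }
        }
        return out.toString();
    }
}
```

With this sketch, `ScalarUnquoter.unquote("'Sixteen corners'")` returns `Sixteen corners`, which is the value the issue expects for `title`. Multi-line scalars, and a double-quoted value that ends in an escaped quote, are deliberately out of scope here, which is part of the argument for option 2.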

Yeah, I ended up doing a form of the String-extraction, where I simply pre-process the input data to conditionally extract the front matter per its "specification". The upside is that I can parse front matter on all files, not just markdown (not sure I specifically need this). The downside is that I lose this library's types as a unified model and instead have my own composite type of front matter + markdown.
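A rough sketch of that kind of pre-processing, for anyone who wants to take the same route. This is not the code from my tool; `FrontMatterSplitter` and `Document` are hypothetical names, and the rule assumed here is simply "a first line of `---`, then everything up to the next `---` line".

```java
import java.util.Arrays;

// Hypothetical pre-processor: splits raw input into front matter and body.
final class FrontMatterSplitter {
    private FrontMatterSplitter() {}

    /** Composite result: raw YAML (null when absent) plus the remaining text. */
    static final class Document {
        final String frontMatter;
        final String body;

        Document(String frontMatter, String body) {
            this.frontMatter = frontMatter;
            this.body = body;
        }
    }

    static Document split(String input) {
        // \R matches any line break; the limit of -1 keeps trailing empty lines.
        String[] lines = input.split("\\R", -1);
        if (!isDelimiter(lines[0])) {
            return new Document(null, input);
        }
        for (int i = 1; i < lines.length; i++) {
            if (isDelimiter(lines[i])) {
                // Note: line endings in both parts are normalized to "\n".
                String yaml = String.join("\n", Arrays.copyOfRange(lines, 1, i));
                String body = String.join("\n", Arrays.copyOfRange(lines, i + 1, lines.length));
                return new Document(yaml, body);
            }
        }
        // No closing delimiter: treat the whole input as body.
        return new Document(null, input);
    }

    // A delimiter is "---" at the start of the line, optionally followed by whitespace.
    private static boolean isDelimiter(String line) {
        return line.startsWith("---") && line.substring(3).trim().isEmpty();
    }
}
```

The `frontMatter` string can then be handed to whatever YAML parser you prefer, and `body` to this library's Markdown parser. The cost is the one described above: the two halves no longer live in a single document model.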

I don't recall whether I'm using the 1.1 or 1.2 version of SnakeYAML. I'm under-educated on the difference.

For this specific quoting issue, you must quote values that contain a colon in order for Jekyll's parser to parse them correctly. Since I'm sharing markdown files that contain front matter between Jekyll and my tool, I need to honor that. My example above doesn't have a colon, but I tend to copy/paste the last blog post when I create a new one, and so the quoting has now persisted to about half my files.

Yep! Looks good.