YAML front matter parser does not strip single or double quotes on scalars
JakeWharton opened this issue · comments
Steps to reproduce the problem (provide example Markdown if applicable):
---
title: 'Sixteen corners'
layout: post
---
Last year
Expected behavior:
frontMatter["title"].single() == "Sixteen corners"
Actual behavior:
Note the retained single quotes (') surrounding the value. This is also a problem with double quotes (").
YAML spec dictates behavior on unquoted, single-quoted, and double-quoted scalars: https://yaml.org/spec/1.2.2/#73-flow-scalar-styles
Yeah you're right, that's broken. The reason is that the extension currently does some manual (and very limited) YAML parsing, see YamlFrontMatterBlockParser. We should probably never have done that, and should just have depended on a real YAML parser in that extension.
So there are two ways to fix this:
1. Extend our manual parser to handle quoting
2. Depend on a YAML library (which one?) instead, to be able to parse YAML 1.1 or 1.2 (which one?)
Do you have any opinions on those, @JakeWharton? If we do 1, we can also consider adding support for retrieving the raw YAML source (as a single String) to YamlFrontMatterVisitor, for people with exotic YAML who want to parse it themselves.
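To make option 1 concrete, here is a minimal sketch of what quote handling for a single-line scalar could look like. It is not the code from the extension or from the PR; the class and method names (ScalarUnquoter, unquote) are hypothetical, and it assumes the value has already been separated from its key. Per the YAML flow scalar rules, single-quoted scalars only escape a quote by doubling it, while double-quoted scalars use backslash escapes. Only `\"` and `\\` are handled here; the spec defines many more.

```java
// Hypothetical helper, not part of commonmark-java: strips matching quotes
// from a single-line YAML flow scalar.
final class ScalarUnquoter {
    private ScalarUnquoter() {}

    static String unquote(String raw) {
        String s = raw.trim();
        if (s.length() >= 2) {
            char first = s.charAt(0);
            char last = s.charAt(s.length() - 1);
            String inner = s.substring(1, s.length() - 1);
            if (first == '\'' && last == '\'') {
                // Single-quoted style: the only escape is '' for a literal '.
                return inner.replace("''", "'");
            }
            if (first == '"' && last == '"') {
                // Double-quoted style: backslash escapes (subset only, see below).
                return unescapeDoubleQuoted(inner);
            }
        }
        // Plain (unquoted) scalar: return it trimmed and otherwise unchanged.
        return s;
    }

    // Assumption: only \" and \\ are unescaped; other escapes are left as-is.
    private static String unescapeDoubleQuoted(String inner) {
        StringBuilder out = new StringBuilder(inner.length());
        for (int i = 0; i < inner.length(); i++) {
            char c = inner.charAt(i);
            if (c == '\\' && i + 1 < inner.length()) {
                char next = inner.charAt(i + 1);
                if (next == '"' || next == '\\') {
                    out.append(next);
                    i++; // skip the escaped character
                    continue;
                }
            }
            out.append(c);
        }
        return out.toString();
    }
}
```

With this sketch, `ScalarUnquoter.unquote("'Sixteen corners'")` and `ScalarUnquoter.unquote("\"Sixteen corners\"")` both yield the bare text, and an unquoted value such as `post` passes through untouched.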
Created a PR for 1., see test cases here: https://github.com/commonmark/commonmark-java/pull/261/files#diff-3bf17f3edca5ff1728b4e1918880038dcbe66f231114c2a1e40729793d07bb96R286
Yeah, I ended up doing a form of the String extraction: I simply pre-process the input data to conditionally extract the front matter per its "specification". The upside is that I can parse front matter on all files, not just markdown (not sure I specifically need this). The downside is that I lose this library's types as a unified model and instead have my own composite type of front matter + markdown.
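For anyone curious, the pre-processing idea can be sketched roughly as below. This is not my actual code; the names (FrontMatterSplitter, Split) are made up for illustration, and it assumes the common convention that front matter starts on the very first line with `---` and ends at the next `---` line. The raw YAML string would then go to a real YAML parser and the body to the markdown parser.

```java
import java.util.Arrays;

// Hypothetical pre-processor: separates raw front matter from the rest of the file.
final class FrontMatterSplitter {
    private FrontMatterSplitter() {}

    // Composite result: raw YAML (null when the file has no front matter) plus body.
    static final class Split {
        final String frontMatter;
        final String body;

        Split(String frontMatter, String body) {
            this.frontMatter = frontMatter;
            this.body = body;
        }
    }

    static Split split(String input) {
        String[] lines = input.split("\n", -1);
        // Front matter must open on the very first line.
        if (!isFence(lines[0])) {
            return new Split(null, input);
        }
        for (int i = 1; i < lines.length; i++) {
            if (isFence(lines[i])) {
                String yaml = String.join("\n", Arrays.copyOfRange(lines, 1, i));
                String body = String.join("\n", Arrays.copyOfRange(lines, i + 1, lines.length));
                return new Split(yaml, body);
            }
        }
        // No closing fence: treat the whole input as body.
        return new Split(null, input);
    }

    // A fence is exactly three dashes, ignoring trailing whitespace such as \r.
    private static boolean isFence(String line) {
        return line.replaceAll("\\s+$", "").equals("---");
    }
}
```

Because the splitter works on plain text, it does not care whether the remainder is markdown, which is what makes the "front matter on all files" case possible.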
I don't recall whether I'm using the 1.1 or 1.2 version of SnakeYAML. I'm under-educated on the difference.
For this specific quoting issue, you must quote values if they contain a colon in order for Jekyll's parser to correctly parse the value. Since I'm sharing front matter-containing markdown files with Jekyll as well as my tool I need to honor that. My example above doesn't have a colon, but I tend to copy/paste the last blog post when I create a new one and so the quoting has now persisted to about half my files.
Ok, so sounds like my PR will be enough to solve the issue for you?: https://github.com/commonmark/commonmark-java/pull/261/files#diff-3bf17f3edca5ff1728b4e1918880038dcbe66f231114c2a1e40729793d07bb96R286
Yep! Looks good.
Alright, released this fix as 0.19.0 🎉: https://github.com/commonmark/commonmark-java/blob/main/CHANGELOG.md#0190---2022-06-02