jcazevedo / moultingyaml

Scala wrapper for SnakeYAML

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Duplicate Map Entries get "squashed" to the last entry in Yaml src

rbuckland opened this issue · comments

There is unexpected behaviour when dealing with duplicate maps inside a yaml file.

foo:
  value: 1
foo:
  value: 2

will be valid, and will only have a {foo:{value:2}} map (no error).
The expected behaviour would be to have an exception // parsing error of some form.

The underlying behaviour is in snakeyaml, and it tracked and documented well here https://bitbucket.org/asomov/snakeyaml/issues/337/option-to-disallow-duplicate-keys.

This issue is needed to track the outcome of the PR to snakeyaml.

I think the parseYaml and parseYamls methods of MoultingYAML should use the default BaseConstructor of SnakeYAML if called without arguments. I'd more inclined to change the signature of parseYaml and parseYamls to something that allows customizing the SnakeYAML BaseConstructor, i.e.:

def parseYaml(baseConstructor: BaseConstructor = new Constructor()) = ???
def parseYamls(baseConstructor: BaseConstructor = new Constructor()) = ???

👍 - makes sense to change the signature

regarding which constructor (or better put - the default behaviour)

I maybe slightly green on understanding the YAML specifications but it occurs to me that "accepting" duplicate keys (in 1.1) is not good.. and in 1.2 is a MUST NOT allow duplicates.

I could be wrong, but it leads to the "default" behaviour to be

NOT_ALLOW_DUPLICATES

and error on load if they are found.
In code that I am writing for snakeyaml now ..

     * YAML 1.1 is slightly vague around duplicate entries in the YAML file.
     * The best reference is <a href="http://www.yaml.org/spec/1.1/#id862121">
     * 3.2.1.3. Nodes Comparison</a> where it hints that a duplicate map key is 
     * an error.
     * 
     * For future reference, YAML spec 1.2 is clear. The keys MUST be unique.
     * <a href="http://www.yaml.org/spec/1.2/spec.html#id2759572">1.3. Relation
     * to JSON</a>

I am about to push a PR to snakeyaml that introduces a config object to the constructor, instead of this overloaded method.

I have reworked the change in snakeyaml and then reworked the impact to my fork.
https://github.com/rbuckland/moultingyaml

  • snakeyaml now takes an optional LoadingConfig (similar to DumperOptions)

    • it added setAllowDuplicateKeys(boolean allowDuplicateKeys)
  • this loadingConfig is supplied via an implicit

      implicit class PimpedString(val string: String) extends AnyVal {
    
        def parseYaml()(implicit loadingConfig: LoadingConfig = new LoadingConfig()): YamlValue =
          convertToYamlValue(new Yaml(loadingConfig).load(string))
    
        def parseYamls()(implicit loadingConfig: LoadingConfig = new LoadingConfig()): Seq[YamlValue] =
          new Yaml(loadingConfig).loadAll(string).asScala.map(convertToYamlValue).toSeq
      }
    

    implementation is simply

      implicit val loadingConfig = new LoadingConfig()
      loadingConfig.setAllowDuplicateKeys(false)
     ... 
     ... parseYamls...
    
  • once snakeyaml PR is reviewed / approved and released, I will create the PR from my fork

If SnakeYAML is to provide a way to configure the way YAMLs are parsed, I'd rather have the configuration be done explicitly in the parseYaml(s) methods. Also, I wouldn't expose a SnakeYAML object (i.e. LoadingConfig), but rather have a MoultingYAML interface to avoid exposing the underlying implementation (the configurable print method is implemented that way).