common-workflow-language / schema_salad

Semantic Annotations for Linked Avro Data

Home Page: https://www.commonwl.org/v1.2/SchemaSalad.html

Improve codegen validation errors

tetron opened this issue · comments

Schema salad uses the ruamel.yaml "round trip" YAML parser.

This parser preserves comments and line numbers by using ruamel.yaml.comments.CommentedMap and ruamel.yaml.comments.CommentedSeq. These objects behave like Python maps and sequences, but have an additional field, lc (which I think stands for "line column"). The lc field records where the map or sequence itself starts, as well as where each of its contained items starts. In addition, we set our own filename field to track which file an object came from.

The purpose of Schema Salad is to validate documents against a schema. The primary user is CWL, but Schema Salad is intended to be general purpose.

The line/column information is used to give better validation errors, so it is possible to communicate what part of the file had an invalid value.

The existing validator can be found in validate.validate. It uses SourceLine to format errors with line/column/filename information and raises ValidationException when something is wrong. SourceLine handles all the error formatting, particularly for nested errors.
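To make the idea of nested, located errors concrete, here is a rough stand-in written with the standard library only. It is not the SourceLine or ValidationException API; the class name LocatedError, its fields, and the output format are all hypothetical.

```python
class LocatedError(Exception):
    """Hypothetical stand-in for an error that knows where it came from."""

    def __init__(self, filename, line, col, message, children=None):
        super().__init__(message)
        self.filename = filename
        self.line = line  # zero-based, as reported by the parser
        self.col = col    # zero-based, as reported by the parser
        self.message = message
        self.children = list(children or [])

    def render(self, depth=0):
        """Format this error, then each nested error indented one level deeper."""
        indent = "  " * depth
        # Convert zero-based positions to the 1-based form users expect.
        head = f"{indent}{self.filename}:{self.line + 1}:{self.col + 1}: {self.message}"
        lines = [head]
        for child in self.children:
            lines.append(child.render(depth + 1))
        return "\n".join(lines)


inner = LocatedError("tool.cwl", 3, 4, "expected a string but got an integer")
outer = LocatedError("tool.cwl", 1, 0, "the `inputs` field is not valid", [inner])
report = outer.render()
print(report)
```

Each level of nesting narrows down the location, so the reader sees both the enclosing field and the exact value that failed.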

The goal of this project is to bring the parser produced by the code generator up to the quality of errors produced by the default interpreted validator. This means identifying the error cases shared by the interpreted validator and the code-generated validator, and improving the code-generated validator's error reporting for them. I recommend developing a test suite of malformed documents to see how errors are reported in various cases.
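One possible shape for such a test suite is a small harness that runs every malformed document through both validators and lists the cases where the reports differ. Everything in the sketch below is hypothetical: the two toy validators only stand in for the interpreted and code-generated validators, and the sample documents are invented.

```python
from typing import Callable, Dict, List, Optional

# A validator returns an error message, or None when the document is valid.
Validator = Callable[[str], Optional[str]]


def compare_error_reports(
    cases: Dict[str, str], interpreted: Validator, generated: Validator
) -> List[str]:
    """Return the names of the cases where the two validators disagree."""
    mismatches = []
    for name, text in cases.items():
        if interpreted(text) != generated(text):
            mismatches.append(name)
    return mismatches


def toy_interpreted(text: str) -> Optional[str]:
    # Stand-in for the interpreted validator: reports the offending line.
    for number, line in enumerate(text.splitlines(), start=1):
        if line.strip() == "type: 42":
            return f"doc.yml:{number}: expected a string type name"
    return None


def toy_generated(text: str) -> Optional[str]:
    # Stand-in for the generated parser: detects the problem, gives no location.
    if "type: 42" in text:
        return "invalid document"
    return None


CASES = {
    "valid": "type: string\n",
    "numeric_type": "id: x\ntype: 42\n",
}

mismatches = compare_error_reports(CASES, toy_interpreted, toy_generated)
print(mismatches)
```

Comparing whole messages is the strictest check; a real suite might instead compare only the reported line and column, so wording can differ while locations must agree.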