Partial/lazy parse?
svonworl opened this issue · comments
Hello! We're integrating cwljava into Dockstore, and occasionally we need to pluck just a few values out of a CWL document: for example, the id and doc fields (if they exist) from the root workflow. In these situations, we don't care about the contents of the rest of the CWL and want to be tolerant of errors: the CWL as a whole might not be 100% well-formed, so we don't want to parse it completely and trigger a ValidationException. We want to parse just enough of the CWL that we can extract the desired information.
Is there a way to use cwljava to perform a partial parse? In other words, to somehow prevent the irrelevant portions of the CWL from being parsed?
If that's possible, great; if not, it would be a handy feature that we would definitely use.
Hello @svonworl
The main feature of cwljava is to validate CWL documents and load them into Java objects; if we don't enforce the schema, then we can't make objects. It sounds like you'd be better off using a general YAML library to pull out the desired fields. This would probably be faster as well!
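For illustration, here is a minimal sketch of that YAML-library approach. It assumes SnakeYAML is on the classpath (any general YAML parser would do), and the class and method names (CwlFieldPeek, rootField) are made up for this example; none of this is part of cwljava. It reads only top-level string fields, so a doc given as a list of strings or a packed $graph document would need extra handling.

```java
import java.util.Map;
import java.util.Optional;
import org.yaml.snakeyaml.Yaml;

// Hypothetical helper, not part of cwljava: reads top-level fields with a
// plain YAML parser so that no CWL schema validation is ever triggered.
public class CwlFieldPeek {

    /** Returns the top-level string field `key`, or empty if it is absent or not a string. */
    static Optional<String> rootField(String cwlText, String key) {
        // Generic YAML load: yields maps, lists and scalars, with no CWL checks.
        Object root = new Yaml().load(cwlText);
        if (!(root instanceof Map)) {
            return Optional.empty();
        }
        Object value = ((Map<?, ?>) root).get(key);
        return value instanceof String ? Optional.of((String) value) : Optional.empty();
    }

    public static void main(String[] args) {
        // Syntactically valid YAML, but "inputs" and "steps" are not valid CWL.
        String cwl = String.join("\n",
            "cwlVersion: v1.2",
            "class: Workflow",
            "id: my-workflow",
            "doc: Example workflow",
            "inputs: not-a-valid-inputs-section",
            "steps: 42");
        System.out.println(rootField(cwl, "id").orElse("<none>"));
        System.out.println(rootField(cwl, "doc").orElse("<none>"));
    }
}
```

This tolerates schema-level problems in the rest of the document, but the file still has to be syntactically valid YAML (or JSON); a document that fails at the YAML level would still throw from the parser.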
Just to clarify, does cwljava gracefully ignore (or is it intended to ignore) "extra" content?
I'm thinking of stuff like sbg:metadata:
http://docs.rabix.io/cwl-draft-2-extensions if they were brought into 1.2
Or the JSON-LD information that can be added to a CWL workflow?
https://sage-workflows.readthedocs.io/en/latest/sharing_discovery/cwl-and-linked-data.html
@denis-yuen It is intended to ignore namespaced extra content. For example
Which is loaded without errors in the test at
I'm going to close this issue, as I think I've answered @svonworl's request. Feel free to open another one if there are issues with parsing extension fields. Thanks!