common-workflow-language / cwljava

Java SDK for the Common Workflow Language standards



Partial/lazy parse?

svonworl opened this issue

Hello! We're integrating cwljava into Dockstore, and occasionally we need to pluck a few (or fewer) values out of a CWL: for example, the id and doc fields (if they exist) from the root workflow. In these situations, we don't care about the contents of the rest of the CWL and want to be tolerant of errors: the CWL as a whole might not be 100% well-formed, so we don't want to completely parse it and trigger a ValidationException. We want to parse just enough of the CWL that we can extract the desired information.

Is there a way to use cwljava to perform a partial parse? In other words, to somehow prevent the irrelevant portions of the CWL from being parsed?

If that's possible, great; if not, it would be a handy feature, and we would definitely use it.

Hello @svonworl

The main feature of cwljava is to validate documents and load them into Java objects; if we don't enforce the schema, then we can't construct the objects. It sounds like you'd be better off using a general-purpose YAML library to pull out the desired fields. That would probably be faster as well!
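A minimal, dependency-free sketch of that idea follows. It is a deliberately crude stand-in for a real YAML library (such as SnakeYAML), not part of cwljava: the class RootFieldScanner and its extract method are hypothetical names. It only looks at column-0 "key: value" lines, understanding plain and quoted single-line scalars plus literal "|" blocks, and simply skips everything else, including malformed content elsewhere in the file. JSON-formatted CWL, folded ">" blocks, flow mappings, and escape sequences are not handled; a full YAML parser would cover those.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical helper (not part of cwljava): pulls selected root-level scalar
 * fields out of a YAML-formatted CWL document without validating the rest.
 * Not a YAML parser; it only recognises, at column 0:
 *   key: plain value          (a trailing " # comment" is dropped)
 *   key: "quoted" / 'quoted'  (quotes stripped, escapes left as-is)
 *   key: |                    (literal block made of the indented lines below)
 */
final class RootFieldScanner {

    private RootFieldScanner() {}

    /** Returns the first occurrence of each wanted root key that was found. */
    static Map<String, String> extract(String yaml, Set<String> wanted) {
        Map<String, String> found = new LinkedHashMap<>();
        String[] lines = yaml.split("\\R", -1);
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            // Root-mapping keys start in column 0; indented lines, comments and
            // lines without a "key: " separator are skipped rather than rejected.
            if (line.isEmpty() || Character.isWhitespace(line.charAt(0)) || line.startsWith("#")) {
                continue;
            }
            int sep = line.indexOf(": ");
            if (sep <= 0) {
                continue;
            }
            String key = line.substring(0, sep).trim();
            if (!wanted.contains(key) || found.containsKey(key)) {
                continue;
            }
            String raw = line.substring(sep + 2).trim();
            if (raw.startsWith("|")) {
                // Literal block scalar: collect the indented (or blank) lines that follow.
                List<String> block = new ArrayList<>();
                int indent = -1;
                while (i + 1 < lines.length
                        && (lines[i + 1].trim().isEmpty() || lines[i + 1].charAt(0) == ' ')) {
                    String next = lines[++i];
                    if (next.trim().isEmpty()) {
                        block.add("");
                        continue;
                    }
                    int lead = 0;
                    while (lead < next.length() && next.charAt(lead) == ' ') {
                        lead++;
                    }
                    if (indent < 0) {
                        indent = lead; // the first content line fixes the block indentation
                    }
                    block.add(next.substring(Math.min(indent, lead)));
                }
                // Trailing blank lines are dropped; chomping indicators (|-, |+) are not modelled.
                while (!block.isEmpty() && block.get(block.size() - 1).isEmpty()) {
                    block.remove(block.size() - 1);
                }
                found.put(key, String.join("\n", block));
            } else if (raw.length() >= 2
                    && (raw.charAt(0) == '"' || raw.charAt(0) == '\'')
                    && raw.charAt(raw.length() - 1) == raw.charAt(0)) {
                // Quoted scalar: strip the surrounding quotes only.
                found.put(key, raw.substring(1, raw.length() - 1));
            } else if (!raw.isEmpty() && !raw.startsWith(">")) {
                // Plain scalar: drop a trailing " # comment" if present.
                int hash = raw.indexOf(" #");
                found.put(key, (hash >= 0 ? raw.substring(0, hash) : raw).trim());
            }
        }
        return found;
    }
}
```

Because only the requested root keys are examined, a broken "inputs" section or a nested step with its own "id" does not stop the root "id" and "doc" from being returned, which is the error tolerance the original question asks for.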

Just to clarify: does cwljava gracefully ignore (or is it intended to ignore) "extra" content?

I'm thinking of things like sbg:metadata (http://docs.rabix.io/cwl-draft-2-extensions), if such extensions were brought into 1.2.
Or the JSON-LD information that can be added to a CWL workflow?
https://sage-workflows.readthedocs.io/en/latest/sharing_discovery/cwl-and-linked-data.html

@denis-yuen It is intended to ignore namespaced extra content. For example

It is loaded without errors by the test testvalid_metadataByPath().
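To make the "namespaced extra content" rule concrete, here is a small hypothetical sketch; FieldClassifier, classify, and partition are illustrative names, not cwljava's API. It assumes, following the reply above, that a key with a prefix before ':' (such as sbg:metadata, a schema.org term like s:author declared via $namespaces, or a full URI) is tolerated as an extension field, and that an un-prefixed key the schema does not define is the kind of content a strict loader would reject. The exact checks performed by cwljava's generated loaders may differ, and the core field set is a caller-supplied, partial subset of the schema.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical illustration (not cwljava's API) of the rule described above:
 * a key with a prefix before ':' (e.g. "sbg:metadata", "s:author", or a full
 * URI) is treated as a namespaced extension field and tolerated; an
 * un-prefixed key must be a known schema field.
 */
final class FieldClassifier {

    enum Kind { CORE, EXTENSION, UNKNOWN }

    private FieldClassifier() {}

    static Kind classify(String key, Set<String> coreFields) {
        if (coreFields.contains(key)) {
            return Kind.CORE;
        }
        // Anything of the form "prefix:rest" counts as namespaced extension content.
        return key.indexOf(':') > 0 ? Kind.EXTENSION : Kind.UNKNOWN;
    }

    /** Groups keys by kind, preserving their input order within each group. */
    static Map<Kind, List<String>> partition(List<String> keys, Set<String> coreFields) {
        Map<Kind, List<String>> groups = new EnumMap<>(Kind.class);
        for (Kind kind : Kind.values()) {
            groups.put(kind, new ArrayList<>());
        }
        for (String key : keys) {
            groups.get(classify(key, coreFields)).add(key);
        }
        return groups;
    }
}
```

For example, with a core set containing cwlVersion, $namespaces, and id, the keys s:author, sbg:metadata, and https://schema.org/license land in EXTENSION, while a bare metadata key lands in UNKNOWN.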

I'm going to close this issue, as I think I've answered @svonworl's request. Feel free to open another one if there are issues with parsing extension fields, thanks!