common-workflow-language / cwl-utils

Python utilities for CWL

Home Page: https://cwl-utils.readthedocs.io/

Crash when reading ML prediction workflow

simleo opened this issue

Trying to read the DeepHealth tissue/tumor prediction workflow leads to a crash:

[simleo@neuron:tmp]$ python3 -m venv venv
[simleo@neuron:tmp]$ source venv/bin/activate
(venv) [simleo@neuron:tmp]$ pip install --upgrade pip
Collecting pip
  Using cached pip-22.3-py3-none-any.whl (2.1 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 20.0.2
    Uninstalling pip-20.0.2:
      Successfully uninstalled pip-20.0.2
Successfully installed pip-22.3
(venv) [simleo@neuron:tmp]$ pip install wheel
Collecting wheel
  Using cached wheel-0.37.1-py2.py3-none-any.whl (35 kB)
Installing collected packages: wheel
Successfully installed wheel-0.37.1
(venv) [simleo@neuron:tmp]$ pip install cwl-utils
Collecting cwl-utils
  Using cached cwl_utils-0.20-py3-none-any.whl (282 kB)
Collecting packaging
  Using cached packaging-21.3-py3-none-any.whl (40 kB)
Collecting requests
  Using cached requests-2.28.1-py3-none-any.whl (62 kB)
Collecting rdflib
  Using cached rdflib-6.2.0-py3-none-any.whl (500 kB)
Collecting CacheControl
  Using cached CacheControl-0.12.11-py2.py3-none-any.whl (21 kB)
Collecting cwl-upgrader>=1.2.3
  Using cached cwl_upgrader-1.2.4-py3-none-any.whl (24 kB)
Collecting schema-salad<9,>=8.3.20220825114525
  Using cached schema_salad-8.3.20221016151607-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_28_x86_64.whl (1.1 MB)
Requirement already satisfied: setuptools in ./venv/lib/python3.8/site-packages (from cwl-upgrader>=1.2.3->cwl-utils) (44.0.0)
Collecting ruamel.yaml<0.17.22,>=0.15.71
  Using cached ruamel.yaml-0.17.21-py3-none-any.whl (109 kB)
Collecting mistune<0.9,>=0.8.1
  Using cached mistune-0.8.4-py2.py3-none-any.whl (16 kB)
Collecting pyparsing
  Using cached pyparsing-3.0.9-py3-none-any.whl (98 kB)
Collecting isodate
  Using cached isodate-0.6.1-py2.py3-none-any.whl (41 kB)
Collecting certifi>=2017.4.17
  Using cached certifi-2022.9.24-py3-none-any.whl (161 kB)
Collecting urllib3<1.27,>=1.21.1
  Using cached urllib3-1.26.12-py2.py3-none-any.whl (140 kB)
Collecting charset-normalizer<3,>=2
  Using cached charset_normalizer-2.1.1-py3-none-any.whl (39 kB)
Collecting idna<4,>=2.5
  Using cached idna-3.4-py3-none-any.whl (61 kB)
Collecting msgpack>=0.5.2
  Using cached msgpack-1.0.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (322 kB)
Collecting lockfile>=0.9
  Using cached lockfile-0.12.2-py2.py3-none-any.whl (13 kB)
Collecting ruamel.yaml.clib>=0.2.6
  Using cached ruamel.yaml.clib-0.2.6-cp38-cp38-manylinux1_x86_64.whl (570 kB)
Collecting six
  Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Installing collected packages: msgpack, mistune, lockfile, urllib3, six, ruamel.yaml.clib, pyparsing, idna, charset-normalizer, certifi, ruamel.yaml, requests, packaging, isodate, rdflib, CacheControl, schema-salad, cwl-upgrader, cwl-utils
Successfully installed CacheControl-0.12.11 certifi-2022.9.24 charset-normalizer-2.1.1 cwl-upgrader-1.2.4 cwl-utils-0.20 idna-3.4 isodate-0.6.1 lockfile-0.12.2 mistune-0.8.4 msgpack-1.0.4 packaging-21.3 pyparsing-3.0.9 rdflib-6.2.0 requests-2.28.1 ruamel.yaml-0.17.21 ruamel.yaml.clib-0.2.6 schema-salad-8.3.20221016151607 six-1.16.0 urllib3-1.26.12
(venv) [simleo@neuron:tmp]$ python
Python 3.8.10 (default, Jun 22 2022, 20:18:18) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from cwl_utils.parser import load_document_by_uri
>>> wf_def = load_document_by_uri("predictions.cwl")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tmp/venv/lib/python3.8/site-packages/cwl_utils/parser/__init__.py", line 90, in load_document_by_uri
    return load_document_by_string(doc, baseuri, loadingOptions, id_)
  File "/tmp/venv/lib/python3.8/site-packages/cwl_utils/parser/__init__.py", line 116, in load_document_by_string
    return load_document_by_yaml(result, uri, loadingOptions, id_)
  File "/tmp/venv/lib/python3.8/site-packages/cwl_utils/parser/__init__.py", line 135, in load_document_by_yaml
    result = cwl_v1_1.load_document_by_yaml(
  File "/tmp/venv/lib/python3.8/site-packages/cwl_utils/parser/cwl_v1_1.py", line 15685, in load_document_by_yaml
    result, metadata = _document_load(
  File "/tmp/venv/lib/python3.8/site-packages/cwl_utils/parser/cwl_v1_1.py", line 722, in _document_load
    loader.load(doc, baseuri, loadingOptions, docRoot=baseuri),
  File "/tmp/venv/lib/python3.8/site-packages/cwl_utils/parser/cwl_v1_1.py", line 534, in load
    raise ValidationException("", None, errors, "-")
schema_salad.exceptions.ValidationException: - tried CommandLineTool but
  Not a CommandLineTool
- tried ExpressionTool but
  Not a ExpressionTool
- tried Workflow but
  Trying 'Workflow'
    the `steps` field is not valid because:
      tried array<WorkflowStep> but
        - tried array<WorkflowStep> but
          Expected a list, was <class 'ruamel.yaml.comments.CommentedMap'>
        - tried WorkflowStep but
          Trying 'WorkflowStep'
predictions.cwl:35:5:               the `run` field is not valid because:
                                     - tried <class 'str'> but
                                       Expected a type but got CommentedMap
                                     - tried CommandLineTool but
                                       Trying 'CommandLineTool'
predictions.cwl:46:7:                     the `inputs` field is not valid because:
                                           - tried array<CommandInputParameter> but
                                             Expected a list, was <class
                                             'ruamel.yaml.comments.CommentedMap'>
                                           - tried CommandInputParameter but
                                             Trying 'CommandInputParameter'
predictions.cwl:51:11:                         the `secondaryFiles` field is not valid because:
                                                 Missing pattern in secondaryFiles specification
                                                 entry: ordereddict()
predictions.cwl:35:5:                 - tried ExpressionTool but
                                       Not a ExpressionTool
                                     - tried Workflow but
                                       Not a Workflow
- tried array<CommandLineTool | ExpressionTool | Workflow> but
  Expected a list, was <class 'dict'>

It also crashes with the packed version generated by cwltool.

Thanks @simleo

https://github.com/crs4/deephealth-pipelines/blob/0d09bc091e1f7a9778e4286a22db9e9f8d96a315/cwl/predictions.cwl#L35

We don't use YAML anchors & aliases in CWL. If you'd like to re-use a CommandLineTool in multiple locations, either use $graph or use separate files.
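
To check whether a document relies on anchors and aliases, one option is to look for nodes that are reachable from more than one place after YAML loading, since common Python YAML parsers resolve an alias to the very same object as its anchor. The sketch below is only an illustration: `find_shared_nodes` is a hypothetical helper, not part of cwl-utils, and the small workflow dict is built by hand (with made-up step names) to stand in for a parsed file.

```python
from typing import Any, Dict, List


def find_shared_nodes(doc: Any) -> List[List[str]]:
    """Return groups of paths that lead to the same dict or list object."""
    seen: Dict[int, List[str]] = {}

    def walk(node: Any, path: str) -> None:
        if not isinstance(node, (dict, list)):
            return
        paths = seen.setdefault(id(node), [])
        paths.append(path)
        if len(paths) > 1:
            return  # already descended into this object once
        items = node.items() if isinstance(node, dict) else enumerate(node)
        for key, child in items:
            walk(child, f"{path}/{key}")

    walk(doc, "")
    return [paths for paths in seen.values() if len(paths) > 1]


# Hand-built stand-in for a workflow whose two steps alias one tool definition.
tool = {"class": "CommandLineTool", "inputs": {"src": {"type": "File"}}}
workflow = {"steps": {"a": {"run": tool}, "b": {"run": tool}}}

shared = find_shared_nodes(workflow)
print(shared)
```

Any path group reported here marks a definition that would need to move into its own file or a `$graph` entry to follow the advice above.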

@simleo I can fix this with common-workflow-language/schema_salad#611 (and then apply that fix in cwl-utils); we shouldn't be modifying the source as we process it anyhow.
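
One plausible reading of the traceback (an assumption, not confirmed in this thread) is that a loader consumed keys from the `secondaryFiles` entry in place, so the second reference to the same aliased object found an empty mapping. The sketch below reproduces that pattern with plain dicts; both loader functions are hypothetical stand-ins and do not reflect the actual schema_salad code.

```python
def load_secondary_file_destructive(entry: dict) -> dict:
    """Hypothetical loader that pops keys out of the mapping it is given."""
    if "pattern" not in entry:
        raise ValueError(f"Missing pattern in secondaryFiles entry: {entry}")
    return {
        "pattern": entry.pop("pattern"),
        "required": entry.pop("required", None),
    }


def load_secondary_file(entry: dict) -> dict:
    """Same checks, but only reads from the mapping and never mutates it."""
    if "pattern" not in entry:
        raise ValueError(f"Missing pattern in secondaryFiles entry: {entry}")
    return {"pattern": entry["pattern"], "required": entry.get("required")}


# An anchor and its alias resolve to one shared object.
shared = {"pattern": ".bai"}
first_use, second_use = shared, shared

load_secondary_file_destructive(first_use)
try:
    load_secondary_file_destructive(second_use)
    second_failed = False
except ValueError:
    second_failed = True  # the first call emptied the shared mapping

# The non-mutating variant can process the same object repeatedly.
fresh = {"pattern": ".bai"}
results = [load_secondary_file(fresh), load_secondary_file(fresh)]
```

Reading without mutating keeps the source document intact whether or not it contains shared nodes.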