- This is a collection of pipelines built by 4DN-DCIC that were created and run either on the SevenBridges platform or on the 4DN platform, AWSEM.
  - `cwl_awsem` : AWSEM CWLs in draft-3 in json format
  - `cwl_awsem_v1` : AWSEM CWLs in v1.0 in json format
  - `cwl_awsem_yaml` : AWSEM CWLs in draft-3 in yaml format
- Currently, 4DN DCIC uses CWL `draft-3`.
- The following 4DN custom fields are added, for automated conversion from CWL to the workflow metadata used by the 4DN Data Portal.
  - `fdn_meta` (top-level field) : a dictionary that contains `data_types`, `category`, `workflow_type`, `description`.
    - `title` : human-friendly title of the workflow, e.g. 'Repli-seq Processing Part A'
    - `workflow` : machine-friendly name of the workflow, e.g. 'repliseq-parta'
    - `data_types` : an array of strings that correspond to the data types to be processed, e.g. [ 'Repli-seq' ]
    - `category` : a string describing the steps, e.g. 'clip + align + filter + sort + dedup + count'
    - `workflow_type` : a short string describing the purpose of the workflow, e.g. 'Repli-seq data processing'
    - `description` : a string describing the workflow, e.g. 'Repli-seq data processing pipeline'
  - `fdn_step_meta` (within each `steps` element) : a dictionary that contains `software_used`, `description`, `analysis_step_types`.
    - `software_used` : an array of strings that refer to the names and version/commit of the software used. The name must match the name and version/commit used in the `downloads.sh` in an accompanying Docker source repo. In case of a commit, the first 6 characters should be used. e.g. [ 'cutadapt_1.14' ], [ 'repli-seq-pipeline_f2eb460' ]
    - `description` : a string that describes the step, e.g. 'Adapter removal according to the Repli-seq pipeline'
    - `analysis_step_types` : an array of strings that refer to the step types (i.e. purpose), e.g. [ 'adapter removal' ]
  - `fdn_format` (within each top-level `inputs` and `outputs` element and in each step 'inputs' and 'outputs' element) : a string, e.g. 'bam'
  - `fdn_output_type` (within each top-level `outputs` element) : a string that corresponds to one of the following three: 'processed', 'QC', 'report'
    - 'processed' : generic output file
    - 'QC' : output will be used to generate a quality_metric object (e.g. fastqc report)
    - 'report' : output will be used to add a metric to input (e.g. md5)
  - `fdn_type` (in each step 'inputs' and 'outputs' element) : a string that corresponds to one of the following five: 'data file', 'reference file', 'report', 'QC', 'parameter'
    - 'data file' : input file and output processed file that is data dependent
    - 'reference file' : input file that serves as a reference file (e.g. genome reference)
    - 'QC' : same as the 'QC' category in `fdn_output_type` (output will be used to generate a quality_metric object, e.g. fastqc report)
    - 'report' : same as the 'report' category in `fdn_output_type` (output will be used to add a metric to input, e.g. md5)
    - 'parameter' : input or output that is not a file
  - `fdn_cardinatlity` (in each step 'inputs' and 'outputs' element) : either 'array' or 'single', referring to whether the input/output is an array or a single value.
    - 'array' : the input/output is an array
    - 'single' : the input/output is not an array
  - `fdn_secondary_file_formats` (within a top-level `inputs` and `outputs` element that contains a secondary file) : an array of strings that refer to the format name used by 4DN, e.g. ["pairs_px2"]
To run Docker through CWL, you need a CWL executor. We use `cwltool` (https://github.com/common-workflow-language/cwltool) to run CWL with a json/yml file describing input data. Some example input data are inside the `tests/test_input_json` directory, and you can see some `cwltool` (=`cwl-runner`) commands inside the `tests/tests.sh` script.
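
A run of this kind can be sketched in Python as follows. This is an illustration only: the paths `cwl_awsem/bwa-mem.cwl` and `tests/test_input_json/bwa-mem.json` are hypothetical examples, and the helper names are made up here. The commands actually used by this repo are the ones in `tests/tests.sh`. The sketch assumes `cwltool` is installed and on `PATH`, and relies only on its basic invocation form, `cwltool <CWL file> <input file>`.

```python
import shutil
import subprocess


def build_cwltool_command(cwl_path, input_path, runner="cwltool"):
    """Build the argument list: <runner> <CWL file> <input json/yml file>."""
    return [runner, cwl_path, input_path]


def run_cwl(cwl_path, input_path, runner="cwltool"):
    """Run the CWL and return the exit code; raise if the runner is missing."""
    if shutil.which(runner) is None:
        raise FileNotFoundError("%s not found on PATH" % runner)
    completed = subprocess.run(build_cwltool_command(cwl_path, input_path, runner))
    return completed.returncode


if __name__ == "__main__":
    # Hypothetical paths; replace them with real files from this repo.
    cmd = build_cwltool_command(
        "cwl_awsem/bwa-mem.cwl", "tests/test_input_json/bwa-mem.json"
    )
    print(" ".join(cmd))
```

Passing `runner="cwl-runner"` builds the same command with the alias mentioned above.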
To test CWLs in this repo against the test files in `tests`, use `tests/tests.sh` with the CWL name (without `.cwl`).

    source tests/tests.sh bwa-mem

The Benchmark has been moved to https://github.com/SooLee/Benchmark
- Directory `cwl_awsem` is manually updated starting from freeze 0.0.2 (after Sep 1, 2017), since we no longer use SevenBridges.
- Freeze `0.0.1` contains exported SBG CWLs and AWSEM CWL files auto-converted from SBG CWLs. The following is a description of how freeze 0.0.1 was generated.

  The content was generated by the following command.

      source download.sh

  For this to work, you need to have SBG_TOKEN as an environment variable on your machine.

  To convert SBG CWL to AWSEM CWL in a batch, do the following.

      source convert.sh

  This conversion uses the script `convert_sbgcwl_to_awsemcwl.py` on each file individually. The shell script assumes Mac (`xargs -I{}` instead of `xargs -i`).