nextflow-io / nextflow

A DSL for data-driven computational pipelines

Home Page: http://nextflow.io


Syntax enhancement aka DSL-2

pditommaso opened this issue

This is a request for comments on the implementation of the modules feature for Nextflow.

This feature allows NF processes to be defined in the main script or in a separate library file, and then invoked, one or more times, like any other routine, passing the required input channels as arguments.

Process definition

The syntax for defining a process is nearly identical to the usual one; it only requires the use of processDef instead of process and the omission of the from/into declarations. For example:

processDef index {
    tag "$transcriptome_file.simpleName"

    input:
    file transcriptome 

    output:
    file 'index' 

    script:
    """
    salmon index --threads $task.cpus -t $transcriptome -i index
    """
}

The semantics and supported features remain identical to the current process. See a complete example here.

Process invocation

Once a process is defined it can be invoked like any other function in the pipeline script. For example:

transcriptome = file(params.transcriptome)
index(transcriptome)

Since index defines an output channel, its return value can be assigned to a channel variable that can be used as usual, e.g.:

transcriptome = file(params.transcriptome)
index_ch = index(transcriptome)
index_ch.println()

If the process produces two (or more) output channels, the multiple-assignment syntax can be used to get a reference to each of them.
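For illustration, a minimal sketch of what that could look like, assuming a hypothetical process splitter that declares two output channels (the exact multiple-assignment form isn't spelled out in this draft):

// hypothetical: splitter declares two output channels
def (first_ch, second_ch) = splitter(input_ch)
first_ch.println()
second_ch.println()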

Process composition

The result of a process invocation can be passed to another process like any other function, eg:

processDef foo {
  input: 
    val alpha
  output: 
    val delta
    val gamma
  script:
    delta = alpha
    gamma = 'world'
    "some_command_here"
}

processDef bar {
  input:
    val xx
    val yy 
  output:
    stdout()
  script:
    "another_command_here"        
}

bar(foo('Hello'))

Process chaining

Processes can also be invoked as custom operators. For example, a process foo taking one input channel can be invoked as:

ch_input1.foo()

and when taking two channels, as:

ch_input1.foo(ch_input2)

This allows the chaining of built-in operators and processes together eg:

Channel
    .fromFilePairs( params.reads, checkIfExists: true )
    .into { read_pairs_ch; read_pairs2_ch }

index(transcriptome_file)
    .quant(read_pairs_ch)
    .mix(fastqc(read_pairs2_ch))
    .collect()
    .multiqc(multiqc_file)

See the complete script here.

Library file

A library is just an NF script containing one or more processDef declarations. The library can then be imported using the importLibrary statement, e.g.:

importLibrary 'path/to/script.nf'

Relative paths are resolved against the project baseDir variable.
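For illustration, a minimal sketch of a library file and its import under this proposal (the file name and the align process are hypothetical):

// lib/align.nf -- hypothetical library file
processDef align {
    input:
    file reads

    output:
    file 'aligned.bam'

    script:
    """
    your_aligner --in $reads --out aligned.bam
    """
}

// main.nf
importLibrary 'lib/align.nf'

reads_ch = Channel.fromPath(params.reads)
bam_ch = align(reads_ch)
bam_ch.println()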

Test it

You can try the current implementation using version 19.0.0.modules-draft2-SNAPSHOT, e.g.:

NXF_VER=19.0.0.modules-draft2-SNAPSHOT nextflow run rnaseq-nf -r modules

Open points

  1. When a process is defined in a library file, should it be possible to access the params values? Currently it's possible, but I think this is not a good idea because it makes the library depend on the script params, making it very fragile.

  2. How to pass parameters to a process defined in a library file, for example memory and cpus settings? It could be done using the config file as usual, but I expect there could be a need to parametrise the process definition and specify the parameters at invocation time.

  3. Should a namespace be used when defining the processes in a library? What if two or more processes have the same name in different library files?

  4. One or many processes per library file? Currently any number of processes can be defined; I'm starting to think it would be better to allow the definition of only one process per file. This would simplify reuse across different pipelines and the import into tools such as Dockstore, and it would make the dependencies of the pipeline more intelligible.

  5. Remote library files? I'm not sure it's a good idea to be able to import remotely hosted files, e.g. http://somewhere/script.nf. Remote paths tend to change over time.

  6. Should a version number be associated with the process definition? How to use or enforce it?

  7. How to test process components? Ideally it should be possible to include the required container in the process definition and unit test each process independently.

  8. How to chain a process returning multiple channels?

Fantastic stuff, Paolo! I've tried it out and played with having set-based inputs and outputs and it works nicely so far. I also note that this will make unit testing individual processes far easier!

My opinions on the points you raise:

  1. Imported processes should be entirely isolated from other code -- i.e. no access to mutable globals like params (is workflow mutable?) -- to prevent long-range, unintended effects. However, it'd be useful to use the params global within the imported processes. Perhaps the params variable could be set at process invocation. E.g.:

     index(transcriptome_file)
         .quant(read_pairs_ch)
         .mix(fastqc(read_pairs2_ch, params: [outdir: 'my_out_dir']))
         .collect()
         .multiqc(multiqc_file)
    

    Personally, I'd always want the params object to be null unless otherwise specified, and to use params: params if I need to pass the global parameters, but perhaps a config value could specify whether it should take the global params value or null by default?

  2. I would favour the config file options being inherited from the importing workflow, and other variables set at process invocation as described for params above.

  3. Absolutely, we need different namespaces - I can imagine there being multiple processes from different packages sharing the same name. Importing each individual process would be onerous (see answer to q4 below), so namespacing will be essential. Perhaps we can declare something analogous to a package at the head of each library file, and then call package.namespace?

  4. I think it would be very burdensome to have to import each individual process separately: we have many, many processes, and specifying each of them would be tiresome and prone to error. Much better would be to have namespacing and then have users specify the namespace and process name - process names could much more easily be unique within a single namespace.

  5. I would never use remote file loading, but it is very convenient for one-off scripts. The more stable solution would be to have a package repo, or to be able to import an entire git repo's nextflow scripts. E.g.:

     importPackageFromGithub 'nf-core/nextflow-common'
    
  6. I would version code at the package level rather than script level. As with my above answers, this reduces the amount of repeated code. Therefore, within a single project/repo, the user wouldn't specify version numbers for importing individual scripts. I also wouldn't apply version numbers within scripts (again, to reduce duplication) but only at the package level.

  7. Unit testing might be out-of-scope here. However, the approach you've implemented so far means that it is easy to call individual processes with arbitrary inputs and act on outputs in any way desired. I would therefore hope to be able to write JUnit (or similar) tests for individual processes (or sets of processes) and be able to run them multiple times, with different parameters and configuration settings.

  8. I would favour having an additional parameter to the process call specifying the destination of each output channel. The first null value indicates the channel should be used in the current chain. Unhandled channels should raise an exception. E.g.:

     myProcess(inputChannelA, inputChannelB, outputs: [outputChannel1, null, OutputChannel3])
         .subscribe { println "outputChannel2: ${it}" }
    

Hello! Tried this new feature and it looks amazing, thank you!

Coming to your points:

  1. I think params values should not be accessible, but on the other hand I'd second the idea of defining the needed params values at import time and for the current session. Without it, library re-usability will be hampered imo.

  2. I think the ideal would be to have something like

 index(transcriptome_file)
     .quant(read_pairs_ch, task: [cpus: 4, memory: '8 GB'])

where the task specific parameters can be defined at execution time, similarly to what could be done with params.

  3. Yes, absolutely.

  4. One process per library will definitely lead to a jungle of files to be imported/managed either locally or from a remote repository. I see the point of re-usability but it will make much more sense for the end-users to have a library which is scope specific (i.e. QC, or Salmon, or even ChIP-Seq or Metagenomics) and then import and combine single processes at run-time using namespaces.

  5. That could be interesting, but I would only allow a few "trusted" repositories to pull from, where code is checked and verified. It could be on GitHub or under nf-core and Nextflow URLs.

  6. Only on the library itself, and versioning should be linked to a repository in my opinion. It should be something like Conda for instance, so no version specified means take the most recent version. If thinking about a Git repository, then library versions could be the tags (I find commit hashes cumbersome to use, but maybe it's just me).

  7. I would not enforce unit testing here but hopefully, as already stated, this new feature will provide a much simpler common ground to implement testing for both libraries and pipelines using one of the many testing libraries available in Java or Groovy.

  8. Unsure here: on one side I think Luke's idea is interesting, as flagging one specific output channel to be passed to the next process is very useful. On the other side, I think processes having multiple output channels can also be branching points in the DAG, so you need to deal explicitly with the remaining output channels too, and this will break the "chain" of processes anyway.

Great stuff indeed!

In regards to point 3: I also think that namespacing will be invaluable. I really like Python's semantics in this regard (from qctools import fastqc and import qctools). However, using the dot as suggested earlier (for example qctools.fastqc) would conflict with chaining. Perhaps the double-colon semantics could work in that case instead? (qctools::fastqc)

Conversion from a monolithic script to a slim main.nf with imported processes is perfect!! Barriers were minimal.

I would second not having access to params without passing them explicitly, but I would need some way of accessing them since many of my processes have a conditional that executes a different variant of a script depending on a param.
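For illustration, a sketch of the kind of param-driven conditional this refers to (process, tool and param names are hypothetical); the conditional script block itself is standard Nextflow syntax:

process align {
    input:
    file reads

    script:
    // hypothetical param switching between two script variants
    if( params.aligner == 'star' )
        """
        STAR --readFilesIn $reads
        """
    else
        """
        hisat2 -U $reads
        """
}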

If it were possible to use process rather than processDef it would be cleaner but I can live with that difference. Perhaps the keyword moduleProcess would be more explicit.

First, this looks awesome. I'm working with a few people to build some pretty complex NF stuff and this type of thing should make our lives much much easier. 🎉

As for RFC:

  1. I don't think the modules should be given any access to the params object. It just encourages bad habits. If the user really wants globals then they could just define them via the config file.

  2. Would it be possible to expose an API for the object/class (or create one) that actual config files get boiled down to? Then each process could work out its config in the usual way, or we could do something like this:

// define the process (assuming that param ordering under 'input:' matches the ordering used when calling)
processDef assemble_minia {
    input:
    file $reads from reads
    val $prefix from prefix

    output:
    file "${prefix}asm.gfa" into gfa

    script:
    """
    minia -kmer-size 31 -in $reads -out ${prefix}asm
    """
}

And then when we use it:

// load a config file -> all values in this file override any previously set values
reads = file(params.reads)
assemble_minia.load_config("/path/to/file/or/similar")
assem_ch = assemble_minia(reads, "with_custom_config")

Or just update config values individually:

// change the container only - specifically override one value
// NOTE: accessing "params" values outside of processDef
assemble_minia.set_container("${params.docker_repository}company/minia:${params.minia_old_commit}")
old_assem_ch = assemble_minia(reads, "old_version")

assemble_minia.set_container("${params.docker_repository}company/minia:${params.minia_new_commit}")
new_assem_ch = assemble_minia(reads, "new_version")

Thanks a lot for all your comments! I've uploaded another snapshot introducing some refinements and suggestions you provided:

NXF_VER=19.0.0.modules-draft3-SNAPSHOT nextflow info 

Main changes:

  1. I've realised that adding the processDef keyword could be confusing and, above all, not strictly necessary. In this version, when process is used in the main script it works as usual; instead, when it's used in a module definition file, it allows defining a process and therefore from/into should not be used.

  2. importLibrary has been replaced by require, which is a bit more readable.

  3. Parameters. I agree with you that modules should be isolated from command line parameters. At the same time I think there should be a way to inject options into a module component when it's referenced; this would allow the parametrisation of the inner tasks. In the last snapshot I've added the possibility to specify a map of values when the module is referenced via the require statement, e.g.

    require 'module-file.nf', params: [ foo: val1, bar: val2 ]
    

Then in module-file.nf we can have the usual syntax for params as in the main script:

   params.foo = 'x'
   params.bar = 'y'

   process something {
    ''' 
    your_command --here
    '''
   }
  4. Namespace. It can be useful, but I don't think it's dramatically urgent. I think we can add it in a separate iteration.

  5. Remote module repository. The idea is tempting; it could work along the same lines as the nextflow pull command. The module is downloaded from a Git repository and a commit ID or tag can be used to identify a specific version. For example:

    require from: 'nf-core/common-lib', revision: <tag-name>
    

These are the main points. In the next iteration I will try to extend the module concept to allow the definition also of custom functions that can be imported both in the config and script context.

Thanks for the update @pditommaso.

To clarify on the injection of modules: if you wanted to inject params that have been passed as arguments to the nextflow run command, would you do something like below to have default values that could be overridden by args on the nextflow run command line and then passed on to the module?

params.foo = false
params.bar = 50

require 'module-file.nf', params: [ foo: params.foo , bar: params.bar ]

Yes, exactly like that, you can even do

require 'module-file.nf', params: params

Tho both ways are the only thing that I don't like in this approach.

Of course you release this feature after I can't use nextflow anymore. Sigh. :)

I think this feature looks great. Reading through this it seems like this only lets you separate and reuse the definition of single processes, but it doesn't have a way of collecting or aggregating multiple processes into single entity (like a subworkflow). Is that right? Have you given any thought to that or is that still future work?

Regardless, I think this is awesome and I'll continue to wish I was using nextflow instead of what I'm using now...

@mes5k Ah-ah, you have to come back to NF!

but it doesn't have a way of collecting or aggregating multiple processes into single entity (like a subworkflow)

This approach is extremely flexible and the idea is to use a similar mechanism also for sub-workflows.

Awesome! So happy to hear that you're working on this. Will definitely make the job of selling nextflow internally easier!

Uploaded 19.0.0.modules-draft4-SNAPSHOT, which allows the definition of custom functions and nested require inclusions. You can see it in action in this pipeline CRG-CNAG/CalliNGS-NF@1cad86b

However I'm still not happy; I'll try experimenting with the ability to define subworkflows.

@pditommaso does this feature relate to #238 and also #777, #844? I guess, yes.
Please, keep in mind and consider also the following features:

  • dry-run or plan to see the end graph structure;
  • print output channels(variables) if the value can be inferred and doesn't have dependencies;
  • execution of specified file, module or process to be able to run isolated part;
  • syntax checking for *.nf files;

It makes sense to allow running a target process or module of a very large script separately, as a portion of the work. Just look at the definition of targeting for the Terraform tool. It makes it possible to uniquely refer to each module, resource or data source within any provider context by a fully qualified item name. So, examples of the CLI for NF could be written as:

nextflow run -target=process.1A_prepare_genome_samtools
nextflow run -target=module.'rnaseq.nf'.fastqc
nextflow plan -target=process.1A_prepare_genome_samtools
nextflow plan -target=module.'rnaseq.nf'.fastqc

Besides introducing the modules feature to extract common code into a separate file, I hope it will lead to the implementation of the features described above because they are useful and desired.

#238 yes, the others are out of the scope of this enhancement.

@pditommaso let's assume that the feature is done and can be released as experimental.
Let's simply add an extra -enable-modules option which will enable the new module feature. It will preserve backward compatibility and allow end users to test this feature. It's a compromise when you need a new release and feedback. For example, see the experimental -XX:+UnlockExperimentalVMOptions flag for Java 11 in the release notes.

That's the plan.

It is worth adding a version designation to the nf script to help end users identify the version and produce clear error descriptions. For example:

apiVersion: "nextflow.io/v19.0.0-M4-modules"
   or
dslVersion: "nextflow.io/v19.0.0-M4-modules"

where M stands for milestone.

Ok, just uploaded 19.0.0.modules-draft5-SNAPSHOT. Things start to become exciting: it's now possible to define a subworkflow, either in the module script or in the main script, composing the defined processes, e.g.

process foo {
   /your_command/ 
}

process bar {
  /another_command/
}

workflow sub1 {
  foo()
  bar()  
}

Then invoke it as a function, i.e. sub1(). Sub-workflows can have parameters like a regular function, e.g.

 workflow sub1(ch_x, ch_y) {
  foo(ch_x)
  bar(ch_y)  
}

The output of the last invoked process (bar) is implicitly the output of the sub-workflow and it can be referenced in the outer scope as sub1.output.
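A minimal sketch of that outer-scope access, reusing the parametrised sub1 above (hedged: the accessor follows the wording of this draft and may change; the channel contents are invented):

sub1( Channel.from('a'), Channel.from('b') )
sub1.output.println()   // the output of `bar`, the last process invoked inside sub1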

In the main script an anonymous workflow can be defined; it's supposed to be the application entry point and is therefore implicitly executed, e.g.

fasta  = Channel.fromPath(...)
reads = Channel.fromFilePairs(...)
workflow {
  sub1( fasta, reads )
}

Bonus (big one): within a workflow scope the same channel can be used as input in different processes (finally!)
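For illustration, a minimal sketch of that channel reuse (assuming hypothetical processes foo and bar that each declare one input channel):

workflow {
  ch = Channel.from('a', 'b', 'c')
  foo(ch)   // the same channel...
  bar(ch)   // ...feeds two processes, no into{} duplication needed
}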

Hi @pditommaso I've started experimenting and I'm having a hard time getting something working. I'm getting this error:

[master]$ NXF_VER=19.0.0.modules-draft5-SNAPSHOT nextflow run main.nf
N E X T F L O W  ~  version 19.0.0.modules-draft5-SNAPSHOT
Launching `main.nf` [boring_kare] - revision: 66747d681c
ERROR ~ No such variable: x

 -- Check script 'main.nf' at line: 8 or see '.nextflow.log' file for more details

With this code: https://github.com/mes5k/new_school_nf

Can you point me in the right direction?

The processes can only be defined in the module script (to keep compatibility with existing code).

In the main script there must be a workflow to enable the new syntax. Finally, the operator-like syntax was removed because I realised it was useful only in a restricted set of examples and generated confusion in most cases. Your example should be written as:

   to_psv(to_tsv(gen_csv(ch1)))

or

gen_csv(ch1)
to_tsv(gen_csv.output)
to_psv(to_tsv.output)

Awesome, thanks! My first example is now working.

My next experiment was to see if I could import an entire workflow. I can't tell from your comments whether that's something that's supported or whether I've just got a mistake in my code.

Is it possible to assign module process outputs to a variable so that you can do something like

modules.nf

process foo {
    input:
    file(x)

    output:
    file(y)

   script:
    .....
}

process bar {
    input:
    file(a)

    output:
    file(b)

   script:
    .....
}

main.nf

require 'modules.nf'

workflow {
  Channel
    .from('1.txt', '2.txt', '3.txt')
    .set{ ch1 }

  foo_output = foo(ch1)
  bar_output = bar(foo_output)

  bar_output.view()
}

Yes, but it's not necessary. The process can be accessed as a variable to retrieve the output value, i.e.

workflow {
  Channel
    .from('1.txt', '2.txt', '3.txt')
    .set{ ch1 }

  foo(ch1)
  bar(foo.output)
  bar.output.view()
}

My next experiment was to see if I could import an entire workflow

You can define the workflow logic as a sub-workflow, then invoke it ie.

workflow my_pipeline(ch) {
  gen_csv(ch)
  to_tsv(gen_csv.output)
  to_psv(to_tsv.output)
}

workflow {
  ch1 = Channel.fromPath(params.something)
  my_pipeline( ch1 )
}

OK cool thanks. Also you mentioned that you can reuse a channel. Can you therefore do

workflow {
  Channel
    .from('1.txt', '2.txt', '3.txt')
    .set{ ch1 }

  foo(ch1)
  bar(foo.output)
  baz(foo.output)

  bar.output.view()
  baz.output.view()
}

I'm also playing with this idea:

process foo {
   /your_command/ 
}

process bar {
  /another_command/
}

workflow sub1 {
  Channel.from(something) | foo | bar | view()
}

remind you of something? 😆😆

Also you mentioned that you can reuse a channel. Can you therefore do

YES!

Was gonna suggest considering a pipe operator! Railway oriented programming demonstrates this nicely. I think it's worth thinking about chaining processes vs. chaining operators and how the two might mix and match.

Oh! Railway oriented programming .. didn't know! The | as pipe operator would be nice because everybody knows its meaning in Bash. Tho in some contexts it's used to express parallel execution, so it could even be done as something like:

channel >> foo >> (bar | baz) 

Where >> means pipe and | the parallel execution ..

Yup, and with a pipe operator you're getting dangerously close to monads and bind (>>=) operators. I think it's fantastic and am super excited about it, but I also think it's worth taking a long time to think about this because it's worth getting right.

another couple of years :D

Let's say we have a parallel execution as (foo | bar | baz): what is the resulting output supposed to be? An array of three channels corresponding to the respective processes?

Although I see the expressiveness of

workflow {
  channel >> foo >> (bar | baz)
}

I fear that this will end up like some of the Perl one-liners that are knocking around - powerful yet nearly impossible to decode

I prefer

workflow {
  foo(channel)
  bar(foo.output)
  baz(foo.output)
}

IMO that will be easier to read than a long chain of processes, particularly for workflows with many processes.

If there was an option for a chaining/piping syntax such as (foo | bar | baz) I would prefer that we access the outputs explicitly as a dictionary with keys foo, bar and baz

Yes, I agree that potentially it could become too cryptic a syntax, but it's worth experimenting with it. It could also be very expressive.

If there was an option for a chaining/piping syntax such as (foo | bar | baz) I would prefer that we access the outputs explicitly as a dictionary with keys foo, bar and baz

Actually there *is*: being a mere invocation of the process, it's always possible to access the process output as foo.output, etc.

I've uploaded a new iteration, 19.0.0.modules-draft6-SNAPSHOT. The most important change is that from now on the modules feature needs to be activated by adding the following statement at the beginning of the script.

nextflow.enable.modules = true

This allows process modules to be declared also in the main script without breaking existing code.
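For example, a minimal main-script sketch with the new flag (the process body is invented for illustration):

nextflow.enable.modules = true

process foo {
  input:
  val x
  script:
  "echo $x"
}

workflow {
  foo( Channel.from(1, 2, 3) )
}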

It also implements a very experimental pipe operator as sketched above, e.g.

 workflow {
  channel >> foo >> (bar | baz)
}

I've also started to draft the documentation to help you evaluate this feature. You can find it at this link.

What's next

  • Namespace: now that it's possible to define process components in the main script, I'm more convinced that the ones defined in a separate script should be included and referenced using a separate namespace.

  • Workflow inputs/outputs definition: I'm still not so happy with the current function-like schema for (sub)workflow input definitions. Also, there's no definition for outputs.

  • Extend the support for the pipe syntax to channel operators, i.e. map, collect, etc.

I love the pipe idea, but wouldn't it be better to use |> like it already exists in Elixir and currently in proposal stage for javascript?

>> is bitwise shift in a lot of languages (and redirection in bash 😛).

We are restricted to the operators provided by groovy http://groovy-lang.org/operators.html#Operator-Overloading

channel | foo | (bar & baz)

sounds better? 😂

I didn't know you couldn't define your own operator in groovy, my bad!

Then I really have no opinion in the matter, >> or | would do the job fine I guess 😉

We are restricted to the operators provided by groovy http://groovy-lang.org/operators.html#Operator-Overloading

channel | foo | (bar & baz)

sounds better? 😂

I certainly prefer | to >> only because of the common usage in bash and IMO bar & baz is a bit more intuitive

I probably prefer | too given the operator overloading restriction. However, as I think about it I'm having trouble figuring out why we can't treat processes as special operators. A process just transforms data on a channel, it just happens in a separate thread, whereas operators run in the main thread.

@pditommaso I know that you removed the ability to treat these processes as operators, but can you explain the logic behind that? It seems that something like:

channel.foo().into{ x, y }
x.bar()
y.baz()

Would be a little closer to how things have happened in nextflow in the past. Just trying to wrap my head around things!

Mostly because the syntax would clash with the namespace declaration. In the next iteration a process foo declared in the module file x needs to be invoked as x.foo(). Therefore that syntax would not make much sense ..

But here is where the pipe operator comes in, e.g.:

channel | x.foo | ( x.bar & x.baz )

Would Groovy allow you to use :: as the namespace separator? We could pretend we're C++ programmers.

LOL. Currently no, but likely it will in the future as :: is now also a Java operator. However | and & are a good compromise IMO, as NF targets more Bash programmers than C++ ones 😉

Cool, let us know when there's a build with namespaces available so we can experiment!

Just uploaded a new snapshot, 19.0.0.modules-draft7-SNAPSHOT. I think we are approaching a stable implementation. The most notable thing is the ability to define a module namespace. Also, require has been replaced by include to be consistent with the existing includeConfig.

Also there's a preliminary implementation of the process piping operator. More details on the docs page.

One question related to this emerging feature. One idea, e.g. in the nf-core project, would be that we build up a "module library" that can be used and shared by all projects in general, thus making all these small pipeline modules available to the wider community. This would make fixing issues much easier, however I see some issues/potential issues arising as well:

What happens when we e.g. define a fastqc module, but this is changed in an upstream module?

Are there "interfaces" implemented that define how the module has to look like and then producing an error message when this doesn't fit (anymore?). How would one test this when updating the module? Might be that this is already there and/or I'm missing something....

Regarding this, the idea is to add the ability to import modules from Git repos providing the commit id or tag name. This should solve the problem about changing versions.

I wonder if it's a good idea to release a collection of "module scripts" in phases like Bioconductor, or whether it's easier to commit and reference each module separately. The former approach means you would have to provide a single commit/release id for the entire pipeline. The latter approach is more flexible but will require a bit more tracking.

I think we can have both. It would be enough to organise the repo as a collection of scripts, then NF could include one or many (with the same version id)

I agree - would be possible both :-)

I'm struggling with how much package management is needed. If the nextflow files explicitly specify git repos with tags/hashes in the include statements, then do we really need an additional package manager? Nextflow already has support for cloning and tracking repos. Is there an advantage to tracking files and versions using a separate mechanism?

Tend to agree. I would exclude any dependency with an external tool/package manager.

Cromwell just lets you import from a URL, which supports referencing specific GitHub commits. Probably no reason to over-engineer beyond that functionality.

Allowing deps from plain URL is just evil because it can change or break at any time.

Regarding module inclusion with namespaces, I've made some tests but I'm not convinced by this solution. The problems are:

  1. it makes the implementation much more complex for a feature that most people won't use (read: over-engineering)
  2. it makes the NF script more verbose because you need to prefix each imported process or workflow with the module name .. boring
  3. it breaks the configuration for processes whose config is defined by name, because the name is supposed to be module_name.process_name.

For these reasons I'm thinking of using a different model that allows the inclusion of specific processes from a module file, to which it is optionally possible to give an alias, for example:

include FOO from 'module/path'
include FOO from 'module/path' as BAR 

The first imports the FOO process into the current script from the specified module file; the second syntax imports the same process but it will be referenced as BAR.

include 'module/path'

The above syntax allows the import of all components defined in the module.
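For example, a hypothetical usage of the aliased form (module path, input channel and output access are invented for illustration):

include FOO from 'modules/tools.nf' as BAR

workflow {
  ch = Channel.fromPath(params.input)
  BAR(ch)
  BAR.output.view()
}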

Thoughts ?

@minillinim my gut reaction is yeah, you're on the hook for updating versions if you need a global refresh like that. I also think it's a bit of an unlikely scenario, but OK, we can use it as an example. Can you articulate how you'd like it to work? Maybe consider opening a new feature request so that the conversation can be tracked separately.

Just registering my genuine interest here. The modules feature creates an abstraction layer; this can be very convenient (re-use) but raises the question of scope, as perhaps indicated by the discussions relating to the hooks for params, channels, config settings (and directives such as when and publishDir). If one were to add a few of these into this framework, then it gets very close to the actual process definition itself - with only the script section missing. What I like a lot about NF is its expressiveness; a NF process is already very neatly and tidily packaged in a declarative way with not much ballast. In some processes we may also have parameters that will lead to different script settings (i.e. set/unset/change a command line parameter). I have in-depth experience with just a single pipeline, so my perspective is probably very limited. From this perspective, the module feature will be useful for pervasive tasks (say fastqc) that need very little context. Could the when directive be incorporated into this feature?

These are good points. However, since the syntax changed over time, it may not be clear that the new process syntax does not impose the use of a separate module file.

You will still be able to use a single script approach, in which processes will be declared and then invoked. eg.

process foo {
  input:
  file x 
  script:
  """
  your_command --in $x
  """
} 

Channel.fromPath('/some/data/*') | foo()

The when won't change, but I agree with you that the use of some directives such as when and publishDir can be problematic when using separate module files, and may even no longer be recommended.

If we shouldn't be using publishDir in modules (and I can understand why), then perhaps we should add a publishDir operator that does the same thing?

Yes, was thinking something along these lines. Tho I'm starting to think the (sub)workflow should have its own directives to declare inputs/outputs/publish etc. This remains an open point.

I'm just going to chime in here and say that I would love to see a publishDir operator.

With a publishDir operator, you'd need to expose file naming conventions used by the process I assume, or introduce a declarative layer describing its outputs that internally maps to file name globs.

Tried modules with draft7 - it's awesome, built a whole pipeline with them, flawlessly (hopefully specs won't change much now!! :-P). Only noticed that errors tend to be more cryptic than previously.

A few questions (some asked above):

  • how to publish final results?
  • is when going to be dropped entirely or replaced with an equivalent construct?
  • can you call the same process from a module twice? (e.g. by importing twice the same module with different aliases)

Finally, I noticed that if one imports like this:

include 'modules/fastqc.nf' as fastqc

and then in a workflow invokes like this:

fastqc.fastqc(reads)

Then the output is available from fastqc.output (not fastqc.fastqc.output).

Also, how to define a nextflow.config file with profiles affecting e.g. cpu allocations to tasks that may be imported?

Tried modules with draft7

The latest is draft10 and it actually changes quite a bit. Have a look here

how to publish final results?

This remains an open point to be decided

is when going to be dropped entirely or replaced with an equivalent construct?

For now, it won't change, but likely it will be less useful. The use of if should cover most use cases

can you call the same process from a module twice? (e.g. by importing twice the same module with different aliases)

Yes, you won't need to import it with a different alias. Each invocation returns its own instance that can be safely accessed.

Regarding the last point, the include syntax also changed to avoid messing up the process naming in the config file. See here

What is the version that we should use to try the draft10?
I tried 19.0.0.modules-draft10-SNAPSHOT, and it doesn't seem to work.

Unfortunately I didn't upload this version. You need to check out the modules-draft10 branch and compile it.

Playing with modules-draft10 I notice that this code fails:

some_process(in_chan) 
     | map { x -> x*2 }
     | some_other_process

While this code works:

some_process(in_chan) | map { x -> x*2 } | some_other_process

as does this:

some_process(in_chan) \
     | map { x -> x*2 } \
     | some_other_process

It would be nice if | could handle newlines.

I'm also noticing that some errors don't propagate to the UI. For example this code

my_process(in_chan) \
    | another_proc \
    | set { out_chan }

Results in an exception in .nextflow.log, but nothing for the user. The pipeline prints nothing and then exits. I've seen this a bunch as I'm trying to figure out the syntax.

@mes5k Not sure if it helps, but with draft7 the following (with pipes before newline breaks, rather than after) worked:

Channel.fromPath("input/*.fastq.gz") |
    filter{ it.exists() } | 
    ifEmpty { error "No reads found!" }

@pditommaso Thanks for the update, luckily draft7 is not so different from draft10. One follow up question though: if you don't alias two instances (or two imports) of the same process, how do you handle output using the dot notation? E.g. if I call fastqc twice, how does Nextflow know which fastqc.output I am referring to later?

It would be nice if | could handle newlines.

I have little control over the lexer level of the parser, unfortunately.

I'm also noticing that some errors don't propagate to the UI. For example this code

It would be very useful to have a snippet to replicate it and the resulting log file

if you don't alias two instances (or two imports) of the same process, how do you handle output using the dot notation?

Well, you are supposed to access the output before the following invocation.

@maxulysse I've finally uploaded 19.0.0.modules-draft10-SNAPSHOT.

Well, you are supposed to use access the output before the following invocation

There are cases where this might not be possible. In that case, would importing twice under different aliases work? If not, could aliasing at invocation time be supported?

There are cases where this might not be possible.

Don't forget you can also assign a process output to a variable. The processName.output is supposed to be a shortcut to avoid variable proliferation.
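For instance, a minimal sketch of the variable form (channel contents invented; as the follow-up comments show, whether this works may depend on the snapshot in use):

fastqc_out = fastqc( Channel.fromPath('reads/*.fastq.gz') )
fastqc_out.view()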

Good point, sounds like this is the best way to achieve that as it is truly unambiguous. Thanks @pditommaso !

@pditommaso assigning the processName.output to variables doesn't work for me with draft10.

Tried the following:

fastqc(reads | flatMap { x -> x[2] })
fastqc_raw_output = fastqc.output

// more stuff here

fastqc(trimmomatic.output[0] | flatMap { x -> x[2] })
fastqc_trimmed_output = fastqc.output

as well as variations, such as direct assignment in only one line rather than two, with or without .output appended. One call to fastqc works fine, two calls result in an error elsewhere in the code, completely unrelated.

Like @mes5k I am also finding errors in .nextflow.log that do not propagate to the UI, such as:

Apr-09 10:42:10.058 [main] DEBUG nextflow.Session - Session aborted -- Cause: No signature of method: groovyx.gpars.dataflow.DataflowBroadcast.getAt() is applicable for argument types: (Integer) values: [0]

when using draft10

@rspreafico-vir I need some hints on the issue, what's the code causing the error and the complete stack trace.

Thanks @pditommaso ! Here is the stack trace prior to the error posted in my previous comment:

Apr-09 10:41:55.373 [main] INFO  nextflow.cli.CmdRun - N E X T F L O W  ~  version 19.0.0.modules-draft10-SNAPSHOT
Apr-09 10:41:55.392 [main] INFO  nextflow.cli.CmdRun - Launching `main.nf` [thirsty_gilbert] - revision: a765aedefd
Apr-09 10:41:55.405 [main] DEBUG nextflow.config.ConfigBuilder - Found config local: /Users/rspreafico/workspace/nf-rnaseq/nextflow.config
Apr-09 10:41:55.406 [main] DEBUG nextflow.config.ConfigBuilder - Parsing config file: /Users/rspreafico/workspace/nf-rnaseq/nextflow.config
Apr-09 10:41:55.474 [main] DEBUG nextflow.config.ConfigBuilder - Applying config profile: `standard`
Apr-09 10:41:56.096 [main] DEBUG nextflow.Session - Session uuid: b38b6eb5-e329-48e5-bfab-e110d505470b
Apr-09 10:41:56.096 [main] DEBUG nextflow.Session - Run name: thirsty_gilbert
Apr-09 10:41:56.097 [main] DEBUG nextflow.Session - Executor pool size: 4
Apr-09 10:41:56.121 [main] DEBUG nextflow.file.FileHelper - Creating a file system instance for provider: S3FileSystemProvider
Apr-09 10:41:56.127 [main] DEBUG nextflow.Global - Using AWS credentials defined in nextflow config file
Apr-09 10:41:56.128 [main] DEBUG nextflow.file.FileHelper - AWS S3 config details: {secret_key=REDACTED, region=us-west-2, access_key=REDACTED}
Apr-09 10:42:06.686 [main] DEBUG nextflow.cli.CmdRun - 
  Version: 19.0.0.modules-draft10-SNAPSHOT build 5059
  Modified: 01-04-2019 22:59 UTC (15:59 PDT)
  System: Mac OS X 10.14.4
  Runtime: Groovy 2.5.6 on OpenJDK 64-Bit Server VM 1.8.0_152-release-1056-b12
  Encoding: UTF-8 (UTF-8)
  Process: 44666@rspreafico-vir.local [10.184.235.215]
  CPUs: 4 - Mem: 16 GB (191.2 MB) - Swap: 3 GB (768.8 MB)
Apr-09 10:42:07.097 [main] DEBUG nextflow.Session - Work-dir: s3://vir-nf-batch/work [Mac OS X]
Apr-09 10:42:07.097 [main] DEBUG nextflow.Session - Script base path does not exist or is not a directory: /Users/rspreafico/workspace/nf-rnaseq/bin
Apr-09 10:42:07.218 [main] DEBUG nextflow.Session - Session start invoked
Apr-09 10:42:07.222 [main] DEBUG nextflow.trace.TraceFileObserver - Flow starting -- trace file: /Users/rspreafico/workspace/nf-rnaseq/trace.tsv
Apr-09 10:42:07.498 [main] DEBUG nextflow.script.ScriptRunner - > Launching execution
Apr-09 10:42:07.505 [main] WARN  nextflow.NextflowMeta$Preview - DSL 2 IS AN EXPERIMENTAL FEATURE UNDER DEVELOPMENT -- SYNTAX MAY CHANGE IN FUTURE RELEASE
Apr-09 10:42:07.665 [main] WARN  nextflow.NextflowMeta$Preview - DSL 2 IS AN EXPERIMENTAL FEATURE UNDER DEVELOPMENT -- SYNTAX MAY CHANGE IN FUTURE RELEASE
Apr-09 10:42:07.747 [main] WARN  nextflow.NextflowMeta$Preview - DSL 2 IS AN EXPERIMENTAL FEATURE UNDER DEVELOPMENT -- SYNTAX MAY CHANGE IN FUTURE RELEASE
Apr-09 10:42:07.856 [main] WARN  nextflow.NextflowMeta$Preview - DSL 2 IS AN EXPERIMENTAL FEATURE UNDER DEVELOPMENT -- SYNTAX MAY CHANGE IN FUTURE RELEASE
Apr-09 10:42:07.964 [main] WARN  nextflow.NextflowMeta$Preview - DSL 2 IS AN EXPERIMENTAL FEATURE UNDER DEVELOPMENT -- SYNTAX MAY CHANGE IN FUTURE RELEASE
Apr-09 10:42:08.066 [main] WARN  nextflow.NextflowMeta$Preview - DSL 2 IS AN EXPERIMENTAL FEATURE UNDER DEVELOPMENT -- SYNTAX MAY CHANGE IN FUTURE RELEASE
Apr-09 10:42:08.136 [main] WARN  nextflow.NextflowMeta$Preview - DSL 2 IS AN EXPERIMENTAL FEATURE UNDER DEVELOPMENT -- SYNTAX MAY CHANGE IN FUTURE RELEASE
Apr-09 10:42:08.232 [main] WARN  nextflow.NextflowMeta$Preview - DSL 2 IS AN EXPERIMENTAL FEATURE UNDER DEVELOPMENT -- SYNTAX MAY CHANGE IN FUTURE RELEASE
Apr-09 10:42:08.299 [main] WARN  nextflow.NextflowMeta$Preview - DSL 2 IS AN EXPERIMENTAL FEATURE UNDER DEVELOPMENT -- SYNTAX MAY CHANGE IN FUTURE RELEASE
Apr-09 10:42:08.447 [main] WARN  nextflow.NextflowMeta$Preview - DSL 2 IS AN EXPERIMENTAL FEATURE UNDER DEVELOPMENT -- SYNTAX MAY CHANGE IN FUTURE RELEASE
Apr-09 10:42:08.540 [main] WARN  nextflow.NextflowMeta$Preview - DSL 2 IS AN EXPERIMENTAL FEATURE UNDER DEVELOPMENT -- SYNTAX MAY CHANGE IN FUTURE RELEASE
Apr-09 10:42:09.830 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: awsbatch
Apr-09 10:42:09.830 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'awsbatch'
Apr-09 10:42:09.835 [main] INFO  nextflow.executor.Executor - [warm up] executor > awsbatch
Apr-09 10:42:09.850 [main] DEBUG nextflow.util.ThrottlingExecutor - Creating throttling executor with opts: nextflow.util.ThrottlingExecutor$Options(poolName:AWSBatch-executor, limiter:RateLimiter[stableRate=50.0qps], poolSize:20, maxPoolSize:20, queueSize:5000, maxRetries:10, keepAlive:1m, autoThrottle:true, errorBurstDelay:1s, rampUpInterval:100, rampUpFactor:1.2, rampUpMaxRate:1.7976931348623157E308, backOffFactor:2.0, backOffMinRate:0.0166666667, retryDelay:1s)
Apr-09 10:42:09.856 [main] DEBUG nextflow.util.ThrottlingExecutor - Creating throttling executor with opts: nextflow.util.ThrottlingExecutor$Options(poolName:AWSBatch-reaper, limiter:RateLimiter[stableRate=50.0qps], poolSize:20, maxPoolSize:20, queueSize:5000, maxRetries:10, keepAlive:1m, autoThrottle:true, errorBurstDelay:1s, rampUpInterval:100, rampUpFactor:1.2, rampUpMaxRate:1.7976931348623157E308, backOffFactor:2.0, backOffMinRate:0.0166666667, retryDelay:1s)
Apr-09 10:42:09.856 [main] DEBUG n.cloud.aws.batch.AwsBatchExecutor - Creating parallel monitor for executor 'awsbatch' > pollInterval=10s; dumpInterval=5m
Apr-09 10:42:09.859 [main] DEBUG n.processor.TaskPollingMonitor - >>> barrier register (monitor: awsbatch)
Apr-09 10:42:09.878 [main] DEBUG nextflow.Global - Using AWS credentials defined in nextflow config file
Apr-09 10:42:09.926 [main] DEBUG nextflow.Session - >>> barrier register (process: gtf2genePred)
Apr-09 10:42:09.928 [main] DEBUG nextflow.processor.TaskProcessor - Creating operator > gtf2genePred -- maxForks: 4
Apr-09 10:42:09.948 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: awsbatch
Apr-09 10:42:09.948 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'awsbatch'
Apr-09 10:42:09.949 [main] DEBUG nextflow.Session - >>> barrier register (process: genePred2bed)
Apr-09 10:42:09.949 [main] DEBUG nextflow.processor.TaskProcessor - Creating operator > genePred2bed -- maxForks: 4
Apr-09 10:42:09.952 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: awsbatch
Apr-09 10:42:09.952 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'awsbatch'
Apr-09 10:42:09.952 [main] DEBUG nextflow.Session - >>> barrier register (process: gtf2refFlat)
Apr-09 10:42:09.953 [main] DEBUG nextflow.processor.TaskProcessor - Creating operator > gtf2refFlat -- maxForks: 4
Apr-09 10:42:09.958 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: awsbatch
Apr-09 10:42:09.958 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'awsbatch'
Apr-09 10:42:09.959 [main] DEBUG nextflow.Session - >>> barrier register (process: fasta2chromSizes)
Apr-09 10:42:09.959 [main] DEBUG nextflow.processor.TaskProcessor - Creating operator > fasta2chromSizes -- maxForks: 4
Apr-09 10:42:09.964 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: awsbatch
Apr-09 10:42:09.964 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'awsbatch'
Apr-09 10:42:09.965 [main] DEBUG nextflow.Session - >>> barrier register (process: gtf2intervalList)
Apr-09 10:42:09.965 [main] DEBUG nextflow.processor.TaskProcessor - Creating operator > gtf2intervalList -- maxForks: 4
Apr-09 10:42:09.975 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: awsbatch
Apr-09 10:42:09.975 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'awsbatch'
Apr-09 10:42:09.976 [main] DEBUG nextflow.Session - >>> barrier register (process: fastqc)
Apr-09 10:42:09.977 [main] DEBUG nextflow.processor.TaskProcessor - Creating operator > fastqc -- maxForks: 4
Apr-09 10:42:09.992 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: awsbatch
Apr-09 10:42:09.992 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'awsbatch'
Apr-09 10:42:09.993 [main] DEBUG nextflow.Session - >>> barrier register (process: trimmomatic)
Apr-09 10:42:09.994 [main] DEBUG nextflow.processor.TaskProcessor - Creating operator > trimmomatic -- maxForks: 4
Apr-09 10:42:10.005 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: awsbatch
Apr-09 10:42:10.005 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'awsbatch'
Apr-09 10:42:10.006 [main] DEBUG nextflow.Session - >>> barrier register (process: salmon_index)
Apr-09 10:42:10.007 [main] DEBUG nextflow.processor.TaskProcessor - Creating operator > salmon_index -- maxForks: 1
Apr-09 10:42:10.014 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: awsbatch
Apr-09 10:42:10.015 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'awsbatch'
Apr-09 10:42:10.016 [main] DEBUG nextflow.Session - >>> barrier register (process: salmon_quant)
Apr-09 10:42:10.016 [main] DEBUG nextflow.processor.TaskProcessor - Creating operator > salmon_quant -- maxForks: 8
Apr-09 10:42:10.020 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: awsbatch
Apr-09 10:42:10.020 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'awsbatch'
Apr-09 10:42:10.021 [main] DEBUG nextflow.Session - >>> barrier register (process: star_index)
Apr-09 10:42:10.021 [main] DEBUG nextflow.processor.TaskProcessor - Creating operator > star_index -- maxForks: 1
Apr-09 10:42:10.028 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: awsbatch
Apr-09 10:42:10.028 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'awsbatch'
Apr-09 10:42:10.028 [main] DEBUG nextflow.Session - >>> barrier register (process: star_align)
Apr-09 10:42:10.028 [main] DEBUG nextflow.processor.TaskProcessor - Creating operator > star_align -- maxForks: 1
Apr-09 10:42:10.033 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: awsbatch
Apr-09 10:42:10.033 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'awsbatch'
Apr-09 10:42:10.034 [main] DEBUG nextflow.Session - >>> barrier register (process: picard_markduplicates)
Apr-09 10:42:10.034 [main] DEBUG nextflow.processor.TaskProcessor - Creating operator > picard_markduplicates -- maxForks: 2
Apr-09 10:42:10.039 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: awsbatch
Apr-09 10:42:10.039 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'awsbatch'
Apr-09 10:42:10.040 [main] DEBUG nextflow.Session - >>> barrier register (process: samtools_index)
Apr-09 10:42:10.040 [main] DEBUG nextflow.processor.TaskProcessor - Creating operator > samtools_index -- maxForks: 4
Apr-09 10:42:10.043 [main] WARN  nextflow.extension.OperatorEx - The operator `first` is useless when applied to a value channel which returns a single value by definition
Apr-09 10:42:10.046 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: awsbatch
Apr-09 10:42:10.047 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'awsbatch'
Apr-09 10:42:10.047 [main] DEBUG nextflow.Session - >>> barrier register (process: wig2bigwig)
Apr-09 10:42:10.047 [main] DEBUG nextflow.processor.TaskProcessor - Creating operator > wig2bigwig -- maxForks: 4
Apr-09 10:42:10.051 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: awsbatch
Apr-09 10:42:10.052 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'awsbatch'
Apr-09 10:42:10.052 [main] DEBUG nextflow.Session - >>> barrier register (process: dupradar)
Apr-09 10:42:10.052 [main] DEBUG nextflow.processor.TaskProcessor - Creating operator > dupradar -- maxForks: 4
Apr-09 10:42:10.058 [main] DEBUG nextflow.Session - Session aborted -- Cause: No signature of method: groovyx.gpars.dataflow.DataflowBroadcast.getAt() is applicable for argument types: (Integer) values: [0]
Possible solutions: getAt(java.lang.String), putAt(java.lang.String, java.lang.Object), set(groovy.lang.Closure), wait(), grep(), tap(groovy.lang.Closure)

As for the code causing the issue, I'm not sure since the UI does not report the error, and there is no indication of the line number or code snippet potentially causing the problem. I do have a bunch of calls like this that seem related to the error message:

picard_markduplicates(star_align.output[0])
samtools_index(picard_markduplicates.output[0])

but I am puzzled because the same pipeline with the same .output[0] syntax worked before...

Ok found the issue, there was a missing .output. So the log got it right, however the issue is that no error was printed on the console.

Tried to call fastqc twice using draft10. Works fine with just first call, but not with two calls.

First attempt:

fastqc(reads | flatMap { x -> x[2] })
fastqc_raw_output = fastqc.output

trimmomatic(reads)
 
fastqc(trimmomatic.output[0] | flatMap { x -> x[2] })
fastqc_trimmed_output = fastqc.output 

Nextflow exits without printing errors to the console, but .nextflow.log reports:

Apr-17 16:42:17.226 [main] DEBUG nextflow.Session - Session aborted -- Cause: Channel `fastqc_raw_output` has been used twice as an output by process `fastqc` and process `fastqc`

Second attempt:

fastqc_raw_output = fastqc(reads | flatMap { x -> x[2] }).output
trimmomatic(reads)
fastqc_trimmed_output = fastqc(trimmomatic.output[0] | flatMap { x -> x[2] }).output

No error on console by Nextflow, but exits. .nextflow.log reads:

Apr-17 16:44:59.654 [main] DEBUG nextflow.Session - Session aborted -- Cause: No such property: output for class: groovyx.gpars.dataflow.DataflowBroadcast

Third attempt:

fastqc_raw_output = fastqc(reads | flatMap { x -> x[2] })
trimmomatic(reads)
fastqc_trimmed_output = fastqc(trimmomatic.output[0] | flatMap { x -> x[2] })

Got an unrelated error from Nextflow console. .nextflow.log reads:

Apr-17 16:52:26.144 [main] DEBUG nextflow.Session - Session aborted -- Cause: Channel `fastqc_raw_output` has been used twice as an output by process `fastqc` and process `fastqc`

(fastqc_raw_output and fastqc_trimmed_output are passed to MultiQC, but only once each)

What is the right way to call the same module twice and store the output from each call?

Next I will merge on master and start to debug this

Looking forward! Other than somewhat cryptic error messages (or no error messages at all on the console) and the inability to call the same module twice, it has been working like a charm, locally and on AWS Batch, super-excited about this. Also looking forward to the publishDir operator ;-)

I'm quite impressed by the expressiveness of the syntax you managed to put together!

It's all thanks to DSL-2! ;-)

The missing console error reporting should be fixed now. Instead, I was too optimistic regarding multiple invocations of the same process. There are still some problems with name conflicts in legacy structures. For now, it's only possible to include the same process with a different alias, i.e. include x as y from z.

These changes have been committed to the master branch. I'm closing this issue because it's becoming too complicated to follow.

I'll open other issues to follow up on specific enhancements. If you find any error/malfunctions please report as a separate issue including the .nextflow.log file and a snippet to replicate the problem.

If the aliasing strategy works, that is perfect for me. Thanks for addressing it!

The aliasing should be supported by draft10 already, correct? 'cause I am trying it but turning this

include fastqc from 'modules/fastqc'

into this

include fastqc as fastqc_raw from 'modules/fastqc'

produces the following error

ERROR ~ Unexpected error [NullPointerException]

 -- Check script 'main.nf' at line: 6 or see '.nextflow.log' file for more details

@rspreafico-vir You need to clone and to build the master branch or use the 19.05.0-SNAPSHOT version.

This works great with 19.05.0-SNAPSHOT. Thank you!!

Is there any current plan for when this might be officially released?

@pditommaso Is this available on the 19.04.1 release?

Nope, kindly see a few comments up. Requires 19.05.0-SNAPSHOT

Thanks @rspreafico-vir. I am on the point of submitting something to nf-core and would dearly love it to be using DSL-2!

DSL-2 will be great for nf-core! It is easy to envision a carefully crafted library of modules, one per tool, in nf-core. In addition to being great for nf-core pipelines, such nf-core modules would be useful per se for end users.

@aunderwo Finally! I remember talking to you about this at the NF conference last year 😄 Really looking forward to this functionality being added to nf-core. Be nice to create a standardised set of modules for the community.

Should we re-open this issue ? Very excited about the new release!