Using "--restriction_site" and "--ligation_site" outputs an error
abhratanju opened this issue · comments
Description of the bug
When an enzyme is not already defined in the Nextflow config, nf-core/hic suggests specifying --restriction_site and --ligation_site separately instead of --digestion. For example, the new Arima version 2 is not in nf-core/hic's built-in list of enzymes.
Arima v2 restriction sites: ^GATC, G^ANTC, C^TNAG, T^TAA.
ligation_site='GATCGATC,GATCANTC,GANTGATC,GANTANTC,GATCTNAG,GANTTNAG,CTNAGATC,CTNAANTC,TTAGATC,TTAANTC,TTATTNAG,GATCTAA,GANTTAA,CTNATAA,TTATAA,CTNATNAG'
However, this returns an error message. I found a workaround by adding a new enzyme "arima2" to my nextflow.config and using --digestion. But this doesn't solve the problem in general, since there can be many new enzymes, so it still needs to be fixed.
Command used and terminal output
No response
Relevant files
No response
System information
No response
I'm having a very similar issue:
I created a local config file and added my restriction enzyme to the list. But when I pass my local config file with -c and add --digestion 'myenzyme' to my parameters when calling nextflow, I get the same error described above. Interestingly, it doesn't list 'myenzyme' as one of the options:
my_nextflow.config looks like this:
digest {
    'myenzyme' {
        restriction_site='^GATC'
        ligation_site='GATCGATC'
    }
    'hindiii' {
        restriction_site='A^AGCTT'
        ligation_site='AAGCTAGCTT'
    }
    'mboi' {
        restriction_site='^GATC'
        ligation_site='GATCGATC'
    }
    'dpnii' {
        restriction_site='^GATC'
        ligation_site='GATCGATC'
    }
    'arima' {
        restriction_site='^GATC,G^ANTC'
        ligation_site='GATCGATC,GATCANTC,GANTGATC,GANTANTC'
    }
}
The call looks something like this:
nextflow run nf-core/hic -r 2.0.0 -c <config.file.location>/my_nextflow.config --digestion 'myenzyme' . . .
And the error looks like this:
ERROR: Validation of pipeline parameters failed!
* --digestion: 'myenzyme' is not a valid choice (Available choices: hindiii, mboi, dpnii, arima)
If I specify --restriction_site='^GATC' --ligation_site='GATCGATC' in my nextflow call, then I get this:
Ligation motif not found. Please either use the `--digestion` parameters or specify the `--restriction_site` and `--ligation_site`. For DNase Hi-C, please use '--dnase' option
I spent a few hours troubleshooting this, and the cause of my error was that nextflow_schema.json hard-codes the available restriction enzymes.
In a future version, would it be possible to populate the digestion options in the JSON file from the contents of the config file?
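To make the behaviour concrete, here is a minimal Python sketch of what seems to be happening. It is not the pipeline's actual validation code: the schema fragment is a hypothetical excerpt reconstructed from the error message above, and `validate_digestion` and `with_config_choices` are illustrative helpers that do not exist in nf-core/hic. It shows that a hard-coded enum rejects 'myenzyme' no matter what the config defines, and that building the enum from the config keys would accept it.

```python
import json

# Hypothetical excerpt modelled on the error message above; the real
# nextflow_schema.json in nf-core/hic may be structured differently.
SCHEMA_FRAGMENT = json.loads("""
{
  "properties": {
    "digestion": {
      "type": "string",
      "enum": ["hindiii", "mboi", "dpnii", "arima"]
    }
  }
}
""")

# Enzyme names defined in the user's config (keys of the digest block above).
CONFIG_ENZYMES = ["myenzyme", "hindiii", "mboi", "dpnii", "arima"]


def validate_digestion(value, schema):
    """Return an error string if value is not in the schema enum, else None."""
    choices = schema["properties"]["digestion"]["enum"]
    if value not in choices:
        return (f"--digestion: '{value}' is not a valid choice "
                f"(Available choices: {', '.join(choices)})")
    return None


def with_config_choices(schema, enzymes):
    """Return a copy of the schema whose enum is taken from the config keys."""
    patched = json.loads(json.dumps(schema))  # simple deep copy via JSON
    patched["properties"]["digestion"]["enum"] = list(enzymes)
    return patched


# Hard-coded enum: the enzyme added in the config is rejected.
print(validate_digestion("myenzyme", SCHEMA_FRAGMENT))

# Enum populated from the config: the same value passes (prints None).
print(validate_digestion("myenzyme",
                         with_config_choices(SCHEMA_FRAGMENT, CONFIG_ENZYMES)))
```

The first call reproduces the "not a valid choice" message because the allowed values come only from the schema. The second call is one possible shape of the requested change, assuming the schema could be regenerated from the config.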
Thanks so much for making this code available. Let me know if more information would be useful.
This issue has been resolved in the latest version.