nf-pseudopipe
Nextflow pseudopipe runner
Getting started
Data needed
- A hard-masked genome file
- An unmasked genome file
- A GFF3 file with exon locations
- A protein FASTA file
Preparing the data
There are several caveats to keep in mind when preparing the data. First, unplaced scaffolds in an assembly can lead to false positives: if the annotation pipeline did not call a gene located on such a scaffold, that gene may be reported as a pseudogene. If you want conservative pseudogene predictions, subset the genome and protein files to chromosomes only, keeping only the proteins encoded on those chromosomes.
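Subsetting to chromosomes amounts to filtering FASTA records by sequence ID. The following is a minimal sketch, not part of nf-pseudopipe. It assumes you already know which IDs to keep: for both genome files, the chromosome names; for the protein file, the IDs of proteins whose genes lie on those chromosomes, which you would have to derive from your own annotation. The helper names `filter_fasta` and `filter_fasta_file` are hypothetical.

```python
"""Keep only selected records (e.g. chromosomes) from a FASTA file.

Sketch only: the set of IDs to keep is supplied by you and is not
derived from the pipeline.
"""
from typing import Iterable, Iterator, Set


def filter_fasta(lines: Iterable[str], keep_ids: Set[str]) -> Iterator[str]:
    """Yield the lines of every record whose ID is in keep_ids.

    The record ID is taken as the first whitespace-delimited token
    after '>'. Lines before the first header are dropped.
    """
    keep = False
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            record_id = tokens[0] if tokens else ""
            keep = record_id in keep_ids
        if keep:
            yield line


def filter_fasta_file(src: str, dst: str, keep_ids: Set[str]) -> None:
    """Stream src to dst, keeping only records listed in keep_ids."""
    with open(src) as fin, open(dst, "w") as fout:
        fout.writelines(filter_fasta(fin, keep_ids))


if __name__ == "__main__":
    # Toy input: one unplaced scaffold between two chromosomes.
    example = [">chr1 assembled\n", "ACGTNNNN\n",
               ">scaffold_17\n", "GGGG\n",
               ">chr2\n", "TTTT\n", "AAAA\n"]
    print("".join(filter_fasta(example, {"chr1", "chr2"})), end="")
```

Apply the same ID set to both the hard-masked and the unmasked genome so that the two files stay consistent with each other.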
Further caveats:
- Supply all files uncompressed (ppipe does not handle gzipped input)
- The GFF file must contain features of type "exon" or "CDS" (the third column of a GFF3 line)
- The protein FASTA file should contain only primary proteins (one per gene), not alternative isoforms
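The first two caveats above can be checked before launching the pipeline. This is a minimal sketch, not part of nf-pseudopipe; the script name `check_inputs.py` and its argument order are hypothetical. It detects gzipped files by the gzip magic number (bytes `1f 8b`) and counts GFF3 feature types in column 3. It does not check for isoforms, because how isoforms are labelled in protein FASTA headers varies between annotations.

```python
"""Pre-flight checks for nf-pseudopipe inputs (a sketch, not part of the pipeline)."""
import sys
from collections import Counter
from typing import Iterable, List

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzipped(path: str) -> bool:
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


def gff_feature_counts(lines: Iterable[str]) -> Counter:
    """Count GFF3 feature types (column 3), skipping comments and blank lines."""
    counts: Counter = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:  # sequence lines after ##FASTA have no tabs
            counts[fields[2]] += 1
    return counts


def has_exon_or_cds(lines: Iterable[str]) -> bool:
    """True if at least one 'exon' or 'CDS' feature is present."""
    counts = gff_feature_counts(lines)
    return counts["exon"] > 0 or counts["CDS"] > 0


def main(gff_path: str, other_paths: List[str]) -> int:
    problems = []
    for path in [gff_path, *other_paths]:
        if is_gzipped(path):
            problems.append(f"{path}: gzipped, decompress it first")
    if not problems:  # only parse the GFF as text once it is known to be plain
        with open(gff_path) as fh:
            if not has_exon_or_cds(fh):
                problems.append(f"{gff_path}: no 'exon' or 'CDS' features")
    for problem in problems:
        print(problem)
    return 1 if problems else 0


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_inputs.py annotation.gff3 genome_masked.fa genome.fa proteins.fa
    sys.exit(main(sys.argv[1], sys.argv[2:]))
```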
Preparing the configuration
In the simplest case, edit nextflow.config so that its file paths point to your DNA (masked and unmasked genome), protein and GFF files. Singularity is enabled by default, but you can use Docker instead. Running outside the containers is not recommended, as the pipeline is very likely to break on your system.
See the Nextflow configuration documentation for an in-depth reference on tuning Nextflow to your computational environment.
Starting the pipeline
nextflow run main.nf
Results
The main results file is written to results/out/pgenes/out_pgenes.txt.