nf-pseudopipe

Nextflow pseudopipe runner

Getting started

Data needed

A hard-masked genome file
A non-masked genome file
A GFF3 file with exon locations
A protein FASTA file

Preparing the data

There are a number of caveats when preparing data. Firstly, unplaced scaffolds in an assembly can lead to false positives if the genes located on them were not called by the annotation pipeline. If you want to be conservative with the pseudogene predictions, subset the genome and protein files to contain chromosomes only.
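The chromosome-only subsetting described above can be sketched in Python as follows. This is a minimal illustration, not part of the pipeline: the function names and the `is_chromosome` naming pattern (IDs such as `1`, `chr1`, `chrX`) are assumptions and should be adapted to the sequence names in your assembly. It covers the genome FASTA only; subsetting the protein file additionally requires knowing which chromosome each protein's gene sits on.

```python
import re
from typing import Callable, Iterable, Iterator


def is_chromosome(seq_id: str) -> bool:
    """Assumed naming convention: '1', 'chr1', 'chrX', ... (adapt as needed)."""
    return re.fullmatch(r"(chr)?(\d+|[XYZW])", seq_id) is not None


def filter_fasta(lines: Iterable[str], keep: Callable[[str], bool]) -> Iterator[str]:
    """Yield only the FASTA records whose sequence ID satisfies keep()."""
    writing = False  # lines before the first header are dropped
    for line in lines:
        if line.startswith(">"):
            # The sequence ID is the first whitespace-delimited token of the header.
            tokens = line[1:].split()
            writing = keep(tokens[0]) if tokens else False
        if writing:
            yield line


def subset_fasta(src_path: str, dst_path: str) -> None:
    """Write the chromosome-only records of src_path to dst_path (uncompressed)."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(filter_fasta(src, is_chromosome))
```

Run it once for the hard-masked genome and once for the non-masked genome, so that both files contain the same set of sequences.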

Further caveats:

Supply all files uncompressed (ppipe does not handle gzipped data)
The GFF file must contain "exon" or "CDS" features
The protein FASTA file should contain only primary proteins, not isoforms
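Two of these caveats can be checked before starting a run. The sketch below is an assumption-laden helper, not part of the pipeline; the function names are hypothetical. It detects gzipped input by its two-byte magic number and counts the feature types in column 3 of the GFF file. It does not attempt the isoform check, since the way isoforms are labelled varies between annotation sources.

```python
from collections import Counter
from typing import Iterable

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzipped(path: str) -> bool:
    """Return True if the file at path starts with the gzip magic number."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def gff_feature_counts(lines: Iterable[str]) -> Counter:
    """Count the feature types (column 3) in GFF lines, skipping comments."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:  # ignores malformed lines and embedded sequence
            counts[fields[2]] += 1
    return counts


def has_exon_or_cds(lines: Iterable[str]) -> bool:
    """Return True if at least one 'exon' or 'CDS' feature is present."""
    counts = gff_feature_counts(lines)
    return counts["exon"] > 0 or counts["CDS"] > 0
```

For example, `has_exon_or_cds(open("annotation.gff3"))` (the file name is a placeholder) should return `True` for a usable annotation, and `is_gzipped(path)` should return `False` for every input file.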

Preparing the configuration

In the simplest case, just modify the nextflow.config file with file paths pointing to your DNA, protein and GFF files. By default Singularity is enabled, although you are free to substitute Docker. Running outside of the containers is not recommended, as things are very likely to break.

See the Nextflow configuration documentation for an in-depth reference on how to tune Nextflow to your computational environment.

Starting the pipeline

nextflow run main.nf

Results

The main results file will be located in results/out/pgenes/out_pgenes.txt.
