nomis_pipeline

About

Repository containing workflows for IMP3 downstream analyses
Related project(s): NOMIS

Setup

Conda

# install miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
chmod u+x Miniconda3-latest-Linux-x86_64.sh
./Miniconda3-latest-Linux-x86_64.sh # follow the instructions

Getting the repository including sub-modules

git clone --recurse-submodules ssh://git@git-r3lab-server.uni.lu:8022/susheel.busi/nomis_pipeline.git

Create the main snakemake environment

# create venv
conda env create -f requirements.yml -n "snakemake"

Dependencies

The successful completion requires tools created by others

Notes:

Dependencies are included as submodules where possible
However, installation issues may persist
If so, check the respective repositories listed

How to run

The workflow can be launched using one of the option as follows

./config/sbatch.sh

(or)

CORES=48 snakemake -s workflow/Snakefile --configfile config/config.yaml --use-conda --conda-prefix ${CONDA_PREFIX}/pipeline --cores $CORES -rpn

(or)

Note: For running on esb-compute-01 or litcrit adjust the CORES as needed to prevent MANTIS from spawning too many workers and launch as below

CORES=24 snakemake -s workflow/Snakefile --configfile config/config.yaml --use-conda --conda-prefix ${CONDA_PREFIX}/pipeline --cores $CORES -rpn

Configs

All config files are stored in the folder config/:

Workflows

imp workflow: setup the folders required for running IMP3 on each sample
viruses workflow: run VIBRANT and vCONTACT2 on assemblies, including CheckV on vibrant_output
eukaryotes workflow: running EUKUlele on assemblies
bins workflow: collects all bins together for taxonomy analyses
taxonomy workflow: run GTDBtk and CheckM on the bins
functions workflow: runs METABOLIC, MAGICCAVE and FUNCS analyses
mantis workflow: runs MANTIS on the bins explicitly
euk_bin workflow: performs coassembly specifically for eukaryotes (EukRep) and runs binning with CONCOCT
coassembly_binning: performs coassembly for all samples and subsequent binning
misc workflow: runs gRodon, antismash. To be implemented PopCOGent,and potentially anvi'o coassembly/binning.

Relevant paremters which have to be changed are listed for each workflow and config file. Parameters defining system-relevant settings are not listed but should be also be changed if required, e.g. number of threads used by certain tools etc.

STEPS

The workflow is setup in multiple steps. Prior to running change the following

config: config/
- config.yaml:
  - change steps

Options:

imp
viruses
eukaryotes
bins
taxonomy
functions
mantis
euk_bin
coassembly_binning
misc

IMPORTANT NOTE: only the imp step should be run first, followed by launching IMP3 outside of this pipeline. Subsequent other STEPS can be run

Launching `IMP3`

Per-sample IMP3 can be launched as follows:

chmod -R 775 ${SAMPLE}      # adding permissions
cd ${SAMPLE}
sbatch ./launchIMP.sh       # on IRIS

imp workflow

Download raw data required for the analysis.

config: config/
- config.yaml:
  - change work_dir
- sbatch.sh
  - change SMK_ENV
  - if not using slurm to submit jobs remove --cluster-config, --cluster from the snakemake CMD
- slurm.yaml (only relevant if using slurm for job submission)
workflow: workflow/

Prior to running the imp workflow make the following adjustments.

IMP_config.yaml:
- workflow/notes/IMP_config.yaml
  - change Metagenomics
run_threads:
- workflow/notes/runIMP.sh
  - change: threads
launch_threads:
- workflow/notes/runIMP.sh
  - change: -n8

IMPORTANT Note:

This above workflow should be run first, followed by launching IMP3 outside of this pipeline and then subsequent STEPS can be run

Main workflow

Main analysis workflow: given SR FASTQ files, run all the steps to generate required output. This includes:

setting up folders for IMP
viral and eukaryotic annotations
functional analyses and
taxonomic analyses (optional)

The workflow is run per sample and might require a couple of days to run depending on the sample, used configuration and available computational resources. Note that the workflow will create additional output files not necessarily required to re-create the figures shown in the manuscript.

config:
- per sample
- config/<sample>/config.yaml
  - change all path parameters (not all databases are required, see above)
- config/<sample>/sbatch.yaml
  - change SMK_ENV
  - if not using slurm to submit jobs remove --cluster-config, --cluster from the snakemake CMD
- config/<sample>/slurm.yaml (only relevant if using slurm for job submission)
workflow: workflow/

Report workflow (2021-05-26 15:54:59: NOT implemented)

This workflow creates various summary files, plots and an HTML report for a sample using the output of the main workflow.

Note: How the metaP peptide/protein reports were generated from raw metaP data is described in notes/gdb_metap.md.

config:
- sample configs used for the main workflow
workflow: workflow_report/

To execute this workflow for all samples:

./config/reports.sh "YourEnvName" "WhereToCreateCondEnvs"

Figures workflow (2021-05-26 15:54:52: NOT implemented)

Re-create figures (and tables) used in the manuscript. This workflow should be only run after running the main workflow and report workflow for all samples.

config: config/fig.yaml
- change work_dir
- change paths for all samples in samples
workflow: workflow_figures/

conda activate "YourEnvName"
snakemake -s workflow_figures/Snakefile --cores 1 --configfile config/fig.yaml --use-conda --conda-prefix "WhereToCreateCondEnvs" -rpn # dry-run

Notes

Notes for manual/additional analyses done using the generated data.

susheelbhanu / nomis_pipeline

nomis_pipeline

About

Setup

Conda

Dependencies

How to run

Configs

Workflows

STEPS

Launching `IMP3`

imp workflow

Main workflow

Report workflow (2021-05-26 15:54:59: NOT implemented)

Figures workflow (2021-05-26 15:54:52: NOT implemented)

Notes

About

Languages

nomis_pipeline

About

Setup

Conda

Dependencies

How to run

Configs

Workflows

STEPS

Launching IMP3

imp workflow

Main workflow

Report workflow (2021-05-26 15:54:59: NOT implemented)

Figures workflow (2021-05-26 15:54:52: NOT implemented)

Notes

About

Languages

Launching `IMP3`