phue/pangenome

Introduction

Warning: This pipeline is currently UNDER CONSTRUCTION. Some features may not work, or may not work as intended!

nf-core/pangenome is a bioinformatics best-practice analysis pipeline for rendering a collection of sequences into a pangenome graph. Its goal is to build a graph that is locally directed and acyclic while preserving large-scale variation. Maintaining local linearity is important for interpretation, visualization, mapping, comparative genomics, and reuse of pangenome graphs.

The pipeline is built using Nextflow, a workflow tool that runs tasks across multiple compute infrastructures in a very portable manner. It comes with Docker containers, making installation trivial and results highly reproducible.

Quick Start

Warning: The Dockerfile GitHub Action is not running yet, so make sure you always have the latest image. You also need to clone the repository before you can execute the pipeline. Once we have an automated Docker image build on nf-core, these inconveniences will be gone.

Install Nextflow
Install any of Docker, Singularity, Podman, Shifter or Charliecloud for full pipeline reproducibility (please only use Conda as a last resort; see docs)
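
Before going further, it can help to confirm that these prerequisites are actually on your PATH. The following is a minimal sketch and not part of the pipeline: `first_available` is a hypothetical helper name, and the executable names (in particular `ch-run` for Charliecloud) are assumptions about a typical installation.

```shell
#!/bin/sh
# Print the first command from the argument list that is found on PATH.
# Returns non-zero if none of them is available.
first_available() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            return 0
        fi
    done
    return 1
}

# Nextflow is required in every case.
if ! first_available nextflow >/dev/null; then
    echo "nextflow not found on PATH" >&2
fi

# Any one of the engines named above is sufficient.
# Assumption: Charliecloud is detected via its ch-run executable.
if engine=$(first_available docker singularity podman shifter ch-run); then
    echo "container engine found: $engine"
else
    echo "no container engine found on PATH" >&2
fi
```

The script only reports what it finds; it does not install anything or change your environment.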

Build the current Docker image if necessary
```
docker build --no-cache . -t nfcore/pangenome:dev
```

Test the workflow on a minimal dataset
```
nextflow run nf-core/pangenome -profile test,<docker/singularity/podman/shifter/charliecloud/conda/institute> --n_mappings 11
```
Please check nf-core/configs to see if a custom config file to run nf-core pipelines already exists for your institute. If so, you can simply use -profile <institute> in your command. This will enable either Docker or Singularity and set the appropriate execution settings for your local compute environment.

Start running your own analysis!
```
nextflow run nf-core/pangenome -profile <docker/singularity/podman/shifter/charliecloud/conda/institute> --input "input.fa.gz" --n_mappings 11
```

Note that the input FASTA must be compressed with bgzip; a file compressed with plain gzip is not sufficient. See the usage docs for all of the available options when running the pipeline.
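
Because bgzip and gzip output share the `.gz` extension, it is easy to pass the wrong kind of file. The sketch below checks the first bytes of a file for the BGZF block header. `is_bgzf` is a hypothetical helper, not something the pipeline provides, and the pattern assumes the `BC` subfield comes first in the gzip extra field, as bgzip writes it.

```shell
#!/bin/sh
# Return 0 if the file starts with a BGZF block header as written by bgzip,
# non-zero otherwise (including when the file cannot be read).
is_bgzf() {
    # First 16 bytes as one contiguous lowercase hex string (32 characters).
    header=$(od -An -tx1 -N16 "$1" 2>/dev/null | tr -d ' \t\n')
    case "$header" in
        # 1f 8b 08 04 = gzip magic, deflate method, FEXTRA flag set.
        # Bytes 12-13 = "BC" (42 43), the BGZF extra-subfield identifier.
        1f8b0804????????????????4243*) return 0 ;;
        *) return 1 ;;
    esac
}

# Typical usage (commented out because it needs bgzip and a real FASTA):
#   bgzip -@ 4 input.fa                                  # writes input.fa.gz
#   gzip -dc plain.fa.gz | bgzip -c > input.fa.gz        # convert from gzip
#   is_bgzf input.fa.gz && echo "ok: BGZF-compressed"
```

A file that fails this check can usually be fixed by decompressing it and recompressing with bgzip, as in the commented conversion line.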

Pipeline Summary

Documentation

The nf-core/pangenome pipeline comes with documentation about the pipeline: usage and output.

Credits

nf-core/pangenome was originally adapted from the pangenome graph builder pggb pipeline by Simon Heumos and Michael Heuer.

Many thanks to all who have helped out and contributed along the way, including (but not limited to)*:

Name	Affiliation
Philipp Ehmele	Institute of Computational Biology, Helmholtz Zentrum München, Munich, Germany
Erik Garrison	The University of Tennessee Health Science Center, Memphis, TN, USA
Andrea Guarracino	Genomics Research Centre, Human Technopole, Milan, Italy
Michael Heuer	UC Berkeley, USA
Lukas Heumos	Institute of Computational Biology, Helmholtz Zentrum München, Munich, Germany; Institute of Lung Biology and Disease and Comprehensive Pneumology Center, Helmholtz Zentrum München, Munich, Germany
Simon Heumos	Quantitative Biology Center (QBiC) Tübingen, University of Tübingen, Germany; Biomedical Data Science, Department of Computer Science, University of Tübingen, Germany

* Listed in alphabetical order

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

For further information or help, don't hesitate to get in touch on the Slack #pangenome channel (you can join with this invite).

Citations

You can cite the nf-core publication as follows:

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

In addition, references of tools and data used in this pipeline are as follows:

ODGI: understanding pangenome graphs.

Andrea Guarracino*, Simon Heumos*, Sven Nahnsen, Pjotr Prins & Erik Garrison.

Bioinformatics. 2022 Jul 01. doi: 10.1093/bioinformatics/btac308.

*contributed equally

Unbiased pangenome graphs.

Erik Garrison & Andrea Guarracino.

bioRxiv. 2022 Feb 02. doi: 10.1101/2022.02.14.480413.

Attention

MultiQC Report

In the resulting MultiQC report, the Detailed ODGI stats table is labelled smoothxg. These are in fact the stats of the graph after polishing with gfaffix. The label is wrong because some tool names are hardcoded in the ODGI MultiQC module; hopefully this will be fixed in the future.
