mandyzhang6/pangenome

Introduction

nf-core/pangenome is a bioinformatics best-practice analysis pipeline for rendering a collection of sequences into a pangenome graph. Its goal is to build a graph that is locally directed and acyclic while preserving large-scale variation. Maintaining local linearity is important for interpretation, visualization, mapping, comparative genomics, and reuse of pangenome graphs.

The pipeline is built using Nextflow, a workflow tool for running tasks portably across multiple compute infrastructures. It comes with Docker containers, which make installation trivial and results highly reproducible.

Quick Start

Warning: The Dockerfile GitHub Action is not running yet, so make sure you always have the latest image. You also need to clone the repository before you can execute the pipeline. Once we have an automated Docker image build on nf-core, these inconveniences will be gone.

Install nextflow
Install any of Docker, Singularity, Podman, Shifter or Charliecloud for full pipeline reproducibility (please only use Conda as a last resort; see docs)

Build the current docker image if necessary

docker build --no-cache . -t nfcore/pangenome:dev

Test the workflow on a minimal dataset
```
nextflow run nf-core/pangenome -profile test,<docker/singularity/podman/shifter/charliecloud/conda/institute>
```
Please check nf-core/configs to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply use -profile <institute> in your command. This will enable either docker or singularity and set the appropriate execution settings for your local compute environment.

Start running your own analysis!

nextflow run nf-core/pangenome -profile <docker/singularity/podman/shifter/charliecloud/conda/institute> --input "input.fa.gz"

See usage docs for all of the available options when running the pipeline.
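
The steps above can be strung together in a small helper script. The following is a minimal sketch, not part of the pipeline: the function names `detect_engine` and `print_commands` are hypothetical, and the binary names probed for each profile (in particular `ch-run` for Charliecloud) are assumptions about a typical installation. It only prints the two `nextflow run` commands from this section, so it can be tried without Nextflow installed.

```shell
#!/bin/sh
# Hypothetical helper: print the profile name of the first container engine
# found on PATH, in the order the Quick Start lists them. Falls back to
# "conda", which the Quick Start recommends only as a last resort.
detect_engine() {
    # Each entry is profile:binary; the binary names are assumptions.
    for pair in docker:docker singularity:singularity podman:podman \
                shifter:shifter charliecloud:ch-run; do
        profile=${pair%%:*}
        binary=${pair##*:}
        if command -v "$binary" >/dev/null 2>&1; then
            echo "$profile"
            return 0
        fi
    done
    echo "conda"
}

# Print (do not execute) the test command and the analysis command
# for the given input file, using the detected profile.
print_commands() {
    input=$1
    engine_profile=$(detect_engine)
    echo "nextflow run nf-core/pangenome -profile test,$engine_profile"
    echo "nextflow run nf-core/pangenome -profile $engine_profile --input \"$input\""
}

print_commands "input.fa.gz"
```

If an institutional profile exists in nf-core/configs, it would replace the detected engine in the `-profile` argument; the sketch does not attempt to look that up.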

Pipeline Summary

Documentation

The nf-core/pangenome pipeline comes with documentation about the pipeline: usage and output.

Credits

nf-core/pangenome was originally adapted from the pangenome graph builder (pggb) pipeline by Simon Heumos and Michael Heuer.

Many thanks to all who have helped out and contributed along the way, including (but not limited to)*:

Name	Affiliation
Philipp Ehmele	University of Hamburg, Hamburg, Germany
Erik Garrison	The University of Tennessee Health Science Center, Memphis, TN, USA
Andrea Guarracino	University of Rome Tor Vergata, Rome, Italy
Michael Heuer	UC Berkeley, USA
Lukas Heumos	Institute of Computational Biology, Helmholtz Zentrum München, Munich, Germany; Institute of Lung Biology and Disease and Comprehensive Pneumology Center, Helmholtz Zentrum München, Munich, Germany
Simon Heumos	Quantitative Biology Center (QBiC) Tübingen, University of Tübingen, Germany

* Listed in alphabetical order

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

For further information or help, don't hesitate to get in touch on the Slack #pangenome channel (you can join with this invite).

Citations

You can cite the nf-core publication as follows:

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

In addition, references of tools and data used in this pipeline are as follows:
