qazwsx1995 / metaGEM

A Snakemake pipeline for the generation of MAGs, reconstruction of GEMs, and simulation of cross-feeding interactions within microbial communities from lab cultures, human gut, ocean, plant-associated, and bulk soil microbiomes

metaGEM

A Snakemake-based workflow to generate high-quality metagenome-assembled genomes from short-read paired-end data, reconstruct genome-scale metabolic models, and perform community metabolic interaction simulations on high-performance computing clusters.

metaGEM integrates an array of existing bioinformatics and metabolic modeling tools using Snakemake, for the purpose of interrogating social interactions in bacterial communities of the human gut microbiome. From whole metagenome shotgun (WMGS) datasets, metagenome-assembled genomes (MAGs) are reconstructed, which are then converted into genome-scale metabolic models (GEMs) for in silico simulations of cross-feeding interactions within sample-based communities. Additional outputs include abundance estimates, taxonomic assignment, growth rate estimation, pangenome analysis, and eukaryotic MAG identification.

Workflow

Core:

  1. metaGEM setup
  2. Quality filter reads with fastp
  3. Assembly with megahit
  4. Draft bin sets with CONCOCT, MaxBin2, and MetaBAT2
  5. Refine & reassemble bins with metaWRAP
  6. Taxonomic assignment with GTDB-tk
  7. Relative abundances with bwa and samtools
  8. Reconstruct & evaluate genome-scale metabolic models with CarveMe and memote
  9. Species metabolic coupling analysis with SMETANA

Bonus:

  1. Growth rate estimation with GRiD, SMEG or CoPTR
  2. Pangenome analysis with roary
  3. Eukaryotic draft bins with EukRep and EukCC

Usage

_________________________________________________________________________/\\\\\\\\\\\\___/\\\\\\\\\\\\\\\___/\\\\____________/\\\\_        
 _______________________________________________________________________/\\\//////////___\/\\\///////////___\/\\\\\\________/\\\\\\_       
  __________________________________________/\\\________________________/\\\______________\/\\\______________\/\\\//\\\____/\\\//\\\_      
   ____/\\\\\__/\\\\\________/\\\\\\\\____/\\\\\\\\\\\___/\\\\\\\\\_____\/\\\____/\\\\\\\__\/\\\\\\\\\\\______\/\\\\///\\\/\\\/_\/\\\_     
    __/\\\///\\\\\///\\\____/\\\/////\\\__\////\\\////___\////////\\\____\/\\\___\/////\\\__\/\\\///////_______\/\\\__\///\\\/___\/\\\_    
     _\/\\\_\//\\\__\/\\\___/\\\\\\\\\\\______\/\\\_________/\\\\\\\\\\___\/\\\_______\/\\\__\/\\\______________\/\\\____\///_____\/\\\_   
      _\/\\\__\/\\\__\/\\\__\//\\///////_______\/\\\_/\\____/\\\/////\\\___\/\\\_______\/\\\__\/\\\______________\/\\\_____________\/\\\_  
       _\/\\\__\/\\\__\/\\\___\//\\\\\\\\\\_____\//\\\\\____\//\\\\\\\\/\\__\//\\\\\\\\\\\\/___\/\\\\\\\\\\\\\\\__\/\\\_____________\/\\\_ 
        _\///___\///___\///_____\//////////_______\/////______\////////\//____\////////////_____\///////////////___\///______________\///__
        
        
Usage: bash metaGEM.sh [-t|--task TASK] 
                       [-j|--nJobs NUMBER OF JOBS] 
                       [-c|--cores NUMBER OF CORES] 
                       [-m|--mem GB RAM] 
                       [-h|--hours MAX RUNTIME]

Snakefile wrapper/parser for metaGEM. 

Options:
  -t, --task        Specify task to complete:

                        SETUP
                            createFolders
                            downloadToy
                            organizeData

                        WORKFLOW
                            fastp 
                            megahit 
                            crossMap 
                            concoct 
                            metabat
                            maxbin 
                            binRefine 
                            binReassemble 
                            extractProteinBins
                            carveme
                            memote
                            organizeGEMs
                            smetana
                            extractDnaBins
                            gtdbtk
                            abundance 
                            grid
                            prokka
                            roary

                        VISUALIZATION (in development)
                            qfilterVis
                            assemblyVis
                            binningVis
                            taxonomyVis
                            modelVis
                            interactionVis
                            growthVis

  -j, --nJobs       Specify number of jobs to run in parallel
  -c, --nCores      Specify number of cores per job
  -m, --mem         Specify memory in GB required for job
  -h, --hours       Specify number of hours to allocate to job runtime
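
As a rough illustration of how the wrapper might be driven, the sketch below prints (rather than runs) one invocation per workflow task. The resource values (2 jobs, 4 cores, 20 GB, 2 hours) and the `plan_tasks` helper are placeholders invented for this example, and it assumes each task accepts all four resource flags; adjust them to your cluster and data.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print one metaGEM.sh call per task instead of executing it.
# Resource values are illustrative assumptions, not recommendations.
JOBS=2; CORES=4; MEM_GB=20; HOURS=2

plan_tasks() {
    # "$@" holds task names taken from the usage text above
    for task in "$@"; do
        echo "bash metaGEM.sh -t $task -j $JOBS -c $CORES -m $MEM_GB -h $HOURS"
    done
}

# First few workflow tasks, in the order they appear in the usage text
plan_tasks fastp megahit crossMap
```

Printing the commands first makes it easy to review the plan before submitting anything to a scheduler.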

Automated installation

Clone this repository to your HPC or local computer and run the env_setup.sh script:

git clone https://github.com/franciscozorrilla/metaGEM.git
cd metaGEM
bash env_setup.sh

This script will set up three conda environments (metagem, metawrap, and prokkaroary), which Snakemake jobs activate as required.

CheckM

CheckM is used extensively to evaluate the output of various intermediate steps. Although the CheckM package is installed in the metawrap environment, the user is required to download the CheckM database and run checkm data setRoot <db_dir> as outlined in the CheckM installation guide.
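
A minimal sketch of that setup step, assuming the metawrap environment is active and the database archive has already been downloaded and unpacked as described in the CheckM guide. The directory `$HOME/checkm_data` and the `set_checkm_root` helper are hypothetical; only `checkm data setRoot` comes from the text above.

```shell
#!/usr/bin/env bash
# Hypothetical location of the unpacked CheckM database
CHECKM_DB_DIR="${CHECKM_DB_DIR:-$HOME/checkm_data}"

set_checkm_root() {
    local db_dir="$1"
    # Refuse to continue if the database directory is absent
    if [ ! -d "$db_dir" ]; then
        echo "database directory not found: $db_dir" >&2
        return 1
    fi
    # checkm is only available once the metawrap environment is active
    if ! command -v checkm >/dev/null 2>&1; then
        echo "checkm not on PATH; activate the metawrap environment first" >&2
        return 1
    fi
    checkm data setRoot "$db_dir"
}

set_checkm_root "$CHECKM_DB_DIR" || echo "CheckM root not set"
```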

CPLEX

GEM reconstruction and GEM community simulations require the IBM CPLEX solver, which is free to download with an academic license. CPLEX cannot be installed automatically by the env_setup.sh script, so you must install it manually within the metagem conda environment. Refer to the CarveMe and SMETANA installation instructions for further information or troubleshooting. Note: CPLEX v12.8 is recommended.
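
One way to confirm that the manual install worked is to try importing the CPLEX Python module from inside the metagem environment. This sketch assumes the interpreter is reachable as `python` (override it with the hypothetical `PYTHON` variable); `cplex_status` is an illustrative helper, not part of metaGEM.

```shell
#!/usr/bin/env bash
# Interpreter to test; assumed to be the one from the metagem environment
PYTHON="${PYTHON:-python}"

cplex_status() {
    # The import fails if the CPLEX Python bindings are not installed
    if "$PYTHON" -c "import cplex" >/dev/null 2>&1; then
        echo "cplex: available"
    else
        echo "cplex: missing"
    fi
}

cplex_status
```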

Manual installation

You can set up the environments manually with the following commands.

metaGEM

conda create -n metagem mamba
source activate metagem
mamba install python snakemake fastp megahit bwa samtools=1.9 kallisto concoct=1.1 metabat2 maxbin2 gtdbtk eukrep eukcc smeg motus
pip install --user memote carveme smetana
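
After installation, a quick sanity check is to confirm that a few of the expected executables are on the PATH. The sketch below checks a small illustrative subset only, since executable names for some of the other packages may differ from their package names; `missing_tools` is a hypothetical helper.

```shell
#!/usr/bin/env bash
missing_tools() {
    # Print each named command that cannot be found on the PATH
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Illustrative subset of the tools installed above
missing=$(missing_tools snakemake fastp megahit bwa samtools)
if [ -z "$missing" ]; then
    echo "all checked tools found"
else
    echo "not found:" $missing
fi
```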

metaWRAP

conda create -n metawrap
source activate metawrap
conda install -c ursky metawrap-mg=1.3.2

prokka-roary

conda create -n prokkaroary
source activate prokkaroary
conda install prokka roary

Tutorial

metaGEM can even be used to explore your own gut microbiome sequencing data from at-home test kit services such as Unseen Bio. The following demo showcases the metaGEM workflow on two Unseen Bio samples.

Active Development

Are you sad that your favorite binner didn't make it into the metaGEM workflow? Have you developed a new bioinformatics tool that you would like to see incorporated into metaGEM? Want alternative tools for certain tasks, or entirely new features? We want to hear from you!

If you want to see additional or alternative tools incorporated into the metaGEM workflow, please do not hesitate to raise an issue or create a pull request. Snakemake allows workflows to be very flexible, and adding new rules is as easy as filling out the following template and adding it to the Snakefile:

rule package_name:  # rule names must be valid identifiers, so use underscores rather than hyphens
    input:
        rules.rulename.output
    output:
        f'{config["path"]["root"]}/{config["folder"]["X"]}/{{IDs}}/output.file'
    message:
        """
        Helpful and descriptive message detailing goal of this rule/package.
        """
    shell:
        """
        # Well documented command line instructions go here
        
        # Load conda environment 
        set +u;source activate {config[envs][package]};set -u;

        # Run tool
        package-name -i {input} -o {output}
        """
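
The `set +u; ...; set -u` wrapper in the template deserves a note. Snakemake runs shell commands in bash strict mode by default, which includes `set -u` (abort on unset variables), and environment activation scripts can reference variables that are not set. The sketch below reproduces the effect with a stand-in function; `fake_activate` and `HYPOTHETICAL_UNSET_VAR` are invented for illustration and no conda installation is involved.

```shell
#!/usr/bin/env bash
unset HYPOTHETICAL_UNSET_VAR

# Stand-in for an activation script that expands a variable that is not set
fake_activate() { : "${HYPOTHETICAL_UNSET_VAR}"; }

# With set -u active, the expansion aborts the subshell before "ok" is printed
strict=$( (set -u; fake_activate; echo ok) 2>/dev/null || echo failed )

# Relaxing the check around the call lets it finish, then strictness is restored
relaxed=$( (set -u; set +u; fake_activate; set -u; echo ok) 2>/dev/null || echo failed )

echo "strict: $strict"
echo "relaxed: $relaxed"
```

Run as written, the first case should report `failed` and the second `ok`, which is why the template relaxes `-u` only for the activation line and restores it before the tool runs.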

Publications

The metaGEM workflow has been used in some capacity in the following publications:

Plastic-degrading potential across the global microbiome correlates with recent pollution trends
Jan Zrimec, Mariia Kokina, Sara Jonasson, Francisco Zorrilla, Aleksej Zelezniak
bioRxiv 2020.12.13.422558; doi: https://doi.org/10.1101/2020.12.13.422558 

Please cite

metaGEM: reconstruction of genome scale metabolic models directly from metagenomes
Francisco Zorrilla, Kiran R. Patil, Aleksej Zelezniak
bioRxiv 2020.12.31.424982; doi: https://doi.org/10.1101/2020.12.31.424982 

License: MIT License


Languages: Python 74.5%, R 15.2%, Shell 10.3%