tanaes / SmkHumann3

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Quinn_metagenomics_pipeline

Stub workflow for a Snakemake metagenomics pipeline

We will be using this repository to develop a basic pipeline for metagenomic analysis using Snakemake.

The pipeline is meant to automate the processing and analysis of metagenomic data using HUMAnN3.

The workflow will execute the following steps:

  • preliminary QC of reads using FastQC
  • read trimming using Cutadapt
  • human read removal using Bowtie2
  • trimmed read QC using FastQC
  • QC summary using MultiQC
  • Taxonomic profiling using MetaPhlAn3
  • Functional profiling using HUMAnN3

Rule Graph

Snakemake Installation

We will be using Snakemake to run our workflow for us. Snakemake is an absolutely amazing workflow language designed for scientific computing. It's especially suited to handling the kinds of data we generate with DNA sequencing -- many independent files, all of which need to undergo the same processing steps.

We'll follow the steps for installing Snakemake recommended on the website. The only difference is that we'll be installing Snakemake into a new environment dedicated for this metagenomic pipeline, so maybe we'll give it its own name. Here's what I did:

mamba create -c conda-forge -c bioconda -n quinn_mg snakemake

Now, we can activate this new environment and start programming!

(base) macbook:pkgs jgs286$ conda activate quinn_mg
(quinn_mg) macbook:pkgs jgs286$ snakemake --version
5.32.0

Samples

We will be using a tabular text file to tell Snakemake what are samples are, and any metadata associated with them about which we want it to know. In this case, the main thing is where the reads associated with the sample can be found!

Our samples.txt file will look something like this:

sample	fq1	fq2
A		test_data/S22205_S104_L001_R1_001.fastq.gz	test_data/S22205_S104_L001_R2_001.fastq.gz
B	test_data/S22207_S103_L001_R1_001.fastq.gz	test_data/S22207_S103_L001_R2_001.fastq.gz

About

License:MIT License


Languages

Language:HTML 99.7%Language:Python 0.3%