XinsongDu / UFRC-YAMP

YAMP: Yet Another Metagenomic Pipeline

UFRC-YAMP

Introduction

  • UFRC-YAMP is an extension of YAMP for UFRC users.
  • YAMP is a tool for processing metagenomic sequence data. For basic usage of YAMP, please see its wiki.
  • Please cite YAMP when you use it for publication purposes:

Visconti A., Martin T.C., and Falchi M., "YAMP: a containerised workflow enabling reproducibility in metagenomics research", GigaScience (2018), https://doi.org/10.1093/gigascience/giy072

Usage

  1. Make the necessary folders:
mkdir -p data
mkdir -p results
mkdir -p logs
  1. Prepare resources before running YAMP:
sbatch run-01_getResources.sh
  1. Run the demo with the example data:
bash run-02_getDemoData.sh && sbatch run-03_runYAMPdemo.sh
  1. Run your own data in parallel:
bash run.sh
  • Before running, modify the SLURM configuration in hpc_submit.sh and run-03_runYAMPdemo.sh.
  • Store your input data in the data folder. Accepted formats are .tar.gz and .tar.bz2 archives (the code decompresses them automatically) or pairs of .fastq* files in the same folder. Results are written to the results folder. Only paired files are currently supported: name each pair in the form A-R1-B and A-R2-B, where A and B stand for arbitrary strings in the file names. The output directory will be named A-.
  • Processing each pair of files requires 4 CPUs and 40 GB of memory (modify nextflow.config to use different values). With N CPUs and M GB of memory, you can therefore run min(floor(N/4), floor(M/40)) pairs in parallel.
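The naming rule and the capacity formula above can be sketched in Python. This is a hypothetical illustration, not the code in run.sh or parallel.py: it assumes an R1 file is recognised by the literal substring "-R1-", that its mate is the same name with that first "-R1-" replaced by "-R2-", and that memory is counted in whole GB. The helper names pair_fastq_files and max_parallel_pairs are invented for this sketch.

```python
def pair_fastq_files(filenames):
    """Hypothetical helper: map each output directory name 'A-' to its (R1, R2) pair.

    Assumes files follow the A-R1-B / A-R2-B naming rule. Returns the pairs and
    a sorted list of files that could not be matched with a mate.
    """
    names = set(filenames)
    pairs = {}
    for name in sorted(names):
        if "-R1-" not in name:
            continue
        # Split on the first "-R1-": A is the prefix, B is the suffix.
        a, b = name.split("-R1-", 1)
        mate = f"{a}-R2-{b}"
        if mate in names:
            # If two pairs share the same A, the later one overwrites the earlier entry.
            pairs[f"{a}-"] = (name, mate)
    paired = {f for pair in pairs.values() for f in pair}
    unpaired = sorted(names - paired)
    return pairs, unpaired


def max_parallel_pairs(n_cpus, mem_gb, cpus_per_pair=4, mem_gb_per_pair=40):
    """min(N/4, M/40), rounded down because only whole jobs can run."""
    return min(n_cpus // cpus_per_pair, mem_gb // mem_gb_per_pair)


if __name__ == "__main__":
    files = ["S1-R1-001.fastq.gz", "S1-R2-001.fastq.gz", "S2-R1-001.fastq.gz"]
    pairs, unpaired = pair_fastq_files(files)
    print(pairs)     # {'S1-': ('S1-R1-001.fastq.gz', 'S1-R2-001.fastq.gz')}
    print(unpaired)  # ['S2-R1-001.fastq.gz']
    print(max_parallel_pairs(32, 200))  # min(8, 5) -> 5
```

For example, a node with 32 CPUs and 200 GB of memory supports min(8, 5) = 5 concurrent pairs, so memory rather than CPU count is the limiting resource there.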
  1. Once all runs have finished, collect statistics and completeness from the results:
ml python3 && python3 get_stats.py
  1. Generate the MultiQC report:
ml gcc/5.2.0 && ml multiqc/1.5 && multiqc results/
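The completeness check performed by get_stats.py (per the 2019-05-11 changelog, a dataset counts as complete when its log contains "STEP 3 (Community Characterisation) terminated") can be sketched as follows. This is not the actual get_stats.py: the layout of one sub-folder per dataset under results/ and the log-file pattern *.log are assumptions, and the function name completeness is hypothetical.

```python
from pathlib import Path

# Marker string taken from the UFRC-YAMP changelog (2019-05-11).
COMPLETION_MARKER = "STEP 3 (Community Characterisation) terminated"


def completeness(results_dir="results", log_glob="*.log"):
    """Hypothetical sketch: return {dataset_folder_name: True/False}.

    Assumes one sub-folder per dataset under results_dir and that the YAMP log
    inside it matches log_glob (both are assumptions, not documented behaviour).
    """
    root = Path(results_dir)
    if not root.is_dir():
        return {}
    status = {}
    for folder in sorted(root.iterdir()):
        if not folder.is_dir():
            continue  # skip plain files such as a stats summary
        status[folder.name] = any(
            COMPLETION_MARKER in log.read_text(errors="replace")
            for log in folder.glob(log_glob)
        )
    return status


if __name__ == "__main__":
    for name, done in completeness().items():
        print(f"{name}\t{'complete' if done else 'incomplete'}")
```

Plain files directly under results/ (such as the sample stats file) are skipped, so only dataset folders are reported.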

Changelog

UFRC-YAMP / 2019-02-26

Enhancements:

  • Change file locations in run-01_getResources.sh from absolute to relative paths, which is more flexible (e.g. if you copy UFRC-YAMP from A/UFRC-YAMP to B/UFRC-YAMP, the relative paths will not cause errors).
  • Add hpc_submit.sh, parallel.py and run.sh to enable UFRC-YAMP to process multiple pairs of files in parallel.

UFRC-YAMP / 2019-02-27

Enhancements:

  • Add get_stats.py to collect statistics from the results, and adjust run-03_runYAMPdemo.sh and run.sh. A sample stats file is in the results folder.
  • Update README for UFRC-YAMP.

UFRC-YAMP / 2019-05-11

Enhancements:

  • Add code to get_stats.py to report the completeness of each dataset (a dataset counts as complete when the log file in its corresponding folder contains "STEP 3 (Community Characterisation) terminated").
  • Enable MultiQC to report the completion status of the datasets.
  • Update README for UFRC-YAMP.

Other notes

  • In run.sh, ml python3 must be run after the Singularity pull step; otherwise Python fails with the error "No module named os".

License

YAMP is licensed under GNU GPL v3.
