nityendra21 / setu

SARS-CoV-2 Genome Assembler

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Maintenance Maintainer made-with-bash MIT license Safe

Setu

Setu: A pipeline for robust assembly of the SARS-CoV-2 genome.

Setu (sanskrit सेतु) means bridge. It bridges all the reads to genome.

______ _____/  |_ __ __ 
 /  ___// __ \   __\  |  \
 \___ \\  ___/|  | |  |  /
/____  >\___  >__| |____/ v0.1
     \/     \/ bridging the SARS-CoV-2 genome 

Getting Started

Installation

Clone the repository using git:

git clone https://github.com/jnarayan81/setu.git

Dependencies

Required dependencies can be installed in a separate conda environment named setu through:

cd setu
conda create -f env_setu.yml

Setu requires the following dependencies to be installed:

  • Python 3.7
  • Trimmomatic
  • BWA-MEM
  • Samtools
  • Bedtools
  • Spades
  • Ragout
  • QUAST
  • R >=3.6.0
    • Reshape package

Alternatively, all dependencies can be installed through Conda.

Usage

Setu supports only paired-end Illumina reads at the moment, work on long-reads, command is as follows:

Paired-end reads:

./setu.sh -k yes -m pe -t 1 -r paired_1.fastq,paired_2.fastq -f on -o OutputDirectory

Please note that there's no space after the comma when specifying reads using the -r flag.

Assembly of long-reads and hybrid-reads is currently ongoing and will be updated.

Testing

You can test your installation by running:

./test_run.sh

Setu: A Pipeline for the robust Assembling of the SARS-CoV-2 Genome

Setu: A pipeline for robust assembly of the SARS-CoV-2 genome. It has three mode of genome assembly: 1. Paired-End 2. Hybrid 3. Long reads.

The promise of setu:

  • Implement recent NGS techniques to achieve reliable genome assembly.
  • Maintain flexibility in reads type selection.
  • Build on standard Conda and Python packages.

In a nutshell, this pipeline is intended to use all types of NGS reads to generate a genome of high quality.

Consult Jitendra Narayan at jnarayan81@gmail.com or info@bioinformaticsonline.com for any support.

Introduction

The Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) is a new Betacoronavirus strain that infects humans. This disease is the cause of the ongoing Coronavirus Disease (CoViD2019) pandemic. Because of the rapid innovation and decreasing prices of high throughput sequencing technologies, the virus has been sequenced internationally in a large number of people who have been infected. While next-generation sequencing (NGS) technology provides a reliable method of identifying potential infections in clinical specimens, simple and user-friendly bioinformatics workflows are necessary to acquire a complete viral genome sequence with the greatest accuracy. We have developed a thorough workflow for evaluating and decoding SARS-CoV 2 sequencing data using open source technologies. It entails complete sequence elimination of host- or bacteria-related NGS reads prior to de novo assembly, resulting in the quick and accurate assembly of viral genome metagenomic sequences.

Flowchart

Illustrating procedures for assembly of the SARS-CoV-2 genome.

Setu flowchart

News

June 1, 2021: Release v0.1, see release notes here

Blogs and Publications

Citation

If you use setu in your research, please cite us as follows:

Jitendra Narayan¹*†, Nityendra Shukla¹†, Suyash Agarwal², Neha Srivastava³, Prekshi Garg³, Prachi Srivastava³* Setu: A Pipeline for the robust Assembling of the SARS-CoV-2 Genome https://github.com/jnarayan81/setu, 2020. Version 0.1.

*Corresponding authors
†Contributed equally

BibTex:

@misc{setu,
  author={Jitendra Narayan¹*†, Nityendra Shukla¹†, Suyash Agarwal², Neha Srivastava³, Prekshi Garg³, Prachi Srivastava³*}
  title={{Setu}: {A Pipeline for the robust Assembling of the SARS-CoV-2 Genome}},
  howpublished={https://github.com/jnarayan81/setu},
  note={Version 0.1},
  year={2021}
}

Contributing and Feedback

This project welcomes contributions and suggestions.

For more information contact jnarayan81@gmail.com or (mailto:info@bioinformaticsonline.com) with any additional questions or comments.

About

SARS-CoV-2 Genome Assembler


Languages

Language:Shell 53.2%Language:Python 24.2%Language:Perl 20.3%Language:R 2.2%