PriyaLakr / NGS_DataAnalysis

Some scripts to make your bioinformatics analyses reproducible and a bit easier 🤓

DataAnalysis

Author: Priya Lakra

Scripts to make your bioinformatics analyses a bit easier and reproducible 🤓

Best practices:
1. Keep the input files and scripts for each analysis in a single folder.
2. Make sure your $PATH environment variable includes every dependency required for the analysis.

Dependencies

Software    Version tested
mafft       v7.475
bowtie2     v2.3.5.1
samtools    v1.9
python      3.7.6
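
Before running the scripts, it can help to confirm that the tools listed above are actually reachable through $PATH. The following is a minimal sketch, not part of the repository; the function name `find_missing_tools` is hypothetical, and it only checks that an executable is found, not which version it is.

```python
import shutil


def find_missing_tools(tools):
    """Return the names in `tools` that cannot be found on $PATH."""
    # shutil.which returns None when no matching executable is found.
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # Tool names taken from the dependency list above.
    missing = find_missing_tools(["mafft", "bowtie2", "samtools"])
    if missing:
        print("Not found on $PATH:", ", ".join(missing))
    else:
        print("All dependencies found on $PATH")
```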

bulk_mafft.sh

Script to run multiple sequence alignments in bulk with mafft, using a single command. Dependency: mafft

bash bulk_mafft.sh [experiment_name] [inpath] [outpath] [threads]
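
To illustrate the idea of a bulk run, here is a rough Python sketch that builds one mafft command per FASTA file in a directory. It is not the contents of bulk_mafft.sh: the function names, the accepted file suffixes, the `_aligned.fasta` output naming, and the choice of mafft's `--auto` mode are all assumptions made for the example.

```python
import subprocess
from pathlib import Path


def build_mafft_jobs(inpath, outpath, threads=1, suffixes=(".fasta", ".fa")):
    """Return a list of (command, output_file) pairs, one per FASTA file.

    The suffix list and the output naming scheme are assumptions.
    """
    jobs = []
    for fasta in sorted(Path(inpath).iterdir()):
        if fasta.suffix.lower() in suffixes:
            out_file = Path(outpath) / f"{fasta.stem}_aligned.fasta"
            # --auto lets mafft choose a strategy; --thread sets the thread count.
            cmd = ["mafft", "--thread", str(threads), "--auto", str(fasta)]
            jobs.append((cmd, out_file))
    return jobs


def run_jobs(jobs):
    """Run each job; mafft writes the alignment to stdout, so redirect it."""
    for cmd, out_file in jobs:
        out_file.parent.mkdir(parents=True, exist_ok=True)
        with open(out_file, "w") as handle:
            subprocess.run(cmd, stdout=handle, check=True)
```

Calling `run_jobs(build_mafft_jobs("fastas", "alignments", threads=4))` would align every matching file in a hypothetical `fastas` folder, provided mafft is on $PATH.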

pl_count_length.py

Script for counting the total sequence length in FASTA files. It reads every FASTA file in a user-specified directory and prints the total sequence length of each file. Specify the directory containing the FASTA files with --path.

python pl_count_length.py --path [inpath]
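
The counting logic described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the code of pl_count_length.py: the function names and the list of FASTA suffixes are hypothetical, and only the --path option comes from the description above.

```python
import argparse
from pathlib import Path

# Assumed set of file extensions treated as FASTA.
FASTA_SUFFIXES = (".fasta", ".fa", ".fna")


def total_sequence_length(fasta_path):
    """Sum the lengths of all sequence lines, skipping '>' header lines."""
    total = 0
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line and not line.startswith(">"):
                total += len(line)
    return total


def count_directory(path):
    """Map each FASTA file name in `path` to its total sequence length."""
    return {
        fasta.name: total_sequence_length(fasta)
        for fasta in sorted(Path(path).iterdir())
        if fasta.suffix.lower() in FASTA_SUFFIXES
    }


def main(argv=None):
    parser = argparse.ArgumentParser(description="Total sequence length per FASTA file")
    parser.add_argument("--path", required=True, help="directory containing FASTA files")
    args = parser.parse_args(argv)
    for name, length in count_directory(args.path).items():
        print(f"{name}\t{length}")
```

For example, `main(["--path", "fastas"])` would print one tab-separated line per FASTA file found in a hypothetical `fastas` directory.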

pl_NGS_process.sh

Script to perform multiple steps of an NGS analysis with a single command.

It can:
1) extract unaligned reads from BAM files
2) align reads using bowtie2
3) process post-alignment SAM files and produce samtools idxstats output

bash pl_NGS_process.sh [options]

# Usage: -i input_dir  -f filetype  -d out_dir_path  -l indexlocation  -t number_of_threads  -x index_file_name  -r bowtie2_run_mode  -e extractreads  -a alignreads  -p process_reads
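
For readers who want to see what the three stages might look like as individual commands, the sketch below builds a plausible samtools/bowtie2 command sequence in Python. It is not the contents of pl_NGS_process.sh: the function names, file naming, the default `--very-sensitive` run mode, and the treatment of reads as unpaired are all assumptions, and `index_prefix` stands in for the index location joined with the index file name.

```python
import subprocess


def build_pipeline_steps(bam, index_prefix, out_prefix, threads=1,
                         run_mode="--very-sensitive"):
    """Return a list of (command, stdout_file_or_None) steps.

    All output file names are hypothetical and derived from `out_prefix`.
    """
    unmapped_bam = f"{out_prefix}.unmapped.bam"
    reads_fq = f"{out_prefix}.unmapped.fastq"
    sam = f"{out_prefix}.sam"
    sorted_bam = f"{out_prefix}.sorted.bam"
    return [
        # 1) Extract unaligned reads: SAM flag 4 marks a read as unmapped.
        (["samtools", "view", "-b", "-f", "4", "-o", unmapped_bam, bam], None),
        # Convert to FASTQ; output goes to stdout (reads assumed unpaired).
        (["samtools", "fastq", unmapped_bam], reads_fq),
        # 2) Align the reads with bowtie2 against the given index.
        (["bowtie2", "-p", str(threads), run_mode, "-x", index_prefix,
          "-U", reads_fq, "-S", sam], None),
        # 3) Post-process: sort, index, and report per-reference counts.
        (["samtools", "sort", "-@", str(threads), "-o", sorted_bam, sam], None),
        (["samtools", "index", sorted_bam], None),
        (["samtools", "idxstats", sorted_bam], f"{out_prefix}.idxstats.txt"),
    ]


def run_steps(steps):
    """Run each step in order, redirecting stdout to a file where requested."""
    for cmd, stdout_path in steps:
        if stdout_path is None:
            subprocess.run(cmd, check=True)
        else:
            with open(stdout_path, "w") as handle:
                subprocess.run(cmd, stdout=handle, check=True)
```

Building the steps separately from running them makes it easy to print and inspect the commands before anything is executed.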

Languages

Shell 86.1%, Python 13.9%