HuaZou / bioinformatics_pipeline

Bioinformatics Pipeline

Home Page: https://zouhua.top/

Introduction

An effective, reproducible, and reliable data analysis workflow depends on a well-built pipeline that uses up-to-date methods. To support my future work, I need to build such a bioinformatics pipeline, but working out how to construct the workflow is still difficult for me. My plan is to implement it with Nextflow or Snakemake. Before doing so, I need to lay out the workflows in a mind map so that the overall structure is clear to me.

Interests

  • amplicon sequencing analysis
  • metagenomics sequencing analysis
  • bulk-RNA sequencing analysis
  • DNA methylation by Illumina Array

Workflow

The workflow comprises two parts: the first goes from raw data to a profile, and the second is data analysis (statistical analysis).

  • the first part
    • demultiplex the sequences
    • check the quality of the reads
    • filter out low-quality reads and remove host DNA sequences
    • align the high-quality reads to the reference database
    • obtain the profile, an $M \times N$ matrix (rows: the M feature names; columns: the N sample IDs)
  • the second part
    • statistical analysis, such as the Wilcoxon rank-sum test, LDA, PCoA, linear regression analysis, and multivariable association analysis
    • machine learning
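
The two parts above can be sketched in plain Python. This is a minimal illustration, not code from this repository: the function names (`build_profile`, `midranks`, `rank_sum_test`), the sample IDs, the feature names, and the counts are all made up for the example. The rank-sum test uses the normal approximation without tie or continuity correction, so a real analysis would more likely rely on an established statistics package.

```python
import math


def build_profile(sample_counts):
    """Combine per-sample feature counts into a feature-by-sample matrix.

    sample_counts maps a sample ID to a dict of feature name -> count.
    Returns (features, samples, matrix), where matrix[i][j] is the count of
    features[i] in samples[j]. A feature missing from a sample counts as 0.
    """
    samples = sorted(sample_counts)
    features = sorted({f for counts in sample_counts.values() for f in counts})
    matrix = [[sample_counts[s].get(f, 0) for s in samples] for f in features]
    return features, samples, matrix


def midranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (U, p), where U is the Mann-Whitney U statistic of group x.
    No tie or continuity correction is applied (a simplifying assumption).
    """
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    w = sum(ranks[:n1])               # rank sum of the first group
    u = w - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, p


if __name__ == "__main__":
    # Hypothetical counts, as they might come out of the alignment step.
    counts = {
        "case_1": {"Bacteroides": 30, "Prevotella": 10},
        "case_2": {"Bacteroides": 42, "Faecalibacterium": 7},
        "case_3": {"Bacteroides": 35, "Prevotella": 4},
        "ctrl_1": {"Bacteroides": 12, "Faecalibacterium": 20},
        "ctrl_2": {"Bacteroides": 8, "Prevotella": 15},
        "ctrl_3": {"Bacteroides": 15, "Faecalibacterium": 18},
    }
    features, samples, matrix = build_profile(counts)

    # Compare one feature (one row of the profile) between the two groups.
    row = matrix[features.index("Bacteroides")]
    cases = [row[j] for j, s in enumerate(samples) if s.startswith("case")]
    controls = [row[j] for j, s in enumerate(samples) if s.startswith("ctrl")]
    u, p = rank_sum_test(cases, controls)
    print(features, samples)
    print(f"U = {u}, p = {p:.3f}")
```

Keeping features as rows and samples as columns matches the $M \times N$ layout described in the list, so each statistical test operates on a single row.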

Notice

To begin with, I use Perl for the preliminary work; eventually, I will convert all the workflows to Snakemake or Nextflow.
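
Before porting the steps to a workflow manager, it can help to write down which step depends on which. The sketch below does this in plain Python with the standard-library `graphlib` module (Python 3.9+). It is not Snakemake or Nextflow syntax, and the step names are illustrative labels derived from the workflow list, not names of actual scripts in this repository.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps that must finish before it can start.
# The labels are hypothetical and only mirror the workflow list.
STEPS = {
    "demultiplex": set(),
    "quality_scan": {"demultiplex"},
    "filter_and_remove_host": {"quality_scan"},
    "align_to_reference": {"filter_and_remove_host"},
    "build_profile": {"align_to_reference"},
    "statistical_analysis": {"build_profile"},
    "machine_learning": {"build_profile"},
}


def execution_order(steps):
    """Return one valid order in which the steps can be run."""
    return list(TopologicalSorter(steps).static_order())


if __name__ == "__main__":
    for number, step in enumerate(execution_order(STEPS), start=1):
        print(number, step)
```

Workflow managers resolve a similar dependency graph, although they usually infer it from the input and output files of each step instead of from an explicit table like this one.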

About

License: MIT License

Languages

R 31.5%, Python 28.6%, Perl 24.3%, CSS 13.5%, Shell 2.2%