HuaZou/bioinformatics_pipeline

snakemake data-analysis pipeline metagenomics amplicon rna-seq mirna-seq methylation tcga-data geo-data

Bioinformatics pipeline

Introduction

An effective, reproducible, and reliable data analysis workflow relies on a state-of-the-art pipeline built with up-to-date methods. To facilitate my future work, I need to build such a bioinformatics pipeline. However, I still find it hard to decide how to construct the workflow. I plan to implement it with Nextflow or Snakemake. Before doing so, I will describe the workflows in a mind map to make the design clear.

Interests

amplicon sequencing analysis
metagenomics sequencing analysis
bulk-RNA sequencing analysis
DNA methylation by Illumina Array

Workflow

The workflow comprises two parts: the first goes from raw data to a profile, and the second is data analysis (statistical analysis).

The first part
- demultiplex the sequences
- check the quality of the reads
- filter out low-quality reads and remove host DNA sequences
- align the high-quality reads to the reference database
- obtain the profile, an $M \times N$ matrix (M: feature names; N: sample IDs)
The second part
- statistical analysis, such as the Wilcoxon rank-sum test, LDA, PCoA, linear regression analysis, and multivariable association analysis
- machine learning
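
As a rough illustration of the first part, the sketch below filters reads by mean base quality and then assembles per-sample feature assignments into an $M \times N$ count matrix. It is a minimal sketch, not the pipeline's actual code: the function names (`mean_phred`, `filter_reads`, `build_profile`), the Phred+33 encoding, the cutoff of 20, and the toy taxa are all assumptions made for the example. Demultiplexing, host removal, and alignment are not shown; the `assignments` dictionary stands in for the alignment output.

```python
from collections import Counter


def mean_phred(quality, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 is assumed)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def filter_reads(reads, min_mean_quality=20.0):
    """Keep (read_id, sequence, quality) tuples whose mean quality meets the cutoff."""
    return [r for r in reads if r[2] and mean_phred(r[2]) >= min_mean_quality]


def build_profile(assignments):
    """Build an M x N count matrix from {sample_id: [feature, ...]}.

    Returns (features, samples, matrix), where matrix[i][j] is the number
    of reads from samples[j] assigned to features[i].
    """
    samples = sorted(assignments)
    counts = {s: Counter(assignments[s]) for s in samples}
    features = sorted({f for c in counts.values() for f in c})
    # Counter returns 0 for features that a sample never observed.
    matrix = [[counts[s][f] for s in samples] for f in features]
    return features, samples, matrix


# Toy reads (hypothetical): 'I' encodes Phred 40, '!' encodes Phred 0.
reads = [
    ("read1", "ACGT", "IIII"),
    ("read2", "ACGT", "!!!!"),
]
kept = filter_reads(reads)

# Toy alignment output (hypothetical): one feature label per aligned read.
assignments = {
    "sampleA": ["Bacteroides", "Bacteroides", "Prevotella"],
    "sampleB": ["Prevotella"],
}
features, samples, matrix = build_profile(assignments)
```

Rows of `matrix` follow `features` and columns follow `samples`, so the result matches the M (features) by N (samples) layout described above and can be handed to the statistical analysis in the second part.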

Notice

First, I will use Perl for some preliminary work and eventually convert all the workflows into Snakemake or Nextflow.

About

Bioinformatics Pipeline

https://zouhua.top/

MIT License

Languages

R 31.5%, Python 28.6%, Perl 24.3%, CSS 13.5%, Shell 2.2%