joshuashen / ExomePipe

WGS and Exome analysis pipeline

Computational pipeline for analyzing WGS and exome data.

*Major steps* 
1. Mapping reads onto reference using BWA 
2. Calling variants using GATK
3. Quality assessment and functional annotation of variants 
4. Association tests (single-variant and gene-based)

*Details*
1. Mapping


2. Genotype calling
(1) Local realignment: optimize alignment around short indels.
(2) Base quality recalibration.
(3) Assess the FDR of novel variants based on the transition/transversion (Ti/Tv) ratio.
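
The Ti/Tv-based FDR check in (3) can be illustrated with a minimal Python sketch. It is not taken from this repository. It assumes a common heuristic: true SNVs show an expected Ti/Tv ratio that the user supplies (values near 3 are often quoted for exome data and near 2 for whole genome), while false calls behave like random substitutions with Ti/Tv = 0.5. The function names `ti_tv_ratio` and `estimate_fdr` are hypothetical.

```python
# Hypothetical helpers; not part of ExomePipe.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ti_tv_ratio(snvs):
    """Return the Ti/Tv ratio for a list of (ref, alt) single-base changes."""
    ti = sum(1 for ref, alt in snvs if (ref.upper(), alt.upper()) in TRANSITIONS)
    tv = len(snvs) - ti
    if tv == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return ti / tv


def estimate_fdr(observed_titv, expected_titv, error_titv=0.5):
    """Estimate the fraction of false calls from a two-component mixture.

    Assumption: the observed transition fraction is a mix of true variants
    (Ti/Tv = expected_titv) and random errors (Ti/Tv = error_titv).
    """
    def ti_fraction(ratio):
        return ratio / (1.0 + ratio)

    f_obs = ti_fraction(observed_titv)
    f_true = ti_fraction(expected_titv)
    f_err = ti_fraction(error_titv)
    fdr = (f_true - f_obs) / (f_true - f_err)
    # Clamp to [0, 1]; sampling noise can push the estimate outside the range.
    return min(1.0, max(0.0, fdr))


if __name__ == "__main__":
    novel = [("A", "G"), ("C", "T"), ("A", "C")]
    observed = ti_tv_ratio(novel)
    print(observed, estimate_fdr(observed, expected_titv=3.0))
```

The estimate is only as good as the assumed expected ratio, so it is best read as a rough sanity check on the novel call set rather than a calibrated error rate.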

3. Association
(1) Single variants: Fisher's exact test.
(2) Gene-based grouping:
  - Simple collapse of rare and uncommon variants (frequency <5%): count carriers or variants, then compare cases and controls.
  - Stratified collapse: rare (<1%) and uncommon (>1% and <5%) variants.
  - Weighted by frequency (Madsen and Browning).
  - Price et al.: flexible threshold for collapse, with Bayesian incorporation of PolyPhen-2 scores (only applicable to missense variants).
  - Double hits: genes with two or more non-synonymous novel variants.
  - C-alpha: could be useful if there are many neutral non-synonymous novel variants.
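
As an illustration of the simple and stratified collapse followed by Fisher's exact test, here is a minimal Python sketch using only the standard library. It is not the repository's implementation. The data layout (one list of minor-allele counts per individual, one frequency per variant) and the names `fisher_exact_two_sided`, `count_carriers`, and `collapsed_test` are assumptions made for the example.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def count_carriers(genotypes, freqs, max_freq=0.05, min_freq=0.0):
    """Count individuals carrying at least one variant with
    min_freq <= frequency < max_freq."""
    keep = [i for i, f in enumerate(freqs) if min_freq <= f < max_freq]
    return sum(1 for g in genotypes if any(g[i] > 0 for i in keep))


def collapsed_test(cases, controls, freqs, max_freq=0.05, min_freq=0.0):
    """Collapse variants in a frequency bin and compare carrier counts."""
    a = count_carriers(cases, freqs, max_freq, min_freq)
    c = count_carriers(controls, freqs, max_freq, min_freq)
    return fisher_exact_two_sided(a, len(cases) - a, c, len(controls) - c)


if __name__ == "__main__":
    freqs = [0.005, 0.03, 0.2]  # per-variant frequencies (illustrative)
    cases = [[1, 0, 0], [0, 1, 0], [1, 0, 2], [0, 0, 1]]
    controls = [[0, 0, 1], [0, 1, 0], [0, 0, 0], [0, 0, 0]]
    print(collapsed_test(cases, controls, freqs))                 # simple collapse, <5%
    print(collapsed_test(cases, controls, freqs, max_freq=0.01))  # rare stratum, <1%
```

Calling `collapsed_test` once per frequency bin gives the stratified variant; how the per-stratum results are then combined is left open here, since the source does not say.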

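The frequency weighting listed under gene-based grouping can be sketched in the same way. This example follows the weighting step as described by Madsen and Browning: each variant's frequency is estimated in controls with a +1 pseudocount, and rarer variants receive larger weight in a per-individual score. The function names and data layout are assumptions, and the significance step (a rank-sum statistic assessed by permuting case/control labels in the original method) is deliberately left out.

```python
import math


def mb_weights(genotypes, is_case):
    """Per-variant weights w_i = sqrt(n * q_i * (1 - q_i)).

    q_i = (m_i + 1) / (2 * n_ctrl + 2), where m_i is the number of minor
    alleles seen in controls and n is the total number of individuals.
    """
    n_total = len(genotypes)
    controls = [g for g, case in zip(genotypes, is_case) if not case]
    n_ctrl = len(controls)
    weights = []
    for i in range(len(genotypes[0])):
        m_ctrl = sum(g[i] for g in controls)
        q = (m_ctrl + 1) / (2 * n_ctrl + 2)
        weights.append(math.sqrt(n_total * q * (1 - q)))
    return weights


def mb_scores(genotypes, weights):
    """Per-individual score: minor-allele counts divided by variant weight."""
    return [sum(count / w for count, w in zip(g, weights)) for g in genotypes]


if __name__ == "__main__":
    # Rows are individuals, columns are variants (illustrative data).
    genotypes = [[1, 0], [0, 0], [0, 1], [0, 0]]
    is_case = [True, True, False, False]
    w = mb_weights(genotypes, is_case)
    print(w, mb_scores(genotypes, w))
```

Because the pseudocount keeps every q_i above zero, variants absent from controls still get a finite weight, which is what lets case-only variants contribute the most to the score.
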
Languages: Shell 42.2%, Ruby 37.8%, R 19.4%, Perl 0.6%