shishenyxx / Sperm_transmission_mosaicism

This repository collects pipelines, code, and some intermediate results for the study of mosaic SNVs/indels in sperm, blood, and saliva samples from a genomic transmission study.

Home Page: https://elifesciences.org/articles/78459


Sperm transmission mosaicism

This repository collects pipelines, code, and some intermediate results for the study of mosaic SNVs/indels in sperm, blood, and saliva samples of a small cohort. Raw WGS data of this study are available here. Genotyping results for each sample are provided here. A re-analysis of the transmission of paternal mosaic variants from the ASD trios is provided here. A 300x WGS panel of normals is available here.

The pipelines and analyses are derived from our recent large-scale sperm study, the code for which can be found here.


1. Pipelines for processing whole-genome sequencing data

1.1 Pipelines for WGS data processing and quality control

Pipelines for pre-processing of the BAM files.

Code for depth of coverage and insert-size distribution.

1.2 Code for the population origin analysis

Pipeline for population analysis, and code for plotting.

1.3 Pipelines for mosaic SNV/indel calling and variant annotations

Pipelines for MuTect2 (paired mode) and Strelka2 (somatic mode) variant calling from WGS data.

Pipelines for MuTect2 (single mode), run with the "Full Panel of Normal" version. The MuTect2 (single mode) results are then passed to MosaicForecast and the variant annotation pipeline.


2. Pipelines for processing Massive Parallel Amplicon Sequencing (MPAS) data

2.1 Pipelines for MPAS data alignment and processing

Pipelines for alignment, processing, and germline variant calling of MPAS reads.

2.2 Pipelines for AF quantification and variant annotations

Pipelines for allelic fraction (AF) quantification and variant annotations.

Code to filter and annotate MPAS data.
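To illustrate what AF quantification involves, here is a minimal sketch that turns reference and alternate read counts at one position into an allelic fraction with a confidence interval. It is not taken from the repository's scripts: the function name `allele_fraction_with_ci`, its parameters, and the choice of a Wilson score interval are assumptions made for illustration, and the actual pipeline may use a different interval.

```python
import math


def allele_fraction_with_ci(alt_reads, ref_reads, z=1.96):
    """Return (AF, lower, upper) for one position.

    Hypothetical helper: uses a Wilson score interval, which is an
    assumption of this sketch and not necessarily the repository's method.
    """
    depth = alt_reads + ref_reads
    if depth == 0:
        raise ValueError("no reads cover this position")
    af = alt_reads / depth
    z2 = z * z
    denom = 1.0 + z2 / depth
    center = (af + z2 / (2.0 * depth)) / denom
    half_width = z * math.sqrt(af * (1.0 - af) / depth + z2 / (4.0 * depth * depth)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return af, max(0.0, center - half_width), min(1.0, center + half_width)


if __name__ == "__main__":
    # 5 alternate reads out of 1000 total at a deeply sequenced amplicon.
    af, lower, upper = allele_fraction_with_ci(alt_reads=5, ref_reads=995)
    print(f"AF={af:.4f} CI=({lower:.4f}, {upper:.4f})")
```

A lower bound clearly above the background error rate of the assay is one common way to separate a low-fraction mosaic variant from sequencing noise; the threshold itself depends on the data and is not fixed by this sketch.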


3. Pipelines for the data analysis, variant filtering, comprehensive annotations, and statistical analysis

3.1 Pipelines for mosaic variant determination, annotations, and plotting

After variant calling with the different strategies, variants were annotated and filtered by a Python script. Positive mosaic variants were annotated together with their transmission to multiple samples and additional information.

3.2 Pipelines for statistical analysis and the related plotting

Code for the estimation of expected transmissions, assuming independent transmission, via a dynamic programming algorithm.
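One way to picture such an estimate: if each mosaic variant is transmitted independently with its own probability, the number of transmitted variants follows a Poisson binomial distribution, which a simple dynamic programming recurrence computes exactly. The sketch below is an illustration under that assumption, not the repository's implementation. The function names are hypothetical, and using a per-variant probability (for example, one derived from the sperm allelic fraction) is an assumption of this example.

```python
def transmission_count_distribution(probs):
    """Return [P(K = 0), ..., P(K = n)], where K is the number of
    transmitted variants and probs[i] is the (assumed) independent
    transmission probability of variant i."""
    dist = [1.0]  # with no variants considered, K = 0 with certainty
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)  # this variant is not transmitted
            nxt[k + 1] += mass * p      # this variant is transmitted
        dist = nxt
    return dist


def expected_transmissions(probs):
    """Expected number of transmitted variants under independence."""
    dist = transmission_count_distribution(probs)
    return sum(k * mass for k, mass in enumerate(dist))


if __name__ == "__main__":
    # Hypothetical per-variant transmission probabilities for one offspring.
    probs = [0.10, 0.20, 0.05]
    print(transmission_count_distribution(probs))
    print(expected_transmissions(probs))
```

The expectation alone equals the sum of the probabilities; the value of the recurrence is that it also yields the full distribution, so an observed transmission count can be compared against the probability of seeing at least that many under independence.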

Code and example data for the permutation analysis to estimate the independence of transmission in each family.
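A permutation analysis of this kind can be sketched as follows. The data layout (one row per variant, one 0/1 column per offspring), the co-transmission statistic, and the function names are all assumptions made for illustration; the repository's code may define the null model and the statistic differently.

```python
import random
from itertools import combinations


def co_transmission_score(matrix):
    """Count (variant pair, offspring) combinations in which both
    variants of the pair were transmitted to the same offspring."""
    score = 0
    for row_a, row_b in combinations(matrix, 2):
        score += sum(a * b for a, b in zip(row_a, row_b))
    return score


def permutation_p_value(matrix, n_perm=10000, seed=0):
    """One-sided empirical p-value for excess co-transmission.

    Null model (an assumption of this sketch): each variant keeps its
    number of transmissions, but which offspring received it is shuffled
    independently of all other variants.
    """
    rng = random.Random(seed)
    observed = co_transmission_score(matrix)
    hits = 0
    for _ in range(n_perm):
        shuffled = [rng.sample(row, len(row)) for row in matrix]
        if co_transmission_score(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible zero.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical family: 3 variants (rows) and 3 offspring (columns).
    family = [[1, 0, 0], [1, 0, 0], [1, 0, 0]]
    print(permutation_p_value(family, n_perm=2000))
```

Shuffling within each row preserves how often every variant was transmitted, so a small p-value points to clustering of variants in the same offspring instead of to differences in per-variant transmission rates.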

Code and data for the re-analysis of transmission in 8 ASD families previously analyzed in the first and second studies.


4. Contact

📧 Martin Breuss: martin.breuss@cuanschutz.edu

📧 Xiaoxu Yang: u6055394@utah.edu, xiaoxuyanglab@gmail.com, xiy010@health.ucsd.edu

📧 Joseph Gleeson: jogleeson@health.ucsd.edu


5. Cite the code

Breuss MW, Yang X, et al., Gleeson JG. Unbiased mosaic variant assessment in sperm: a cohort study to test predictability of transmission. 2022. (eLife, DOI:10.7554/eLife.78459, PMID:35787314)


License: GNU General Public License v3.0


Languages

Python 97.4%, R 2.6%