reachsagaya / sv-genotyping-paper

Genotyping structural variants in pangenome graphs using the vg toolkit

This repository contains the commands and scripts used for the paper "Genotyping structural variants in pangenome graphs using the vg toolkit" (2019, in press). They depend primarily on toil-vg, which can run most other dependencies via Docker. A wiki page on genotyping SVs with toil-vg is also available. GitHub issues are the best place to raise questions or concerns.

Links to necessary data are also listed for each analysis.

Note that the code to reproduce the figures and tables in the manuscript is available in the manuscript's repo.

This repository is distributed under the MIT license terms.

Simulation experiment

The different methods were compared on simulated sequences and SVs. Different sequencing depths were tested. We also tested the effect of errors in the breakpoint locations of the SVs. Scripts are available and described in the simulation folder.
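To make the idea of breakpoint errors concrete, here is a minimal Python sketch of one way to perturb SV coordinates. It is an illustration under stated assumptions, not the code in the simulation folder: the function name `shift_breakpoints`, the `(chrom, start, end)` tuple layout, and the choice to shift both breakpoints by the same uniform random offset are all hypothetical.

```python
import random


def shift_breakpoints(svs, max_shift, seed=0):
    """Shift each SV by a random offset in [-max_shift, max_shift].

    Hypothetical helper: `svs` is a list of (chrom, start, end) tuples.
    Both breakpoints move by the same offset, so the SV length is kept.
    """
    rng = random.Random(seed)  # fixed seed for reproducible perturbations
    shifted = []
    for chrom, start, end in svs:
        offset = rng.randint(-max_shift, max_shift)
        new_start = max(1, start + offset)  # never move before position 1
        new_end = new_start + (end - start)
        shifted.append((chrom, new_start, new_end))
    return shifted


# Example input with made-up coordinates.
example_svs = [("chr1", 1000, 1500), ("chr1", 5000, 5050)]
perturbed = shift_breakpoints(example_svs, max_shift=10, seed=1)
```

Genotyping against the perturbed coordinates instead of the true ones would then show how sensitive a method is to imprecise breakpoints.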

Whole genome experiments in human

These were run on AWS via Toil. In theory, they could use any other framework that Toil supports, though the scripts will have to be modified accordingly.

In the human directory, there is one folder for each dataset, with the commands to download and prepare the data and to genotype SVs with vg and the other methods.

There is also a toil-scripts folder with helper scripts that were used to run the analysis on AWS.

The commands for the evaluation, using Snakemake, are available in the sveval folder.

The VCFs produced by vg and the other methods across these datasets are available at https://s3-us-west-2.amazonaws.com/human-pangenomics/index.html?prefix=vgsv2019/vcfs/.

De-novo assembly graph experiments in yeast

The yeast experiments are written as Snakemake pipelines. Each pipeline consists of a set of rules, each of which turns a set of input files into a set of output files.
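The rule concept can be sketched in plain Python. This is a toy model of how a rule-based workflow decides which rules to run for a requested file; it is not Snakemake's implementation and not one of the yeast pipelines. The `Rule` class, the `plan` function, and the rule and file names are hypothetical, and the rule graph is assumed to be acyclic.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Rule:
    name: str
    inputs: List[str]
    outputs: List[str]


def plan(rules: List[Rule], target: str, available: Set[str]) -> List[str]:
    """Return the names of the rules to run, in order, to produce `target`."""
    # Map every output file to the rule that produces it.
    producers = {out: rule for rule in rules for out in rule.outputs}
    order: List[str] = []

    def visit(path: str) -> None:
        if path in available:
            return  # file already exists, nothing to do
        if path not in producers:
            raise ValueError(f"no rule produces {path}")
        rule = producers[path]
        if rule.name in order:
            return  # rule already scheduled
        for dep in rule.inputs:
            visit(dep)  # schedule upstream rules first
        order.append(rule.name)

    visit(target)
    return order


# Hypothetical two-step pipeline: map reads, then call variants.
rules = [
    Rule("call", inputs=["aln.gam"], outputs=["calls.vcf"]),
    Rule("map", inputs=["reads.fq"], outputs=["aln.gam"]),
]
steps = plan(rules, "calls.vcf", available={"reads.fq"})
```

Working backwards from the requested output is what lets such a pipeline skip steps whose outputs already exist.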

In the yeast directory, there are several folders for the different phases of the experiment, as well as detailed descriptions of how to re-run it.
