bsipos / paper-ng-sam

Pipeline for simulating NG-SAM experiments

This repository contains the simulation pipeline used in the paper:

Botond Sipos, Tim Massingham, Adrian M. Stütz, Nick Goldman (2012) An improved protocol for sequencing of repetitive genomic regions and structural variations using mutagenesis and Next Generation Sequencing. PLoS ONE 7(8):e43359 doi:10.1371/journal.pone.0043359.

Files

  • bin - scripts:
    • calibrate_mut - script calculating the branch length scaling factor
    • pcr_coal.R - R script that simulates PCR amplification using pcrcoal, and dilution by sampling from Poisson distributions
    • sim_exp - simulate a single NG-SAM experiment with the specified target sequence and parameters
    • run_seq_sim - simulate NG-SAM experiments on different target sequences
    • run_dil_sim - simulate NG-SAM experiments with a range of dilution factors
    • plot_dil_res - plot the results of dil_sim
    • plot_seq_res - plot the results of seq_sim
  • dat - data files:
    • bl_scaler.txt - the branch length scaling factor and the corresponding Hamming distance as calculated by the "calibrate_mut" script
    • dmel_eater.fas - the coding sequence of the Drosophila melanogaster eater gene
    • eater_root.fas - the target sequence used in the second simulation setup (run_dil_sim)
    • MH22.fas - sequence from Zaccolo et al., used to calibrate the branch length scaling factor
    • mutation_model.tab - the mutation spectrum observed by Zaccolo et al., used to build the mutation model
    • s_8_4x.runfile - the simNGS runfile used in the simulations
  • lib/*.py - Python classes used by the scripts under bin/
  • seq_sim - output directory for the first simulation setup
  • dil_sim - output directory for the second simulation setup
  • Makefile - makefile containing utility targets
  • simulations.mk - makefile containing simulation targets and parameters
  • reports - plots:
    • calibration_report.pdf - diagnostic plots from the "calibrate_mut" script
    • seq_sim.pdf - the results of the first simulation setup
    • dil_sim.pdf - the results of the second simulation setup
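pcr_coal.R simulates PCR amplification with pcrcoal and dilution by Poisson sampling. The Python sketch below illustrates both steps under stated assumptions: dilution keeps a Poisson-distributed number of molecules with mean equal to the input count times the dilution factor, and amplification is modelled as a simple branching process in which every molecule is copied with a fixed per-cycle efficiency. That branching process is a swapped-in simplification, not the coalescent model implemented by pcrcoal, and all function names and parameters here are hypothetical.

```python
import random


def pcr_amplify(n_molecules: int, cycles: int, efficiency: float,
                rng: random.Random) -> int:
    """Branching-process PCR (simplification, not pcrcoal's coalescent):
    in every cycle each molecule is duplicated independently with
    probability `efficiency`."""
    n = n_molecules
    for _ in range(cycles):
        # Number of the current n molecules copied in this cycle.
        copies = sum(rng.random() < efficiency for _ in range(n))
        n += copies
    return n


def poisson_sample(lam: float, rng: random.Random) -> int:
    """Draw from Poisson(lam) by counting arrivals of a unit-rate
    Poisson process in the interval [0, lam]."""
    count = 0
    t = rng.expovariate(1.0)
    while t <= lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def dilute(n_molecules: int, dilution_factor: float,
           rng: random.Random) -> int:
    """Molecules kept after dilution: Poisson with mean
    n_molecules * dilution_factor."""
    return poisson_sample(n_molecules * dilution_factor, rng)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed for a reproducible run
    amplified = pcr_amplify(n_molecules=10, cycles=10, efficiency=0.9, rng=rng)
    kept = dilute(amplified, dilution_factor=0.01, rng=rng)
    print(amplified, kept)
```

The counting-based Poisson sampler takes time proportional to the mean per draw, which is adequate for a sketch; for large molecule counts numpy.random.Generator.poisson would be the faster choice.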

Requirements

The simulation pipeline runs in a standard UNIX environment and uses the Platform LSF workload manager to distribute simulations between multiple compute nodes. It also requires the following software to be installed:

Running simulations

The simulation parameters of interest are stored in the makefile simulations.mk. The simulations and the plotting scripts are launched by calling the following make targets:

  • seq_sim - submit the jobs for the first simulation setup. The results and random target sequences are saved under "seq_sim".
  • plot_seq_res - process the output of seq_sim
  • dil_sim - submit the jobs for the second simulation setup. The results are saved under "dil_sim".
  • plot_dil_res - process the output of dil_sim
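The targets above fall into two phases: seq_sim and dil_sim submit LSF jobs, and the plotting targets process their output. A minimal sketch of a hypothetical Python wrapper (not part of the repository) that groups the targets by phase is shown below; it only prints the commands unless dry_run=False. Because the simulation targets return once the jobs are submitted, the plot phase is assumed to be run separately, after the LSF jobs have finished.

```python
import subprocess

# Make targets from this README, grouped by phase. The simulation targets
# only submit LSF jobs, so run the "plot" phase after those jobs finish.
PHASES = {
    "submit": ["seq_sim", "dil_sim"],
    "plot": ["plot_seq_res", "plot_dil_res"],
}


def make_commands(phase: str) -> list[list[str]]:
    """Return the `make` invocations for one phase."""
    return [["make", target] for target in PHASES[phase]]


def run_phase(phase: str, dry_run: bool = True) -> list[list[str]]:
    """Print (dry run) or execute the make commands for `phase`."""
    commands = make_commands(phase)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first target that fails.
            subprocess.run(cmd, check=True)
    return commands
```

For example, run_phase("submit", dry_run=False) submits both simulation setups, and run_phase("plot", dry_run=False) can be called once the jobs have completed.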

Other useful make targets:

  • t - test the simulation framework
  • calibration - recalculate the branch length scaling factor and save it in dat/bl_scaler.txt
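The calibration target recomputes the branch length scaling factor, which dat/bl_scaler.txt stores together with the corresponding Hamming distance. The sketch below shows the idea under a loudly simplified assumption: it replaces the Zaccolo et al. mutation spectrum with the Jukes-Cantor (JC69) model, under which the expected proportion of differing sites after a branch of length d is p = 3/4 (1 - exp(-4d/3)), so the scaling factor that matches a target Hamming distance has a closed form. The function names are hypothetical, and calibrate_mut may work quite differently.

```python
import math
import random

BASES = "ACGT"


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def jc69_scaler(target_hamming: float, seq_len: int,
                branch_length: float = 1.0) -> float:
    """Scaling factor s such that, under JC69 (an assumption, not the
    Zaccolo et al. spectrum), a branch of length s * branch_length gives
    an expected target_hamming differing sites out of seq_len."""
    p = target_hamming / seq_len
    if not 0 <= p < 0.75:
        raise ValueError("JC69 saturates at p = 0.75")
    d = -0.75 * math.log(1 - 4.0 * p / 3.0)
    return d / branch_length


def mutate_jc69(seq: str, d: float, rng: random.Random) -> str:
    """Evolve seq along a JC69 branch of length d (substitutions per site)."""
    # Probability that a site ends with a base different from the start;
    # under JC69 the new base is uniform over the three alternatives.
    p_diff = 0.75 * (1 - math.exp(-4.0 * d / 3.0))
    out = []
    for base in seq:
        if rng.random() < p_diff:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)
```

Simulating many mutated copies of a root sequence with the calibrated factor and averaging their Hamming distances to the root gives a quick check that the factor reproduces the target distance.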

License: Other

