shajoezhu / LDhot

Detect recombination hotspots using population genetic data.

Home Page: http://arxiv.org/abs/1403.4264

LDhot

A program to detect recombination hotspots using population genetic data.

## Installation

After downloading, switch to the download folder and type:

    make

If you're lucky, this will compile without errors. Note, however, that you may need a compiler that supports the C++11 standard; if you see lots of errors, you may need to upgrade your compiler.

On some systems, you can compile with multi-threading turned on, which can significantly reduce runtime. To do this, type:

    make MULTI=1
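
The README does not say which compiler versions work. One quick way to check whether a compiler accepts C++11 before running make is to compile a small probe file, such as the hypothetical cxx11_check.cpp below, with `g++ -std=c++11 -c cxx11_check.cpp` (or the clang++ equivalent). This is a sketch that assumes a GCC- or Clang-style command line; whether LDhot's Makefile passes `-std=c++11` itself is not stated here.

```cpp
// cxx11_check.cpp: a hypothetical probe file, not part of LDhot.
// If it compiles with e.g. `g++ -std=c++11 -c cxx11_check.cpp`, the compiler
// accepts the C++11 features exercised below.
#include <cassert>
#include <memory>
#include <vector>

// Fails at compile time if the compiler is not in C++11 (or later) mode.
static_assert(__cplusplus >= 201103L, "a C++11 (or later) compiler is required");

// Exercises a few C++11 features: initializer lists, auto, lambdas,
// range-based for, std::unique_ptr and nullptr.
int cxx11_smoke_test() {
    std::vector<int> v = {1, 2, 3, 4};
    auto square = [](int x) { return x * x; };
    int sum = 0;
    for (auto x : v) sum += square(x);
    std::unique_ptr<int> p(new int(sum));
    assert(p != nullptr);
    return *p;  // 1 + 4 + 9 + 16
}
```

Older GCC releases default to a pre-C++11 dialect, so a failing static_assert without `-std=c++11` does not necessarily mean the compiler lacks C++11 support.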

## Basic Usage

Two programs are provided, ldhot and ldhot_summary. The main program, ldhot, is called as follows:

    ./ldhot --seq <seq_file> --loc <loc_file> --lk <lk_file> --res <res_file> --nsim 1000 --out <out_prefix>

The seq_file, loc_file, lk_file, and res_file are all LDhat-format files, although the seq_file is required to be phased and encoded using just zeros and ones. The --nsim parameter controls the number of simulations used within the method; at least 1000 simulations are recommended. A complete option list is given below.
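
Because ldhot requires the seq_file to be phased and 0/1-encoded, it can help to check the file before a long run. The sketch below assumes the usual LDhat sites layout: a header line giving the number of sequences, the number of sites and a data-type flag (1 for haplotype data), followed by FASTA-style records. `is_phased_binary_seq` and `is_phased_binary_seq_text` are hypothetical helpers, not part of LDhot.

```cpp
#include <cassert>
#include <cctype>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical validator (not part of LDhot). Assumes the LDhat sites layout:
//   <nseq> <nsites> <flag>     with flag 1 = haplotype (phased) data
//   >name
//   0101...
// Returns false and sets `why` on the first problem found.
bool is_phased_binary_seq(std::istream& in, std::string& why) {
    int nseq = 0, nsites = 0, flag = 0;
    if (!(in >> nseq >> nsites >> flag)) { why = "bad header"; return false; }
    if (flag != 1) { why = "flag is not 1 (haplotype data)"; return false; }

    std::vector<std::string> seqs;
    std::string line;
    std::getline(in, line);  // consume the rest of the header line
    while (std::getline(in, line)) {
        if (!line.empty() && line[0] == '>') { seqs.emplace_back(); continue; }
        for (char c : line) {
            if (std::isspace(static_cast<unsigned char>(c))) continue;
            if (seqs.empty()) { why = "sequence data before first '>' name"; return false; }
            if (c != '0' && c != '1') {
                why = std::string("non-binary character '") + c + "'";
                return false;
            }
            seqs.back().push_back(c);
        }
    }
    if (static_cast<int>(seqs.size()) != nseq) { why = "sequence count mismatch"; return false; }
    for (const auto& s : seqs)
        if (static_cast<int>(s.size()) != nsites) { why = "sequence length mismatch"; return false; }
    return true;
}

// Convenience overload for checking file contents already held in memory.
bool is_phased_binary_seq_text(const std::string& text, std::string& why) {
    std::istringstream in(text);
    return is_phased_binary_seq(in, why);
}
```

Passing a std::ifstream opened on the seq_file to `is_phased_binary_seq` checks a file on disk in the same way.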

The ldhot program produces an output file named <out_prefix>.hotspots.txt, which contains the details of the windows tested for the presence of a hotspot. This file can be treated as the final output or further summarized using the ldhot_summary program, a simple program that combines the windows called as significant by ldhot. It is called as follows:

    ./ldhot_summary --res <res_file> --hot <hotspot_file> --out <out_prefix>

The output of this program is written to <out_prefix>.hot_summary.txt.

A more complete example of the usage of LDhat and LDhot, with both input and output files, can be found in the example folder.

## Option List

### ldhot

The ldhot program takes the following parameters.

#### Required Parameters:

  • --seq : Input LDhat-format sequence file. Required to be phased and encoded using zeros and ones only.
  • --loc : Input LDhat-format positions file.
  • --lk : Input LDhat-format likelihood lookup file.
  • --res : Input recombination rate estimates in the same format as LDhat 'stat' output.

#### Important Parameters:

  • --out : Prefix for output files (default: out).
  • --nsim : Maximum number of simulations to use (default: 100; at least 1000 recommended).
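
Why at least 1000 simulations? LDhot's exact p-value estimator is not described here, but a generic Monte Carlo p-value, sketched below as an assumption, shows the arithmetic: with nsim simulations the smallest attainable value is 1/(nsim + 1), so the default of 100 cannot go below about 0.0099, well above ldhot_summary's default --sig cutoff of 0.001. Because --nsim is described as a maximum, LDhot may also stop some tests early. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Generic Monte Carlo p-value (an assumption; not taken from LDhot's source):
// the fraction of simulated statistics at least as large as the observed one,
// with the common "+1" correction so the estimate is never exactly zero.
double monte_carlo_pvalue(double observed, const std::vector<double>& simulated) {
    std::size_t as_extreme = 0;
    for (double s : simulated)
        if (s >= observed) ++as_extreme;
    return (as_extreme + 1.0) / (simulated.size() + 1.0);
}

// Smallest p-value this estimator can report after nsim simulations.
double smallest_attainable_p(std::size_t nsim) {
    return 1.0 / (nsim + 1.0);
}
```

For example, nsim = 100 gives a floor of 1/101 ≈ 0.0099, while nsim = 1000 gives 1/1001 ≈ 0.000999, just below 0.001.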

#### Other Parameters:

  • --startpos : Start position in kb.
  • --endpos : End position in kb.
  • --step : Step size (in kb) between tested windows (default: 1).
  • --windist : Define the background window as +/- windist kb of the hotspot center (default: 50).
  • --hotdist : Define the hotspot window as +/- hotdist kb of the hotspot center (default: 1.5).
  • --seed : Random seed.
  • --nofreqcond : Turn off frequency conditioning.
  • --lk-SNP-window : Number of SNPs over which to calculate the composite likelihood (default: 50).
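
The window options above can be read as follows. The sketch assumes, for illustration only, that window centers start at --startpos and advance by --step until they pass --endpos; LDhot's actual loop (for example, its handling of windows near the ends of the data) may differ. `Window` and `tested_windows` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Hypothetical illustration of the documented window definitions (all in kb).
struct Window {
    double center_kb;
    double hot_lo_kb, hot_hi_kb;  // hotspot window: center +/- hotdist
    double bg_lo_kb, bg_hi_kb;    // background window: center +/- windist
};

// Defaults mirror the documented ones: step 1, windist 50, hotdist 1.5.
std::vector<Window> tested_windows(double startpos_kb, double endpos_kb,
                                   double step_kb = 1.0, double windist_kb = 50.0,
                                   double hotdist_kb = 1.5) {
    assert(step_kb > 0.0);
    std::vector<Window> out;
    // An integer counter avoids accumulating floating-point drift in the centers.
    for (long i = 0;; ++i) {
        double c = startpos_kb + i * step_kb;
        if (c > endpos_kb) break;
        out.push_back({c, c - hotdist_kb, c + hotdist_kb, c - windist_kb, c + windist_kb});
    }
    return out;
}
```

With --startpos 100 --endpos 103 and the defaults, this yields four centers (100, 101, 102 and 103 kb); the window centered at 101 kb has a hotspot window from 99.5 to 102.5 kb and a background window from 51 to 151 kb.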

### ldhot_summary

The ldhot_summary program takes the following parameters.

#### Required Parameters:

  • --res : Input recombination rate estimates in the same format as LDhat 'stat' output.
  • --hot : Input hotspot file produced by ldhot (<out_prefix>.hotspots.txt).

#### Other Parameters:

  • --out : Prefix for output files (default: out).
  • --sig : Significance cutoff for calling a hotspot (default: 0.001).
  • --sigjoin : Significance cutoff for merging hotspot windows (default: 0.01).
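
The README does not spell out how --sig and --sigjoin interact. One plausible reading, sketched below purely as an assumption (not ldhot_summary's source), is that overlapping windows with p below --sigjoin are chained into one region, and a region is reported only if at least one of its windows has p below --sig. The input is an in-memory list rather than the real hotspots.txt layout, which this README does not document; `TestedWindow`, `Region` and `summarize` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct TestedWindow { double start_kb, end_kb, p; };
struct Region { double start_kb, end_kb, min_p; };

// Hypothetical merge rule (an assumption, not ldhot_summary's actual code):
// windows with p < sigjoin are chained while they overlap, and a chained
// region is kept only if it contains a window with p < sig.
std::vector<Region> summarize(std::vector<TestedWindow> w, double sig = 0.001,
                              double sigjoin = 0.01) {
    assert(sig <= sigjoin);
    std::sort(w.begin(), w.end(), [](const TestedWindow& a, const TestedWindow& b) {
        return a.start_kb < b.start_kb;
    });
    std::vector<Region> out;
    bool open = false;
    Region cur{0, 0, 1.0};
    // Emit the current region if it passes the stricter cutoff, then close it.
    auto flush = [&]() {
        if (open && cur.min_p < sig) out.push_back(cur);
        open = false;
    };
    for (const auto& x : w) {
        if (x.p >= sigjoin) continue;  // too weak to join or start a region
        if (open && x.start_kb <= cur.end_kb) {
            cur.end_kb = std::max(cur.end_kb, x.end_kb);
            cur.min_p = std::min(cur.min_p, x.p);
        } else {
            flush();
            cur = {x.start_kb, x.end_kb, x.p};
            open = true;
        }
    }
    flush();
    return out;
}
```

Under this rule, a window with p = 0.005 extends a neighboring region seeded by a window with p = 0.0005, but on its own it would not be reported.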

About

License: GNU Lesser General Public License v3.0

