Code to compute xp-clr values as per Chen, Patterson & Reich 2010. This implementation was written due to found bugs in the source of the original tool.
Clone this git repository into your working directory and cd
.
python setup.py install
Not yet available on PyPi/bioconda
This is a python module with a convenience script attached.
It is designed to run on hdf5 files representing genetic data as generated using scikit-allel.
Support is available for VCF and text files in same format as original XPCLR tool, but has not been optimised for this.
Support to be added for zarr
format soon.
Interface is under development and may change/break in future.
usage: xpclr [-h] --out OUT [--format FORMAT] [--input INPUT]
[--gdistkey GDISTKEY] [--samplesA SAMPLESA] [--samplesB SAMPLESB]
[--rrate RRATE] [--map MAP] [--popA POPA] [--popB POPB] --chr
CHROM [--ld LDCUTOFF] [--phased] [--verbose VERBOSE]
[--maxsnps MAXSNPS] [--minsnps MINSNPS] [--size SIZE]
[--start START] [--stop STOP] [--step STEP]
Tool to calculate XP-CLR as per Chen, Patterson, Reich 2010
optional arguments:
-h, --help show this help message and exit
--out OUT, -O OUT output file
--format FORMAT, -F FORMAT
input expected. One of "vcf" (default), "hdf5", or
"txt"
--input INPUT, -I INPUT
input file vcf or hdf5
--gdistkey GDISTKEY key for genetic position in variants table of hdf5/VCF
--samplesA SAMPLESA, -Sa SAMPLESA
Samples comprising population A. Comma separated list
or path to file with each ID on a line
--samplesB SAMPLESB, -Sb SAMPLESB
Samples comprising population B. Comma separated list
or path to file with each ID on a line
--rrate RRATE, -R RRATE
recombination rate per base
--map MAP input map file as per XPCLR specs
--popA POPA filepath to population A genotypes
--popB POPB filepath to population A genotypes
--chr CHROM, -C CHROM
Which contig analysis is based on
--ld LDCUTOFF, -L LDCUTOFF
LD cutoff to apply for weighting
--phased, -P whether data is phased for more precise r2 calculation
--verbose VERBOSE, -V VERBOSE
How verbose to be in logging. 10=DEBUG, 20=INFO,
30=WARN, 40=ERROR, 50=CRITICAL
--maxsnps MAXSNPS, -M MAXSNPS
max SNPs in a window
--minsnps MINSNPS, -N MINSNPS
min SNPs in a window
--size SIZE window size in base pairs
--start START start base position for windows
--stop STOP stop base position for windows
--step STEP step size for sliding windows
hdf5
as generated from vcf
by scikit-allel.
Or
vcf
via scikit-allel
Or
.geno
Space delimited file, containing 0/1s with sample haplotypes as columns, and rows as SNPs.
.map
Space delimited file. 6 columns: ID, chromosome, Genetic Distance, Position, REF, ALT.
For examples of these files see the fixture folder used for testing.