In order to run the docker image we provide for RV-NPL you should have Docker installed on your machine. Then you can pull the docker image and set it as a commandline alias:
docker pull statisticalgenetics/rvnpl
alias rvnpl='docker run --rm --security-opt label:disable -t '\
'-P -h RV-NPL -w $PWD -v /tmp:/tmp -v $PWD:$PWD '\
'-u $UID:${GROUPS[0]} -e HOME=/seqlink -e USER=$USER statisticalgenetics/rvnpl rvnpl'
You should now be able to see program options using the following command:
rvnpl --help
You can optionally add the alias rvnpl ...
line to your bash profile ~/.bashrc
or ~/.bash_profile
so you will have access to it next time you open a command terminal.
-
python: 2.7
-
Anaconda: >= 2.3
-
gcc-5, g++-5
-
boost_python
The RV-NPL package is free and available on github. Run the following commands to download and install the SEQLinkage dependency:
git clone https://github.com/gaow/SEQLinkage.git
cd SEQLinkage
python setup.py install
then RV-NPL,
git clone https://github.com/statgenetics/rvnpl.git
cd rvnpl
python setup.py install
If the program is installed correctly, you will see program options using the following command:
rvnpl --help
The pedigree file (PED file) is a white-space (space or tab) delimited file without header. The first six columns are mandatory:
- Family ID
- Individual ID: must be unique within the family
- Paternal ID: 0 if not available
- Maternal ID: 0 if not available
- Sex: 1=male, 2=female
- Phenotype: 1=unaffected, 2=affected; numerical values (standardized) for quantitative traits
An example pedigree file is given below:
11000 11000.fa 0 0 1 1
11000 11000.mo 0 0 2 1
11000 11000.p1 11000.fa 11000.mo 1 2
11000 11000.s1 11000.fa 11000.mo 2 1
11001 11001.fa 0 0 1 1
11001 11001.mo 0 0 2 1
11001 11001.p1 11001.fa 11001.mo 1 2
11002 11002.fa 0 0 1 1
11002 11002.mo 0 0 2 1
11002 11002.p1 11002.fa 11002.mo 2 2
The VCF file should contain variants for individuals corresponding to the PED file.
bgzip ./rep1.vcf
tabix -p vcf ./rep1.vcf.gz
If you want to analyze only subset of variants into CHP markers, you can provide a selected variant file with each row representing one variant (chromosome and position). For example:
19 58858740
19 58858782
19 58858787
When analyzing families from different ethnic populations, you need to provide this file to indicate the frequency column for each family. The frequency column should be the corresponding value under "INFO" in the VCF file. For example:
11000 gnomAD_NFE
11001 gnomAD_NFE
11002 gnomAD_AMR
optional arguments:
-h, --help show this help message and exit
Collapsed haplotype pattern method arguments:
-b FILE, --blueprint FILE
Blueprint file that defines regional marker (format:
"chr startpos endpos name avg.distance male.distance
female.distance").
--single-markers Use single variant markers. This switch will overwrite
"--bin" and "--blueprint" arguments.
Input / output options:
--fam FILE Input pedigree and phenotype information in FAM
format.
--vcf FILE Input VCF file, bgzipped.
--freq INFO Info field name for allele frequency in VCF file.
--freq_by_fam FILE Per family info field name for allele frequency in VCF
file.
--mle Estimate allele frequency from sample
--rvhaplo Only using rare variants for haplotyping
-c P, --maf-cutoff P MAF cutoff to define "common" variants to be excluded
from analyses.
--include_vars FILE Variants to be included in CHP construction
--chrom-prefix STRING
Prefix to chromosome name in VCF file if applicable,
e.g. "chr".
-o Name, --output Name
Output name prefix.
Runtime arguments:
-j N, --jobs N Number of CPUs to use.
-q, --quiet Disable the display of runtime MESSAGE.
Example commands are shown below:
cd example
rvnpl collapse --fam 100extend_01.ped --vcf A1BG/rep1.vcf.gz --output ./rep1 --freq EVSMAF -c 0.01 --rvhaplo --include_vars A1BG.txt
OR (for families with quantitative traits)
rvnpl collapse --fam 100extend_quant.ped --vcf A1BG/rep1.vcf.gz --output ./rep1 --freq EVSMAF -c 0.01 --rvhaplo --include_vars A1BG.txt
optional arguments:
-h, --help show this help message and exit
Options for doing analysis on CHP markers(default) or SNV markers:
--snv Calculate on SNV markers
Input/Output options:
--path PATH Path for input pedigree information.
--output PATH Path for output files
--n_jobs N number of multiprocess
Options for calculating p-values:
--exact get the exact distribution of Z-score and calculate p
value from it
--cut FLOAT, -c FLOAT
threshold for adaptive permutations
--rep N times of permutations
--force keep permutation times unchanged
--perfect_max N maximum for inheritance vector iterations
--info_only include only informative families
--perfect use perfect data approximation in calculating Z-score
--kc use Kong&Cox(1997) extension for analytical p-values
--sall enable calculation of NPL-all
--rvibd calculate IBD for RV only
Example commands are shown below:
rvnpl npl --path ./rep1 --output ./rep1 --exact --info_only --perfect --sall --rvibd --n_jobs 8 -c 0.001 --rep 2000000
optional arguments:
-h, --help show this help message and exit
Input/Output options:
--path PATH Path for input pedigree information.
--output PATH Path for output files
--n_jobs N number of multiprocess
Options for calculating p-values:
--exact get the exact distribution of Z-score and calculate p
value from it
--pheno shuffle phenotypes
--cut FLOAT, -c FLOAT
threshold for adaptive permutations
--rep N times of permutations
--fam_rep N times of permutations for each family
--force keep permutation times unchanged
--perfect_max N maximum for inheritance vector iterations
--rvibd calculate IBD for RV only
Example commands are shown below:
rvnpl qnpl --path ./rep1/ --output ./rep1/ --n_jobs 4 --exact --rvibd
The output is located in the given folder.
If you have any further questions, please fell free to create a issue ticket in github.