UoS-HGIG / GenePy-2

GenePy-2.0 is a gene- and region-based pathogenicity score for each individual, based on the carrier status of genomic variants. Major updates from GenePy-1.x are: 1). handling of multi-allelic loci with up to 10 alternative alleles; 2). a region-based score is incorporated; 3). computational efficiency is improved, with a GPU-based processing option available.

Installation prerequisites:

1). Ensembl VEP

2). CADD >= 1.6

3). Python3

4). numpy==1.26.4

5). pandas==2.2.1

6). pyarrow==15.0.2

7). numba==0.59.1

8). bedtools

9). Bcftools >=1.3.1
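
Before running anything, it can help to confirm that the environment matches the list above. The following is a minimal sketch, not part of GenePy itself: it compares the installed Python packages against the pinned versions and looks for the command-line tools on PATH. The executable names in TOOLS (vep, bedtools, bcftools) are assumptions about a typical installation and may differ on your system; CADD is not checked because its install location varies.

```python
import shutil
from importlib import metadata

# Versions pinned in the prerequisites list above.
PINNED = {
    "numpy": "1.26.4",
    "pandas": "2.2.1",
    "pyarrow": "15.0.2",
    "numba": "0.59.1",
}

# Assumed executable names; adjust to match your installation.
TOOLS = ["vep", "bedtools", "bcftools"]


def check_packages(pinned, get_version=metadata.version):
    """Return {package: problem} for packages that are missing or off-pin."""
    problems = {}
    for name, wanted in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if found != wanted:
            problems[name] = f"found {found}, expected {wanted}"
    return problems


def check_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    for name, problem in check_packages(PINNED).items():
        print(f"{name}: {problem}")
    for tool in check_tools(TOOLS):
        print(f"{tool}: not found on PATH")
```

The script prints nothing when every pinned package and every tool is found. It does not check tool versions such as Bcftools >=1.3.1.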

A reference file is needed for the VEP annotation and for gene delegation when calculating the gene-based GenePy score; a bed file of user-defined target regions is needed for the region-based GenePy score. Gene-based delegation can use either the gene region or the CCDS-based region. CCDS-based gene delegation can be more appropriate for whole exome sequencing analysis. However, if the user's focus is on functional variants, e.g. those with CADD phred score >=15 or >=20, the difference is minimal, as shown by the figure below based on the Agilent SureSelect V5/6 capture kit.

[image]

GenePy is run with python make_scores_mat.py; the available options are listed with -h. The input meta file is the annotated variant file derived from the vcf. Conversion from vcf to meta file is done with the two pre-processing scripts:

1). pre_1.sh adds annotation, including the CADD score and the allele frequency, and then applies quality control to the vcf

./pre_1.sh input.vcf > out.vcf

2). pre_local.sh converts the vcf to the meta file for GenePy score calculation

./pre_local.sh out.vcf

This generates 3 meta files by default: CADDALL, CADD15 and CADD20, containing all variants, variants with CADD_phred score >=15, and variants with CADD_phred score >=20, respectively.
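
To illustrate how the three meta files relate to one another, here is a minimal sketch of the thresholding idea in plain Python. It is not the logic of pre_local.sh: the record layout and the cadd_phred field name are hypothetical, since the meta file format is not described here. It only shows that CADD20 is a subset of CADD15, which is a subset of CADDALL.

```python
# Hypothetical variant records; the real meta file layout may differ.
variants = [
    {"id": "var1", "cadd_phred": 3.2},
    {"id": "var2", "cadd_phred": 15.0},
    {"id": "var3", "cadd_phred": 18.7},
    {"id": "var4", "cadd_phred": 20.0},
    {"id": "var5", "cadd_phred": 31.0},
]

# Labels follow the default meta file names; None means "no cutoff".
THRESHOLDS = {"CADDALL": None, "CADD15": 15.0, "CADD20": 20.0}


def split_by_cadd(records, thresholds=THRESHOLDS):
    """Group records under each label, keeping those at or above the cutoff."""
    subsets = {}
    for label, cutoff in thresholds.items():
        subsets[label] = [
            rec for rec in records
            if cutoff is None or rec["cadd_phred"] >= cutoff
        ]
    return subsets


subsets = split_by_cadd(variants)
for label, recs in subsets.items():
    print(label, [rec["id"] for rec in recs])
```

The cutoffs are inclusive (>=), matching the description above, so a variant with a score of exactly 15 appears in CADD15 but not in CADD20.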

After this, the GenePy score can be generated with:

python make_scores_mat.py --gene ${GENE} --cadd ${CADD_CUTOFF}
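
When scores are needed for several CADD cutoffs, the command above can be repeated from a small driver. The sketch below is only an illustration under assumptions: it uses the --gene and --cadd options shown above, but the value formats passed to them (a gene placeholder and the numbers 15 and 20) are guesses that should be checked against the -h output. It defaults to a dry run that prints the commands rather than executing them.

```python
import shlex
import subprocess
import sys


def build_command(gene, cadd, script="make_scores_mat.py"):
    """Build the argument list for one make_scores_mat.py run."""
    return [sys.executable, script, "--gene", str(gene), "--cadd", str(cadd)]


def run_all(gene, cutoffs, dry_run=True):
    """Print (dry run) or execute one command per CADD cutoff."""
    commands = [build_command(gene, cutoff) for cutoff in cutoffs]
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            # check=True stops the loop on the first failing run.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # "GENE" is a placeholder; the cutoff values are assumed, see -h.
    run_all("GENE", [15, 20])
```

Passing dry_run=False executes each command in turn and raises subprocess.CalledProcessError if any run exits with a non-zero status.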

An example of an annotated vcf file and the corresponding meta file is provided in the example/ folder.

About

License:MIT License