This repository contains an implementation of HRDetect used by Eric Y. Zhao at the Genome Sciences Centre.
HRDetect is run via a Snakemake pipeline. It depends on a working installation of R with numerous dependencies. These may be installed by running `make dependencies`, which builds a miniconda environment with the necessary installations. To activate that environment after it is installed, run `source dependencies/miniconda3/bin/activate dependencies` prior to running the pipeline.
Depending on your system and needs, some of these dependencies may require tweaking. Notably, the dependency installer assumes a Linux-based environment.
Also necessary are two in-house tools, SignIT and hrdtools, which can be acquired by running `make`.
> make
if [ -d git/hrdtools ]; \
then(cd git/hrdtools && git pull); \
else git clone git@github.com:eyzhao/hrdtools.git git/hrdtools; \
fi
Cloning into 'git/hrdtools'...
remote: Counting objects: 97, done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 97 (delta 1), reused 0 (delta 0), pack-reused 91
Receiving objects: 100% (97/97), 163.64 KiB | 821.00 KiB/s, done.
Resolving deltas: 100% (16/16), done.
if [ -d git/SignIT ]; \
then(cd git/SignIT && git pull); \
else git clone git@github.com:eyzhao/SignIT.git git/SignIT; \
fi
Cloning into 'git/SignIT'...
remote: Counting objects: 394, done.
remote: Compressing objects: 100% (113/113), done.
remote: Total 394 (delta 50), reused 58 (delta 22), pack-reused 259
Receiving objects: 100% (394/394), 1.11 MiB | 1.44 MiB/s, done.
Resolving deltas: 100% (153/153), done.
HRDetect expects data files of specific types and formats at specific locations. All files are housed at `data/{project}/{subproject}/patients/{patient}/{sample}/`, where `{patient}` and `{sample}` together form a unique ID pair for a given sample. Each such directory must contain four files specific to the sample:
segments.tsv
somatic_indels.vcf
somatic_snvs.vcf
somatic_sv.tsv
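
The layout above can be checked before launching the pipeline. The following is a minimal sketch and not part of this repository: the function names `sample_dir` and `missing_inputs` are hypothetical, and it assumes the `data/` directory sits directly under the root path you pass in.

```python
from pathlib import Path

# The four per-sample input files listed above.
REQUIRED_FILES = (
    "segments.tsv",
    "somatic_indels.vcf",
    "somatic_snvs.vcf",
    "somatic_sv.tsv",
)


def sample_dir(root, project, subproject, patient, sample):
    """Build {root}/data/{project}/{subproject}/patients/{patient}/{sample}.

    Hypothetical helper; `root` is assumed to be the repository checkout.
    """
    return Path(root) / "data" / project / subproject / "patients" / patient / sample


def missing_inputs(directory):
    """Return the names of required files that are absent from a sample directory."""
    directory = Path(directory)
    return [name for name in REQUIRED_FILES if not (directory / name).is_file()]
```

An empty list from `missing_inputs(sample_dir(".", project, subproject, patient, sample))` means all four files are present; it says nothing about whether their contents are well formed.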
If you would like to use the Snakemake pipeline as is, you can provide a project-specific file under the `projects` directory. Some project files are already there as examples. You can then link to the project by adding the line `include: "project/myproject.smk"` to the Snakefile.
If you would like to construct your own pipeline structure, please feel free to use the scripts in the `scripts` folder as needed.
`segments.tsv` is a file of segmented CNV/LOH calls with at least 5 columns:
`chr`: The chromosome name
`start`: Start position of the CNV/LOH call
`end`: End position of the CNV/LOH call
`copy_number`: The tumour copy number of the segment
`lohtype`: The type of LOH state. Should be one of the following:
    `ASCNA`: Allele-specific copy number amplification
    `BCNA`: Balanced copy number amplification
    `HET`: Heterozygous (normal)
    `NLOH`: Neutral LOH (loss of heterozygosity, but 2 copies present)
    `DLOH`: Deletion LOH
    `ALOH`: Amplification LOH
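
The column and LOH-state rules above can be checked with a few lines of code. This is a hedged sketch rather than anything the pipeline itself runs: `check_segments` is a hypothetical name, and the sketch assumes the file has a header row that uses exactly the column names listed above.

```python
import csv

SEGMENT_COLUMNS = ("chr", "start", "end", "copy_number", "lohtype")
LOH_TYPES = {"ASCNA", "BCNA", "HET", "NLOH", "DLOH", "ALOH"}


def check_segments(handle):
    """Return a list of problems found in a tab-delimited segments table.

    `handle` is any open text file or file-like object.
    Assumes the first line is a header naming the columns.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [col for col in SEGMENT_COLUMNS if col not in header]
    if missing:
        return ["missing columns: " + ", ".join(missing)]

    problems = []
    # Data rows start on line 2 because line 1 is the header.
    for line_number, row in enumerate(reader, start=2):
        if row["lohtype"] not in LOH_TYPES:
            problems.append(f"line {line_number}: unknown lohtype {row['lohtype']!r}")
    return problems
```

Extra columns beyond the five are accepted, since the description only requires "at least 5 columns".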
`somatic_indels.vcf` is a VCF file containing indels that can be parsed by R using the `readVcf()` function of VariantAnnotation.
`somatic_snvs.vcf` is a VCF file containing SNVs that can be parsed by R using the `readVcf()` function of VariantAnnotation.
`somatic_sv.tsv` is a tab-delimited file with structural variant data. It should contain the following columns:
`chr1`: Name of the first chromosome involved
`pos1`: Coordinate of the SV breakpoint on the first chromosome
`chr2`: Name of the second chromosome involved
`pos2`: Coordinate of the SV breakpoint on the second chromosome
`type`: Can take on the values `DEL`, `DUP`, `TRA`, or `INV`. Unless the value is `TRA`, the two chromosomes should be the same.
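
The structural variant rules above, including the same-chromosome constraint for non-`TRA` events, can be sketched as a small validator. The name `check_structural_variants` is hypothetical, and the sketch assumes a header row with exactly the column names listed above; it is illustrative only and not part of the pipeline.

```python
import csv

SV_COLUMNS = ("chr1", "pos1", "chr2", "pos2", "type")
SV_TYPES = {"DEL", "DUP", "TRA", "INV"}


def check_structural_variants(handle):
    """Return a list of problems found in a tab-delimited SV table.

    `handle` is any open text file or file-like object.
    Assumes the first line is a header naming the columns.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [col for col in SV_COLUMNS if col not in header]
    if missing:
        return ["missing columns: " + ", ".join(missing)]

    problems = []
    # Data rows start on line 2 because line 1 is the header.
    for line_number, row in enumerate(reader, start=2):
        sv_type = row["type"]
        if sv_type not in SV_TYPES:
            problems.append(f"line {line_number}: unknown type {sv_type!r}")
        elif sv_type != "TRA" and row["chr1"] != row["chr2"]:
            # Only translocations may span two different chromosomes.
            problems.append(
                f"line {line_number}: {sv_type} spans {row['chr1']} and {row['chr2']}"
            )
    return problems
```

The check compares chromosome names as plain strings, so `1` and `chr1` would be treated as different chromosomes; normalise the naming first if your inputs mix the two styles.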
If you use HRDtools in your publication, please cite the following study: