This repository contains the commands used to perform all the queries featured in the paper.
- BCFtools v1.13
- SnpSift v5.0e. Depends on:
  - Java 1.8.0
- GEMINI v0.20.1. Depends on:
  - Python 2.7.15
- Hail v0.2.105-acd89e80c345, which requires:
  - Python 3.7.12
  - Apache Spark 3.1.3 (installed with Hail)
- OpenCGA v2.1.0
We have provided a Dockerfile, which can be built as follows:

    docker build -t gdsq:latest .

Build time depends on connection speed and CPU. In our case, building the Docker image took around 25 minutes.
To run the image:

    docker run --rm -it gdsq:latest

All tools are available in the container except OpenCGA, which requires additional setup that prevents it from being included in a Docker container. The container starts in the `py3` environment so that Hail can be run alongside the other tools. GEMINI is installed in the `py2` environment but can still be used from the `py3` environment thanks to its alias.
The query statement is: "Return all information for a variant given its unique rsID."

The related files are:
- `eqd_sampling.tsv`: contains a list of SNPs which are roughly equidistant to one another in the genome. These SNPs are queried for using the following tools:
  - Bcftools
    - `run_bcftools.sh`
    - `run_bcftools_wchr.sh`
  - SnpSift
    - `run_snpsift.sh`
    - `run_snpsift_wchr.sh`
  - Hail
    - `run_hail.sh` (with `hail-run.py`)
    - `run_hail_wchr.sh` (with `hail-run.py`)
  - Gemini
    - `run_gemini.sh`
    - `run_gemini_wchr.sh`
  - OpenCGA
    - `run_opencga.sh`
    - `run_opencga_wchr.sh`
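
To make the rsID query concrete, the following is a minimal sketch in plain Python of what "return all information for a variant given its rsID" means on a VCF file. It is not one of the repository's scripts and does not use any of the benchmarked tools. The function name `find_by_rsid`, the example records, and the rsIDs are all hypothetical and only serve as illustration.

```python
# Illustrative sketch only: a linear scan over VCF text, not the repository's method.
# The records and rsIDs below are made up for the example.
HEADER = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]
EXAMPLE_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "#" + "\t".join(HEADER),
    "\t".join(["1", "1000", "rs0000001", "A", "G", ".", "PASS", "AF=0.10"]),
    "\t".join(["2", "2000", "rs0000002;rs0000003", "C", "T", ".", "PASS", "AF=0.25"]),
])


def find_by_rsid(vcf_text, rsid):
    """Return every record (as a column-name -> value dict) whose ID column lists rsid."""
    columns = None
    hits = []
    for line in vcf_text.splitlines():
        if not line.strip() or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        if line.startswith("#"):
            columns = line[1:].split("\t")  # the "#CHROM ..." header line
            continue
        fields = line.split("\t")
        # The ID column may hold several identifiers separated by semicolons.
        if rsid in fields[2].split(";"):
            hits.append(dict(zip(columns, fields)))
    return hits


if __name__ == "__main__":
    for record in find_by_rsid(EXAMPLE_VCF, "rs0000003"):
        print(record)
```

A scan like this reads every record, which is why indexed or database-backed tools are of interest for this kind of query on large files.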
The query statement is: "Get all variants typed INDEL in chromosome 5."

The related files are:
- `indel.sh`
- `hail_run_indel.py`
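
As a rough illustration of this query, the sketch below filters VCF records in plain Python. It is not taken from `indel.sh` or `hail_run_indel.py`. It assumes a simple definition of an indel (at least one ALT allele whose length differs from REF, ignoring symbolic and `*` alleles), which may differ from how each tool classifies variants, and it accepts both `5` and `chr5` as chromosome names. The function names and example records are hypothetical.

```python
# Illustrative sketch only; the indel definition here is an assumption and may
# not match the classification used by BCFtools, SnpSift, Hail, GEMINI or OpenCGA.

def is_indel(ref, alt):
    """True if any ALT allele differs in length from REF (symbolic, '*' and '.' alleles are skipped)."""
    for allele in alt.split(","):
        if allele.startswith("<") or allele in ("*", "."):
            continue
        if len(allele) != len(ref):
            return True
    return False


def indels_on_chrom(lines, chrom="5"):
    """Return the split fields of every indel record on the given chromosome."""
    wanted = {chrom, "chr" + chrom}  # accept names with or without the 'chr' prefix
    selected = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[0] in wanted and is_indel(fields[3], fields[4]):
            selected.append(fields)
    return selected


# Made-up records: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO
EXAMPLE_RECORDS = [
    "\t".join(["5", "100", ".", "A", "AT", ".", "PASS", "."]),     # insertion on chr 5
    "\t".join(["5", "200", ".", "G", "C", ".", "PASS", "."]),      # SNP on chr 5
    "\t".join(["chr5", "300", ".", "CTT", "C", ".", "PASS", "."]),  # deletion on chr 5
    "\t".join(["6", "400", ".", "A", "AG", ".", "PASS", "."]),     # insertion elsewhere
]

if __name__ == "__main__":
    for fields in indels_on_chrom(EXAMPLE_RECORDS):
        print(fields[0], fields[1], fields[3], fields[4])
```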
The query statement is: "Retrieve sites where all samples have the homozygous genotype."

The related files are:
- `HOM.sh`
- `hail_run_HOM.py`
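
The sketch below shows one possible reading of this query in plain Python; it is not taken from `HOM.sh` or `hail_run_HOM.py`. It assumes that "homozygous" covers both homozygous-reference and homozygous-alternate calls, and that missing or haploid genotypes do not count as homozygous. The scripts in this repository may interpret the query differently. All names and example records are hypothetical.

```python
import re

# Illustrative sketch only; the handling of missing and haploid calls is an assumption.

def is_homozygous(gt):
    """True for a fully called genotype with two or more identical alleles, e.g. 0/0 or 1|1."""
    alleles = re.split(r"[/|]", gt)
    return len(alleles) >= 2 and "." not in alleles and len(set(alleles)) == 1


def all_samples_homozygous(lines):
    """Return (CHROM, POS) for every site where each sample's GT is homozygous."""
    sites = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        format_keys = fields[8].split(":")
        if "GT" not in format_keys:
            continue  # no genotype information at this site
        gt_index = format_keys.index("GT")
        genotypes = [sample.split(":")[gt_index] for sample in fields[9:]]
        if genotypes and all(is_homozygous(gt) for gt in genotypes):
            sites.append((fields[0], int(fields[1])))
    return sites


# Made-up records with two samples each.
EXAMPLE_RECORDS = [
    "\t".join(["1", "100", ".", "A", "G", ".", ".", ".", "GT:DP", "0/0:10", "1|1:8"]),
    "\t".join(["1", "200", ".", "C", "T", ".", ".", ".", "GT", "0/1", "1/1"]),
    "\t".join(["1", "300", ".", "G", "A", ".", ".", ".", "GT", "1/1", "./."]),
]

if __name__ == "__main__":
    print(all_samples_homozygous(EXAMPLE_RECORDS))
```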
The query statement is: "Retrieve the variants where the allele frequency in patients is at most 40% and the allele frequency in controls is above 40%."

The related files are:
- Bcftools
  - `scenario_4/run_bcftools_ann.sh` for annotation.
  - `scenario_4/run_bcftools.sh` for queries.
- SnpSift
  - `scenario_4/run_snpsift.sh`.
- Hail
  - `run_prep_hail.py` to create Matrix Tables.
  - `hail_query.py` to run the query in Hail.
  - `run_hail.sh` to loop through files and
- Gemini
  - `scenario_4/sql_generator.py` for SQLite column generation.
  - `scenario_4/run_gemini.sh` for queries.
- OpenCGA
  - `scenario_4/run_prep_opencga.sh`.
  - `scenario_4/run_opencga.sh`.
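
The allele-frequency condition in this query can be sketched in plain Python as follows. This is not the logic of the scripts listed above, which rely on each tool's own annotation and query features. The sketch assumes that the patient and control genotypes for a site have already been separated, that allele frequency means the fraction of called alleles that are non-reference, and that a site with no called alleles in either group is excluded. The function names are hypothetical.

```python
import re

# Illustrative sketch only; the frequency definition and missing-data handling are assumptions.

def alt_allele_frequency(genotypes):
    """Fraction of called alleles that are non-reference, or None if nothing is called."""
    alt_count = 0
    called = 0
    for gt in genotypes:
        for allele in re.split(r"[/|]", gt):
            if allele == ".":
                continue  # missing allele: not counted
            called += 1
            if allele != "0":
                alt_count += 1  # any non-reference allele counts as ALT
    return alt_count / called if called else None


def passes_query(patient_gts, control_gts, threshold=0.40):
    """True if patient AF <= threshold and control AF > threshold."""
    patient_af = alt_allele_frequency(patient_gts)
    control_af = alt_allele_frequency(control_gts)
    if patient_af is None or control_af is None:
        return False
    return patient_af <= threshold and control_af > threshold


if __name__ == "__main__":
    # Patients: 1 ALT allele out of 4; controls: 3 ALT alleles out of 4.
    print(passes_query(["0/0", "0/1"], ["1/1", "0/1"]))
```

Note that the two comparisons are deliberately asymmetric: the patient bound is inclusive and the control bound is strict, matching the query statement.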