ENCODE Imputation Challenge Scoring/Ranking and Validation Scripts

Installation

  1. Install Conda 4.6.14 first. Answer yes to all Y/N questions, use the default installation paths, and log in again after installation.

    $ wget https://repo.anaconda.com/miniconda/Miniconda3-4.6.14-Linux-x86_64.sh
    $ bash Miniconda3-4.6.14-Linux-x86_64.sh
  2. Install the dependencies: numpy, scikit-learn, pyBigWig, SQLite and SciPy.

    $ conda install -y -c bioconda numpy scikit-learn pyBigWig sqlite scipy
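
    To check that everything is importable, you can optionally run:

    $ python -c "import numpy, sklearn, scipy, pyBigWig, sqlite3; print('dependencies OK')"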

Validating a submission

$ python validate.py [YOUR_SUBMISSION_BIGWIG]
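
validate.py checks that a submission is a well-formed hg38 bigWig whose signal is defined on the fixed 25-bp bins used throughout the challenge. The snippet below is only a rough sketch of that kind of check, written directly against pyBigWig; the chromosome list and the exact rules are assumptions, not the real validate.py logic.

    import pyBigWig

    # assumed set of hg38 chromosomes expected in a submission (chrY/chrM excluded)
    EXPECTED_CHROMS = ['chr%d' % i for i in range(1, 23)] + ['chrX']
    BIN_SIZE = 25  # submissions are binned at 25 bp

    def check_bigwig(bw_file):
        bw = pyBigWig.open(bw_file)
        chroms = bw.chroms()  # dict: chromosome name -> length
        for c in EXPECTED_CHROMS:
            if c not in chroms:
                raise ValueError('Missing chromosome: %s' % c)
            # every interval should sit on 25-bp boundaries (except the chromosome end)
            for start, end, _value in bw.intervals(c) or []:
                if start % BIN_SIZE or (end % BIN_SIZE and end != chroms[c]):
                    raise ValueError('Interval not binned at %d bp: %s:%d-%d'
                                     % (BIN_SIZE, c, start, end))
        bw.close()
        print('OK:', bw_file)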

Scoring a submission

  1. Download ENCFF622DXZ and ENCFF074VQD from the ENCODE portal.

    $ mkdir -p test/hg38 && cd test/hg38
    $ wget https://www.encodeproject.org/files/ENCFF622DXZ/@@download/ENCFF622DXZ.bigWig
    $ wget https://www.encodeproject.org/files/ENCFF074VQD/@@download/ENCFF074VQD.bigWig
  2. Convert the downloaded bigWigs to numpy arrays. This speeds up scoring of multiple submissions; score.py can also take bigWigs directly, so this step is optional (see the binning sketch after this list).

    $ python bw_to_npy.py test/hg38/ENCFF622DXZ.bigWig
    $ python bw_to_npy.py test/hg38/ENCFF074VQD.bigWig
  3. Score the submission. If no variance .npy file is specified with --var-npy, the msevar metric will be 0.0.

    $ python score.py test/hg38/ENCFF622DXZ.npy test/hg38/ENCFF074VQD.npy --chrom chr20
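
Conceptually, the conversion in step 2 averages the bigWig signal over fixed 25-bp bins, one array per chromosome. The sketch below shows that binning with pyBigWig and numpy; the function name and output layout are illustrative and do not reproduce bw_to_npy.py's exact on-disk format.

    import numpy as np
    import pyBigWig

    BIN_SIZE = 25  # bin size used throughout the challenge

    def bin_bigwig(bw_file, chrom):
        """Average signal over fixed 25-bp bins for one chromosome."""
        bw = pyBigWig.open(bw_file)
        chrom_len = bw.chroms()[chrom]
        n_bins = (chrom_len + BIN_SIZE - 1) // BIN_SIZE
        # pyBigWig computes per-bin means natively; empty bins come back as None
        means = bw.stats(chrom, 0, chrom_len, type='mean', nBins=n_bins)
        bw.close()
        return np.array([m if m is not None else 0.0 for m in means],
                        dtype=np.float32)

    # e.g. np.save('ENCFF622DXZ_chr20.npy', bin_bigwig('test/hg38/ENCFF622DXZ.bigWig', 'chr20'))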

Ranking for submissions

  1. Create a score database.

    $ python db.py [NEW_SCORE_DB_FILE]
  2. To speed up scoring, convert [TRUTH_BIGWIG] into a numpy array/object (binned at 25). --out-npy-prefix [TRUTH_NPY_PREFIX] is optional. Repeat this for every truth bigWig (one per pair of cell type and assay).

    $ python bw_to_npy.py [TRUTH_BIGWIG] --out-npy-prefix [TRUTH_NPY_PREFIX]
  3. For each assay type, build a variance .npy file, which stores the variance of each bin on each chromosome across all cell types (see the sketch after this list). Without this variance file, msevar will be 0.0.

    $ python build_var_npy.py [TRUTH_NPY_CELL1] [TRUTH_NPY_CELL2] ... --out-npy-prefix var_[ASSAY_OR_MARK_ID]
  4. Score each submission. --validated is only for a validated bigWig submission binned at 25; with this flag turned on, score.py skips interpolation of intervals in the bigWig. For ranking, define metadata for the submission with -t [TEAM_ID_INT] -s [SUBMISSION_ID_INT]. These values are written to the database file together with bootstrap scores. Repeat this for each submission (one submission per team for each pair of cell type and assay).

    $ python score.py [YOUR_VALIDATED_SUBMISSION_BIGWIG_OR_NPY] [TRUTH_NPY] \
        --var-npy var_[ASSAY_OR_MARK_ID].npy \
        --db-file [SCORE_DB_FILE] \
        --validated \
        -t [TEAM_ID_INT] -s [SUBMISSION_ID_INT]
  5. Calculate ranks based on the score database file.

    $ python rank.py [SCORE_DB_FILE]
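
The variance file from step 3 can be thought of as stacking the binned truth tracks of all cell types for one assay and taking the per-bin variance across cell types. Below is a minimal sketch of that idea; the .npy layout assumed here (a pickled dict of chromosome -> binned array) is hypothetical, and build_var_npy.py defines the real format.

    import numpy as np

    def per_bin_variance(truth_npys, chrom):
        """Variance across cell types for each 25-bp bin of one chromosome."""
        # each file is assumed to hold a dict of chromosome -> binned signal array
        tracks = [np.load(f, allow_pickle=True).item()[chrom] for f in truth_npys]
        stacked = np.stack(tracks)   # shape: (n_cell_types, n_bins)
        return stacked.var(axis=0)   # one variance value per bin

    # e.g. var_chr20 = per_bin_variance(['C01M17.npy', 'C02M17.npy'], 'chr20')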

Setting up a leaderboard server (admins only)

  1. Create a server instance on AWS.

  2. Install Synapse client.

    $ pip install synapseclient
  3. Authenticate yourself on the server.

    $ synapse login --remember-me -u [USERNAME] -p [PASSWORD]
  4. Create a score database.

    $ python db.py [NEW_SCORE_DB_FILE]
  5. Run score_leaderboard.py. Files in [TRUTH_NPY_DIR] should be named like CXXMYY.npy, and files in [VAR_NPY_DIR] like var_MYY.npy. Submissions will be downloaded to [SUBMISSION_DOWNLOAD_DIR].

    $ NTH=3  # number of threads to parallelize bootstrap scoring
    $ python score_leaderboard.py [EVALUATION_QUEUE_ID] [TRUTH_NPY_DIR] \
        --var-npy-dir [VAR_NPY_DIR] \
        --submission-dir [SUBMISSION_DOWNLOAD_DIR] \
        --send-msg-to-admin \
        --send-msg-to-user \
        --db-file [SCORE_DB_FILE] \
        --nth $NTH \
        --project-id [SYNAPSE_PROJECT_ID] \
        --leaderboard-wiki-id [LEADERBOARD_WIKI_ID] \
        --bootstrap-chrom chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr2,chr20,chr21,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chrX chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr20,chr21,chr22,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chrX chr1,chr10,chr11,chr12,chr13,chr15,chr16,chr17,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX chr1,chr10,chr11,chr12,chr14,chr15,chr16,chr17,chr18,chr19,chr2,chr20,chr21,chr22,chr3,chr5,chr6,chr7,chr8,chr9,chrX chr1,chr10,chr11,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr2,chr20,chr21,chr22,chr3,chr4,chr6,chr7,chr8,chr9,chrX chr1,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr2,chr20,chr21,chr22,chr3,chr4,chr5,chr7,chr8,chr9,chrX chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr2,chr20,chr21,chr22,chr3,chr4,chr5,chr6,chr9,chrX chr1,chr10,chr11,chr12,chr13,chr14,chr16,chr17,chr18,chr19,chr2,chr21,chr22,chr3,chr4,chr5,chr6,chr7,chr8,chrX chr1,chr10,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr2,chr20,chr21,chr22,chr3,chr4,chr5,chr6,chr7,chr8,chr9 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr19,chr2,chr20,chr22,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chrX

    Example:

    $ python score_leaderboard.py $EVAL_Q_ID /mnt/imputation-challenge/output/score_robust_min_max/validation_data_npys --var-npy-dir /mnt/imputation-challenge/output/score_robust_min_max/var_npys --submission-dir /mnt/imputation-challenge/data/submissions/round2 --db-file $DB --nth $NTH --bootstrap-chrom chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr2,chr20,chr21,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chrX chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr20,chr21,chr22,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chrX chr1,chr10,chr11,chr12,chr13,chr15,chr16,chr17,chr18,chr19,chr2,chr20,chr21,chr22,chr4,chr5,chr6,chr7,chr8,chr9,chrX chr1,chr10,chr11,chr12,chr14,chr15,chr16,chr17,chr18,chr19,chr2,chr20,chr21,chr22,chr3,chr5,chr6,chr7,chr8,chr9,chrX chr1,chr10,chr11,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr2,chr20,chr21,chr22,chr3,chr4,chr6,chr7,chr8,chr9,chrX chr1,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr2,chr20,chr21,chr22,chr3,chr4,chr5,chr7,chr8,chr9,chrX chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr2,chr20,chr21,chr22,chr3,chr4,chr5,chr6,chr9,chrX chr1,chr10,chr11,chr12,chr13,chr14,chr16,chr17,chr18,chr19,chr2,chr21,chr22,chr3,chr4,chr5,chr6,chr7,chr8,chrX chr1,chr10,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr2,chr20,chr21,chr22,chr3,chr4,chr5,chr6,chr7,chr8,chr9 chr1,chr10,chr11,chr12,chr13,chr14,chr15,chr19,chr2,chr20,chr22,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chrX --send-msg-to-admin --send-msg-to-user --team-name-tsv data/team_name_round1.tsv
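
    Each whitespace-separated group passed to --bootstrap-chrom is one bootstrap sample: a subset of chromosomes (each group in the example above omits a few of chr1-chr22/chrX), and scores are recomputed on every subset to estimate how stable the ranking is. The sketch below only illustrates how such leave-some-out chromosome sets could be generated; the challenge itself used the fixed sets shown above.

    import random

    # assumed chromosome universe for hg38 scoring (chrY/chrM excluded)
    CHROMS = ['chr%d' % i for i in range(1, 23)] + ['chrX']

    def make_bootstrap_sets(n_sets=10, n_drop=2, seed=0):
        """Generate leave-some-out chromosome subsets for --bootstrap-chrom."""
        rng = random.Random(seed)
        sets = []
        for _ in range(n_sets):
            dropped = set(rng.sample(CHROMS, n_drop))
            kept = sorted(c for c in CHROMS if c not in dropped)
            sets.append(','.join(kept))
        return sets

    # pass the groups space-separated on the command line:
    # --bootstrap-chrom <set1> <set2> ... <set10>
    print(' '.join(make_bootstrap_sets()))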


License

MIT License

