legbar / snaptron-experiments

Contains code and scripts to re-create the analysis function experiments from the Snaptron paper.

snaptron-experiments

Contains code and scripts to re-create analyses from the Snaptron paper. Also contains a general purpose client for querying the Snaptron web services.

Requirements:

  • Python 2.7

  • R with ggplot2

Intermediate results will still be produced even if Rscript is not found in the path.

Ask questions in the project's Gitter chat: https://gitter.im/snaptron/Lobby

Analyses

You can run all three analyses, plus the PSI and intersection functions from the paper, with this script:

./run_all.sh > run_all.out 2>&1

Expect a delay (typically a few minutes) the first time a Snaptron compilation (e.g. GTEx) is accessed, because all of its sample metadata is downloaded and cached locally.

Output from the scripts is written to the working directory.

Intermediate data downloaded from the Snaptron web services is stored in the snaptron_tmp directory.
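The download-once-then-reuse behaviour described above can be sketched as follows. This is only an illustration of the caching pattern, not the client's actual code: `cached_fetch`, the `fetch` callback, and the file name are hypothetical, and only the `snaptron_tmp` directory name comes from this README.

```python
import os
import tempfile


def cached_fetch(name, fetch, cache_dir="snaptron_tmp"):
    """Return the text stored under cache_dir/name, calling fetch() only on a miss.

    `fetch` is a hypothetical zero-argument callable standing in for the slow
    step (e.g. downloading a compilation's sample metadata).
    """
    if not os.path.isdir(cache_dir):
        os.makedirs(cache_dir)
    path = os.path.join(cache_dir, name)
    if os.path.exists(path):
        # Cache hit: reuse the local copy instead of downloading again.
        with open(path) as handle:
            return handle.read()
    data = fetch()
    with open(path, "w") as handle:
        handle.write(data)
    return data


# Demo with a fake download so that no network access is needed.
download_calls = []


def fake_download():
    download_calls.append(1)
    return "sample_id\ttissue\n1\tBrain\n"


demo_dir = tempfile.mkdtemp()
cached_fetch("gtex_metadata.tsv", fake_download, cache_dir=demo_dir)
cached_fetch("gtex_metadata.tsv", fake_download, cache_dir=demo_dir)
print(len(download_calls))  # 1: the second call is served from the cache
```

Under this pattern, deleting the cache directory would force the metadata to be fetched again on the next run.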

The SSC analysis takes about 15 minutes to complete because it makes more than 200 queries to both the GTEx and SRAv2 compilations.

The other two analyses (TS and JIR) should each complete within a minute.

  1. Shared sample count (SSC) Script

Input:

[HG38 Input file](data/novel_exons.raw.hg38.bed)

Output:

* novel_exons.hg38.ssc_results_srav2.tsv

* novel_exons.hg38.ssc_results_gtex.tsv

* shared_sample_counts.pdf

* The following is written to standard error at the end of each compilation's run:

	* Exons with 0 SSC

	* # of exons with > 0 SSC

	* # of exons with > 0 SSC which are fully annotated
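The summary counts listed above can be illustrated with a minimal sketch. It assumes that the SSC of an exon is the number of samples in which both of its flanking junctions are observed; the exon names, sample IDs, and the `annotated` set are invented for illustration and do not reflect the script's real data structures.

```python
def shared_sample_count(left_samples, right_samples):
    """Number of samples that contain both flanking junctions (assumed definition)."""
    return len(set(left_samples) & set(right_samples))


def summarize(ssc_by_exon, annotated):
    """Return (# exons with 0 SSC, # with > 0 SSC, # with > 0 SSC and fully annotated)."""
    zero = sum(1 for count in ssc_by_exon.values() if count == 0)
    positive = [exon for exon, count in ssc_by_exon.items() if count > 0]
    positive_annotated = sum(1 for exon in positive if exon in annotated)
    return zero, len(positive), positive_annotated


# Hypothetical input: sample IDs supporting the left and right junction of each exon.
junction_samples = {
    "exon1": ([1, 2, 3], [2, 3, 4]),
    "exon2": ([5], [6]),
    "exon3": ([7, 8], [8]),
}
ssc = {
    exon: shared_sample_count(left, right)
    for exon, (left, right) in junction_samples.items()
}
print(summarize(ssc, annotated={"exon3"}))  # (1, 2, 1)
```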
  2. Tissue specificity (TS) Script

Input:

[HG38 Input file](data/rel_splices.hg38.snap.tsv)

Output:

* P-values written to standard output

* rel_ts_list.tsv
  3. Junction Inclusion Ratio (JIR) Script

Input:

[HG19 Input file](data/alk_alt_tss.hg19.snap.tsv)

[GTEx HG38 Input file](data/alk_alt_tss.hg38.snap.tsv)

[TCGA HG38 Input file](data/alk_alt_tss.hg38.tcga.snap.tsv)

Output:

* alk_alt_tss.hg19.srav1.jir_results.tsv

* alk_alt_tss.hg38.gtex.jir_results.tsv

* alk_alt_tss.hg38.tcga.jir_results.tsv
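For orientation, here is a minimal sketch of a junction inclusion ratio for one sample. It assumes the ratio takes the form (B - A) / (A + B + 1), where A and B are the summed coverages of two junction groups in that sample; the function name and inputs are illustrative, so check the script itself for the exact computation.

```python
def jir(coverage_a, coverage_b):
    """Junction inclusion ratio under the assumed form (B - A) / (A + B + 1).

    The result lies strictly between -1 and 1; the +1 in the denominator
    avoids division by zero when neither group has coverage.
    """
    return (coverage_b - coverage_a) / float(coverage_a + coverage_b + 1)


print(jir(3, 6))  # 0.3
print(jir(0, 0))  # 0.0
```

A positive value means group B dominates in the sample, a negative value means group A dominates.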
  4. Percent Spliced In (PSI) Script

Input:

[HG38 Example ABCD3 Cassette Exon Input File](data/test_psi_abcd3.snap.tsv)

Output:

* test_psi_abcd3.snap.samples.tsv
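A cassette-exon PSI for a single sample can be sketched as below. This uses a common definition (mean coverage of the two inclusion junctions divided by that mean plus the exclusion junction coverage) and is an assumption about what the script computes; any minimum-coverage filtering it may apply is not modelled here.

```python
def psi(inclusion1, inclusion2, exclusion):
    """Percent spliced in for one sample, as a fraction between 0 and 1.

    inclusion1 and inclusion2 are the coverages of the two junctions flanking
    the cassette exon; exclusion is the coverage of the junction that skips it.
    Returns None when there is no coverage at all.
    """
    mean_inclusion = (inclusion1 + inclusion2) / 2.0
    total = mean_inclusion + exclusion
    if total == 0:
        return None
    return mean_inclusion / total


print(psi(10, 20, 5))  # 0.75
```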
  5. Intersection (conjunction) Script

Input:

[HG38 Double Query Input File](data/test_intersection2.snap.tsv)

Output:

* test_intersection2.snap.junctions.tsv
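The idea of a conjunction over two queries can be sketched generically: keep only the records that appear in both result sets. The record layout and the `snaptron_id` key below are assumptions made for illustration; this README does not state which field the script actually matches on.

```python
def conjunction(records_a, records_b, key):
    """Return the records from records_a whose `key` value also occurs in records_b."""
    keys_b = set(record[key] for record in records_b)
    return [record for record in records_a if record[key] in keys_b]


# Hypothetical results from two separate queries.
query1 = [{"snaptron_id": 11, "chrom": "chr2"}, {"snaptron_id": 12, "chrom": "chr2"}]
query2 = [{"snaptron_id": 12, "chrom": "chr2"}, {"snaptron_id": 13, "chrom": "chr2"}]

common = conjunction(query1, query2, key="snaptron_id")
print([record["snaptron_id"] for record in common])  # [12]
```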

General Snaptron Client

query_snaptron.py

Examples

Run the examples script to see some of the Snaptron client's various options.

