liuyuan-cisd/enhancer_promoter_manuscript

Guide to scripts and data for "Sequence characteristics distinguish transcribed enhancers from promoters and predict their breadth of activity"

Laura Colbran

01/15/2018

doi:10.1534/genetics.118.301895

bin/

avg_curves.py
averages ROC and PR curves across multiple classifiers; used when larger sets were split into subsets
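
As a rough illustration of the kind of averaging described above, the sketch below interpolates each curve onto a shared x-axis grid and takes the mean at every grid point (often called vertical averaging). It is not the code in avg_curves.py: the function names, the grid size, and the choice of vertical averaging are all assumptions made for this example, and it is written for Python 3 with no dependencies.

```python
from bisect import bisect_right


def interpolate(xs, ys, x):
    """Linearly interpolate y at x. xs must be sorted in ascending order."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    # Index of the first point strictly to the right of x.
    i = bisect_right(xs, x)
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def average_curves(curves, n_points=101):
    """Average several (xs, ys) curves on an evenly spaced grid over [0, 1].

    curves: list of (xs, ys) pairs, e.g. (FPR, TPR) for ROC curves.
    n_points: hypothetical grid resolution, chosen for illustration.
    """
    grid = [i / (n_points - 1) for i in range(n_points)]
    mean_y = [
        sum(interpolate(xs, ys, x) for xs, ys in curves) / len(curves)
        for x in grid
    ]
    return grid, mean_y


# Two toy ROC curves: a diagonal and a steeper one.
toy_curves = [([0, 1], [0, 1]), ([0, 0.5, 1], [0, 1, 1])]
grid, mean_tpr = average_curves(toy_curves, n_points=3)
```

The same function can be applied to PR curves by passing (recall, precision) pairs, provided each curve is sorted by ascending recall first.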

enh-prom_analyses.ipynb
contains R code for relative ROC calculation (Fig. 3), TF motif analyses (Fig. 5), PCA, and k-mer weights (Fig. 4)

kmer_count.py
counts occurrences of all sequences of length k in a set of genomic regions
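
A minimal sketch of k-mer counting, assuming the region sequences have already been extracted as plain strings. It is not the code in kmer_count.py: the function name, the overlapping sliding window, and the decision to skip k-mers containing non-ACGT characters are assumptions for illustration, and strand handling (for example, merging reverse complements) is not addressed. Written for Python 3.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def count_kmers(sequences, k):
    """Count overlapping k-mers across a collection of DNA sequences.

    k-mers containing characters other than A, C, G, or T (such as N)
    are skipped. This filtering rule is an assumption of this sketch.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Slide a window of width k one base at a time.
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= VALID_BASES:
                counts[kmer] += 1
    return counts


toy_counts = count_kmers(["ACGTAC", "acgn"], k=2)
```

Here `toy_counts["AC"]` is 3 (two from the first sequence, one from the second), and the window "GN" from the second sequence is dropped.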

set_length.py
makes every region in a BED file the same length, keeping the same center point
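
The resizing step can be sketched as follows, assuming tab-separated BED lines with the chromosome, start, and end in the first three columns. It is not the code in set_length.py: the function names, the integer midpoint rounding, and the clamping of negative starts to 0 are assumptions for illustration. Written for Python 3.

```python
def resize_interval(start, end, length):
    """Return (new_start, new_end) of the given length around the midpoint.

    The midpoint is rounded down. If the new start would be negative,
    the interval is shifted to begin at 0 (an assumption of this sketch).
    """
    center = (start + end) // 2
    new_start = center - length // 2
    if new_start < 0:
        new_start = 0
    return new_start, new_start + length


def resize_bed_line(line, length):
    """Resize one tab-separated BED line, keeping any extra columns."""
    fields = line.rstrip("\n").split("\t")
    new_start, new_end = resize_interval(int(fields[1]), int(fields[2]), length)
    fields[1], fields[2] = str(new_start), str(new_end)
    return "\t".join(fields)


resized = resize_bed_line("chr1\t1000\t1100\tregion_a", 600)
```

This sketch does not check the new end against chromosome lengths, so regions near the end of a chromosome would need an additional bounds check.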

data/

all_fantom_enhancers.bed
    Broad enhancers = all with #tiss >45
    Context-Specific = random subset of those with #tiss = 1
    regions were set to 600bp before use

all_fantom_prom.bed
    Broad Promoters = random subset of those with mean_act >372
    Context-Specific = all with mean_act <9
    regions were set to 600bp before use

roadmap_enhancers_600bp.bed
filtered, set to 600bp

prom_enh_rel_ROC.txt
values for Fig. 3 relative ROCs

roadmap_promoters_600bp.bed
filtered, set to 600bp

tf_motif_specificity.csv
FANTOM TSPS scores, IDs

classifiers/

output and scripts from all SVM classifiers
N.B. classifier script requires Python 2.7.8 and Shogun Machine Learning Toolbox v4.0.0

fantom_enhVSprom/
direct classifiers between enhancers and promoters (Fig. 1)

fantom_enhVsprom_cgiMatched/
direct classifiers between enhancers and promoters, stratified by CGI overlap

broadVSspecific/
classifiers between broad and specific regions (Fig. 2)

cgi_analyses/
stratified by CGI status (Fig. 3)

roadmap_enhVSprom/
direct classifiers between enhancers and promoters (Fig. 6)

enhVsprom_tf_matching/

tomtom output for top 6-mers in direct classifiers between enhancers and promoters (Fig. 3B)

motif_sim/

tomtom output for top 6-mers in other enhancer and promoter classifiers
hocomoco/ (Fig. 5)
jaspar/ (Figs. S11 & S12)

tf_counts/

overall broad and narrow tf counts in regions (Fig. 5)
