JinLabBioinfo / DeepLoop

DeepLoop robustly identifies enhancer-promoter interactions from low-depth and single-cell Hi-C data

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

DeepLoop

The conceptual innovation of DeepLoop is to handle systematic biases and random noises separately: we used HiCorr to improve the rigor of bias correction, and then applied deep-learning techniques for noise reduction and loop signal enhancement. DeepLoop significantly improves the sensitivity, robustness, and quantitation of Hi-C loop analyses, and can be used to reanalyze most published low-depth Hi-C datasets.

Citation: Zhang, S., Plummer, D., Lu, L. et al. DeepLoop robustly maps chromatin interactions from sparse allele-resolved or single-cell Hi-C data at kilobase resolution. Nat Genet 54, 1013–1025 (2022). https://doi.org/10.1038/s41588-022-01116-w

DeepLoop contains two parts:

  • LoopDenoise removes noise from HiCorr bias-corrected Hi-C data (>300M cis-2Mb contacts)
  • LoopEnhance reveals chromatin loops from lower depth Hi-C data (<300M cis-2Mb contacts, e.g. single cell HiC data and allele-resolved Hi-C data)

DeepLoop require HiCorr, please install HiCorr first

Processed data availability

  • 40 Processed Hi-C datasets by DeepLoop can be visualized in website
  • Top300K loop pixels for datasets analyzed in this project can be downloaded by:
    wget --no-check-certificate https://hiview.case.edu/ssz20/tmp.HiCorr.ref/DeepLoop_top300K.tar.gz

Installation

DeepLoop was developed and tested using Python 3.5 and following Python packages:

  • numpy
  • scipy
  • pandas
  • matplotlib
  • opencv-python
  • tensorflow

The packages can be installed by running the following command: pip3 install -r requirements.txt This will also install optional visualization and analysis tools we use such as:

  • cooler
  • jupyter
  • higlass-python

If you plan on training your own model you will want to use a GPU enabled version of TensorFlow to intractably long training times. We used tensorflow-gpu==2.3.1 but any TF2 version should work. For prediction GPU is not necessary but it will be faster than using CPU.

Download DeepLoop trained models and reference files

cd DeepLoop/
wget --no-check-certificate https://hiview.case.edu/ssz20/tmp.HiCorr.ref/DeepLoop_models.tar.gz
tar -xvf DeepLoop_models.tar.gz

After decompressing, the "DeepLoop_models/" dircetory includes "CPGZ_trained", "H9_trained" models and "ref" which includes anchor bed files for HiCorr output.

Run DeepLoop

There are three steps to process Hi-C data from fastq files:

  • Step1: Mapping and filterations: fastq to fragment pairs
  • Step2: Bias correction by HiCorr: fragment pairs to ~5kb anchor/bin pairs
  • Step3: Denoise or Enhance by DeepLoop: ~5kb anchor/bin pairs

For step1 and step2, examples and scripts are available in HiCorr.

  • Make sure you have HiCorr output before you run DeepLoop. One example HiCorr output data is provided:
wget http://hiview.case.edu/ssz20/tmp.HiCorr.ref/HiCorr_test_data/HiCorr_output.tar.gz 
tar -xvf HiCorr_output.tar.gz
ls
ls HiCorr_output

You will see "anchor_2_anchor.loop.chr11" and "anchor_2_anchor.loop.chr11.p_val" in "HiCorr_output/", the difference between two files is that one has p-values, the other does not. There's no p-value column for beta version of HiCorr on micro-C and Arima HiC data. Remember remove "pval" in DeepLoop paramter "--input_name" and "--val_cols" later
The data format is:

anchor_id_1 anchor_id_2 observed_reads_count expected_reads_from_HiCorr
  • Run DeepLoop based on directory "HiCorr_output/", more details are inprediction walkthrough notebook
    We provided both LoopDenoise and many LoopEnhance models trained by variable depth, choose either LoopDenoise or the LoopEnhance model that matches your data depth (cis-2Mb contacts). If you decide using LoopEnhance models to enhance your data signal, suggesting try a few models to selet the best one fitting your data.
HiCorr_path=<Path to HiCorr_output>
DeepLoop_outPath=
chr=chr11
python3 DeepLoop/prediction/predict_chromosome.py --full_matrix_dir $HiCorr_path/ \
                                              --input_name anchor_2_anchor.loop.$chr.p_val \
                                              --h5_file DeepLoop/DeepLoop_models/CPGZ_trained/LoopDenoise.h5 \
                                              --out_dir $DeepLoop_outPath/ \
                                              --anchor_dir DeepLoop/DeepLoop_models/ref/hg19_HindIII_anchor_bed/ \
                                              --chromosome $chr \
                                              --small_matrix_size 128 \
                                              --step_size 128 \
                                              --dummy 5 \
                                              --val_cols obs exp pval

Check output in $DeepLoop_outPath

ls $DeepLoop_outPath
head $DeepLoop_outPath/$chr.denoised.anchor.to.anchor

You will see "chr11.denoised.anchor.to.anchor"

anchor_id_1 anchor_id_2 LoopStrength_from_DeepLoop
  • Visulaize contact heatmaps from raw, HiCorr, and DeepLoop given a genomic location chr start end
chr=chr11
start=130000000
end=130800000
outplot="./test"
./DeepLoop/lib/generate.matrix.from_HiCorr.pl DeepLoop/DeepLoop_models/ref/hg19_HindIII_anchor_bed/$chr.bed $HiCorr_path/anchor_2_anchor.loop.$chr $chr $start $end ./${chr}_${start}_${end}
./DeepLoop/lib/generate.matrix.from_DeepLoop.pl DeepLoop/DeepLoop_models/ref/hg19_HindIII_anchor_bed/$chr.bed $DeepLoop_outPath/$chr.denoised.anchor.to.anchor $chr $start $end ./${chr}_${start}_${end}
./DeepLoop/lib/plot.multiple.r $outplot 1 3 ${chr}_${start}_${end}.raw.matrix ${chr}_${start}_${end}.ratio.matrix ${chr}_${start}_${end}.denoise.matrix
https://github.com/JinLabBioinfo/DeepLoop/blob/master/images/test.plot.png

Check the "test.plot.png", "raw", "HiCorr", and "DeepLoop"
sample heatmaps

Note:

  • Change DeepLoop model according to the data depth you have
  • To run DeepLoop for whole genome, repeat the process above for each chromosome.
  • The heatmap color scale can be adjusted in the script "lib/plot.multiple.r"

Heatmap Visualization for HiCorr and DeepLoop output

The heatmap visualization in Step3 above can be also done with script "plot.sh" in "lib/"
It takes eight parameters:

  • DeepLoopInstallPath: Path to "DeepLoop/"
  • DeepLoopRefbed: Path to anchor bed, e.g. "DeepLoop/DeepLoop_models/ref/hg19_HindIII_anchor_bed/" in the test exmaples
  • HiCorr_path: Path for HiCorr_output/.
  • DeepLoop_outPath: Path for DeepLoop output; where you store "chr*.denoised.anchor.to.anchor"
  • chr: Genomic region chromosome
  • start: Genomic region start loc
  • end: Genomic region end loc, remember input a region less than 2Mb
  • outPath: Path to store the heatmap png files.

If DeepLoop is installed in home directory "$myhome", outPath is current directory("./") you plan to run the script

bash $myhome/DeepLoop/lib/plot.sh $myhome \
                                  $myhome/DeepLoop/DeepLoop_models/ref/hg19_HindIII_anchor_bed/ \
                                  $HiCorr_path/ \
                                  $DeepLoop_outPath/ \
                                  chr11 130000000 130800000 ./ 

The heatmap png file named "chr11_130000000_130800000.plot.png" will be in the current directory.

Compatible with HiC-Pro, Cooler and HiGlass

  • As we mentioned in section Hi-C data Preprocessing, HiCorr can take HiC-Pro output.
  • The output of HiCorr and Deeploop can be converted to cooler format, see cooler walkthrough notebook
  • You can further take converted cooler file to visulaize by HiGlass

Training new models

If you wish to train a new model, ensure you have access to a machine with a GPU and refer to the training walkthrough notebook

About Loopcalling

DeepLoop is able to generate clean loop signals, we We will merge DeepLoop output from all the chromosomes and rank anchor pairs by "LoopStrength"(3rd column). Take confident loops from top ranked contact pairs.

About

DeepLoop robustly identifies enhancer-promoter interactions from low-depth and single-cell Hi-C data

License:Other


Languages

Language:Python 72.7%Language:Perl 24.4%Language:R 1.7%Language:Shell 1.2%