ChenFengling / HiCpipe

an efficient Hi-C data processing pipeline

HiCpipe – user guide

general description of the pipeline

This pipeline is for BL-Hi-C. It is based on Juicer and HiC-Pro and combines the advantages of these two processing pipelines. HiCpipe is much faster than Juicer and HiC-Pro and can output multiple features of Hi-C maps. The main.sh script trims the BL-Hi-C linker and maps the data to the chosen genome. It then uses the subjob.sh script to run the remaining steps in parallel as background shell jobs. You can use top or htop to monitor the running programs.
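The background parallelism described above can be pictured with a minimal sketch. This is not the actual subjob.sh: `process_sample` and `run_in_background` are hypothetical names, and the body of `process_sample` is only a placeholder for the real downstream steps.

```shell
# Hypothetical stand-in for the per-sample downstream steps
# (the real steps live in subjob.sh).
process_sample() {
    echo "finished $1"
}

# Start one background job per sample, then wait for all of them.
run_in_background() {
    for sample in "$@"; do
        process_sample "$sample" &   # '&' puts the job in the background
    done
    wait                             # block until every background job has exited
}
```

Calling `run_in_background sample1 sample2` would start both jobs at once; while they run, they appear as separate processes in top or htop.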

The outputs are listed below:

| name | software | output content |
| --- | --- | --- |
| mapping | bwa | merged mapped reads (.bam) |
| filter | HiC-Pro | contact pairs (.txt) |
| pair2hic | juicer (pre) | compressed Hi-C maps (.hic) |
| hic2map | juicer (dump) | sparse and dense matrix (.mat) |
| compartment | R eigen | PC1 values (.txt, .bw) |
| TAD | Insulation score | TAD boundaries (.bed); insulation score (.bw) |
| CDB | HiCDB | CDBs (.bed); relative insulation score (.bw) |
| loop | HiCloop | loops (.bedpe) |
| qc | shell | Hi-C quality report |

These are the general features of the HiCpipe software (under development).

Other utilities:
Easy clustering based on compartment and insulation.
Statistics of Hi-C features.

pipeline install

All software mentioned above must be installed first. To install this pipeline, simply download it and use the shell scripts.

git clone https://github.com/ChenFengling/HiCpipe.git

input data  

Organize your data as PROJECT_PATH/sample/sample_R1.fq.gz and PROJECT_PATH/sample/sample_R2.fq.gz, for example:

BLHiC-project1
├── sample1
│     ├── sample1_R1.fq.gz
│     └── sample1_R2.fq.gz
└── sample2
    ├── sample2_R1.fq.gz
    └── sample2_R2.fq.gz
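Before starting a run, it may help to confirm that every sample directory follows this layout. The function below is a minimal sketch, assuming the `<sample>/<sample>_R1.fq.gz` and `<sample>/<sample>_R2.fq.gz` naming from the tree above. `check_project_layout` is a hypothetical helper and is not part of HiCpipe.

```shell
# Report sample directories that lack one of the paired FASTQ files.
# Returns 0 if every sample has both mates, 1 otherwise.
check_project_layout() {
    project="$1"
    status=0
    for dir in "$project"/*/; do
        [ -d "$dir" ] || continue
        sample=$(basename "$dir")
        # Skip the pipeline's own summary directory if it already exists.
        if [ "$sample" = "all_results" ]; then
            continue
        fi
        for mate in R1 R2; do
            fq="${dir}${sample}_${mate}.fq.gz"
            if [ ! -f "$fq" ]; then
                echo "missing: $fq"
                status=1
            fi
        done
    done
    return "$status"
}
```

For example, `check_project_layout BLHiC-project1` prints one `missing:` line per absent file and returns a non-zero status if any file is absent.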

output data

You will get the summarized data in PROJECT_PATH/all_results/

quick start

Use the following command to analyse your BL-Hi-C data:

sh main.sh $PROJECT_PATH $Resolution $genome $core $HiCpipe_PATH

Before running, set BOWTIE2_IDX_PATH, GENOME_SIZE, and GENOME_FRAGMENT in config-hicpro_*.txt.
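Because main.sh takes five positional arguments, a small guard can catch a missing one before a long run starts. The sketch below only prints the command line and does not run the pipeline. `hicpipe_command` is a hypothetical helper, and the values in the usage note are illustrative assumptions, including the unit of the resolution.

```shell
# Print the main.sh command line once all five positional arguments are given.
# Argument order follows the quick-start line:
#   PROJECT_PATH Resolution genome core HiCpipe_PATH
hicpipe_command() {
    if [ "$#" -ne 5 ]; then
        echo "usage: hicpipe_command PROJECT_PATH Resolution genome core HiCpipe_PATH" >&2
        return 1
    fi
    echo "sh main.sh $1 $2 $3 $4 $5"
}
```

For instance, `hicpipe_command /data/BLHiC-project1 10000 mm10 8 /opt/HiCpipe` prints the full command so that it can be reviewed before it is executed. All five values here are made up for illustration.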

how to use genomes other than mm9/mm10/hg19

1. Change the TSS annotation in compartment.r:

tss=read.table("YOUR_TSS_FOLDER/tss.bed")

2. Change BOWTIE2_IDX_PATH, GENOME_SIZE, and GENOME_FRAGMENT in config-hicpro.txt. Follow the instructions at https://github.com/nservant/HiC-Pro/tree/master/annotation to generate the restriction enzyme sites, for example:

/home/software/HiC-Pro/bin/utils/digest_genome.py -r GG^CC -o mm9_ggcc.bed /home/reference/mouse/mm9/Sequence/BWAIndex/genome.fa
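The GENOME_SIZE entry also needs a file for the new genome. The sketch below assumes that it expects a tab-separated table of chromosome names and lengths, as in the HiC-Pro annotation notes, and that a FASTA index (genome.fa.fai, for example from `samtools faidx`) already exists. `fai_to_chrom_sizes` is a hypothetical helper name.

```shell
# Write a two-column table (chromosome name, length) from a FASTA index.
# The first two columns of a .fai file are the sequence name and its length.
fai_to_chrom_sizes() {
    cut -f1,2 "$1"
}
```

A possible use is `fai_to_chrom_sizes genome.fa.fai > chrom_sizes.txt`, where both file names are placeholders.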

generate QC report

Use HiCqc.sh to generate the Hi-C QC report:

sh HiCqc.sh $PROJECT_PATH $REPORT_NAME $HiCpipe_PATH

You will find the QC report REPORT_NAME_report.txt under PROJECT_PATH.

QC output

QC suggestion

Valid_interaction_pairs/Total_PETs (>50%)
valid_interaction_rmdup/Valid_interaction_pairs (>85%)
cis_interaction/trans_interaction (>1.5)
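The three suggested thresholds can be checked with a few lines of awk. This is a minimal sketch that takes the five counts as arguments and does not parse REPORT_NAME_report.txt, whose layout is not described here. `qc_check` and its argument order are hypothetical.

```shell
# Compare the three QC ratios against the suggested cutoffs.
# Arguments (counts): Total_PETs Valid_interaction_pairs
#                     valid_interaction_rmdup cis_interaction trans_interaction
qc_check() {
    awk -v total="$1" -v valid="$2" -v rmdup="$3" -v cis="$4" -v trans="$5" '
    function verdict(value, cutoff) { return (value > cutoff) ? "PASS" : "FAIL" }
    BEGIN {
        # Guard against division by zero.
        if (total <= 0 || valid <= 0 || trans <= 0) {
            print "counts must be positive" > "/dev/stderr"
            exit 1
        }
        r1 = valid / total
        r2 = rmdup / valid
        r3 = cis / trans
        printf "Valid_interaction_pairs/Total_PETs\t%.3f\t%s\n", r1, verdict(r1, 0.50)
        printf "valid_interaction_rmdup/Valid_interaction_pairs\t%.3f\t%s\n", r2, verdict(r2, 0.85)
        printf "cis_interaction/trans_interaction\t%.3f\t%s\n", r3, verdict(r3, 1.5)
    }'
}
```

With made-up counts, `qc_check 1000 600 540 400 140` gives ratios of 0.600, 0.900, and 2.857, so all three lines are marked PASS.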

citation

HiCDB paper
Chen, F., Li, G., Zhang, M. Q., & Chen, Y. (2018). HiCDB: a sensitive and robust method for detecting contact domain boundaries. Nucleic acids research, 46(21), 11239-11250.
T-ALL paper using HiCpipe
Yang, L., Chen, F., Zhu, H., Chen, Y., Dong, B., Shi, M., ... & Zhang, M. Q. (2020). 3D Genome Analysis Identifies Enhancer Hijacking Mechanism for High-Risk Factors in Human T-Lineage Acute Lymphoblastic Leukemia. bioRxiv.

related links

ChIA-PET2 https://github.com/GuipengLi/ChIA-PET2
HiC-Pro sample
HiC-Pro
Juicer tools pre https://github.com/theaidenlab/juicer/wiki/Pre#4dn-dcic-format
Juicebox https://github.com/theaidenlab/Juicebox
video for Juicebox usage
CNV and translocation tools: HiCtrans, HiCnv, HiCapp
