ENCODE-DCC / hic-pipeline

HiC uniform processing pipeline

Support for Arima Restriction Enzyme

andreevakali opened this issue

Hello,

I am attempting to run the hic-pipeline with the Arima restriction enzyme.

I have created a restriction site file for the hg19 genome for the Arima enzyme using the following Juicer utility:

https://github.com/aidenlab/juicer/blob/master/misc/generate_site_positions.py
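For readers unfamiliar with what a restriction site file contains, here is a minimal sketch of scanning a sequence for both Arima motifs. It is not the Juicer script itself: the function names `motif_to_regex` and `find_sites`, the toy sequence, and the choice to report 1-based motif start positions are all assumptions made for illustration.

```python
import re

# The two Arima recognition motifs; 'N' stands for any of the four bases.
ARIMA_MOTIFS = ["GATC", "GANTC"]


def motif_to_regex(motif):
    """Translate a motif into a regex, expanding 'N' to a base class."""
    return "".join("[ACGT]" if base == "N" else base for base in motif.upper())


def find_sites(sequence, motifs):
    """Return sorted 1-based start positions of every motif occurrence.

    Hypothetical helper: the position convention (motif start, 1-based)
    is an assumption, not necessarily what Juicer writes out.
    """
    sequence = sequence.upper()
    positions = set()
    for motif in motifs:
        # A lookahead reports overlapping occurrences as well.
        pattern = re.compile("(?=" + motif_to_regex(motif) + ")")
        for match in pattern.finditer(sequence):
            positions.add(match.start() + 1)
    return sorted(positions)


toy_sequence = "TTGATCAAGAGTCCGATTCGATC"
sites = find_sites(toy_sequence, ARIMA_MOTIFS)
print(sites)
```

On a real genome the same scan would be run per chromosome, which is why a dedicated utility is the practical choice.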

The input JSON file for the hic-pipeline has the following parameter:

"hic.restriction_enzyme": "Arima",

The hic.wdl workflow file for the hic-pipeline defines the following entries for supported enzymes:

    # Pipeline internal "global" variables: do not specify as input
    # These ligation junctions are consistent with mega.sh
    Map[String, String] RESTRICTION_ENZYME_TO_SITE = {
        "HindIII": "AAGCTAGCTT",
        "DpnII": "GATCGATC",
        "MboI": "GATCGATC",
    }

The restriction enzymes in the Arima-HiC cocktail cut at the following motifs, where ‘^’ marks the cut site on the + strand and ‘N’ can be any of the four genomic bases:


^GATC

G^ANTC
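To make the question concrete, here is a rough sketch of how candidate ligation junctions could be derived from these cut sites. It assumes palindromic motifs that leave a 5' overhang, which is filled in before blunt-end ligation; the helpers `filled_ends`, `junction_templates`, and `expand_n` are hypothetical names and are not part of hic.wdl or Juicer. Under the same assumption, A^AGCTT gives the "AAGCTAGCTT" entry already in the map.

```python
from itertools import product


def filled_ends(motif_with_cut):
    """Return (left_end, right_end) of the fragments after cut and fill-in.

    Assumes a palindromic motif with a 5' overhang: the upstream fragment
    is filled in up to the mirrored cut position, and the downstream
    fragment starts at the cut.
    """
    offset = motif_with_cut.index("^")
    motif = motif_with_cut.replace("^", "")
    left_end = motif[: len(motif) - offset]
    right_end = motif[offset:]
    return left_end, right_end


def junction_templates(motifs_with_cut):
    """Join every filled-in left end with every right end."""
    ends = [filled_ends(m) for m in motifs_with_cut]
    return sorted({left + right for (left, _), (_, right) in product(ends, ends)})


def expand_n(template):
    """Expand each 'N' independently into A, C, G and T."""
    choices = ["ACGT" if base == "N" else base for base in template]
    return ["".join(combo) for combo in product(*choices)]


templates = junction_templates(["^GATC", "G^ANTC"])
junctions = sorted({seq for t in templates for seq in expand_n(t)})
print(templates)
print(len(junctions))
```

Because fragments cut by either enzyme can ligate to each other, the result is a set of sequences rather than the single string per enzyme that the map holds.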

For me to add Arima-HiC support with two enzymes to the hic.wdl, how should the two enzymes be specified in the above RESTRICTION_ENZYME_TO_SITE map?

Thank you,
Kalina

Hi Kalina,

We currently don't support processing Hi-C data with multiple ligation junctions. You should still be able to produce .hic contact maps, but the ligation counts and some of the QC will be incorrect. If that's fine with you, then you can just specify "hic.restriction_enzyme": "MboI" in your input JSON and run the pipeline using the restriction site file you have generated.

We might add the ability to support multiple ligation sites in the future, but I can't make any promises about if or when that would happen.

I hope that helps,

Paul

Hi Paul,

Thank you so much for the clarification and the support!

Kalina