loosolab / TOBIAS

Transcription factor Occupancy prediction By Investigation of ATAC-seq Signal

Difference between --signals and --peaks

archieandrews10 opened this issue

Hi,

I want to know which file to use in the two steps below:

TOBIAS ATACorrect --bam test_data/Bcell.bam --genome test_data/genome.fa.gz --peaks test_data/merged_peaks.bed --blacklist test_data/blacklist.bed --outdir ATACorrect_test --cores 8

In the ATACorrect step above, exactly which file should be passed to --peaks? The TOBIAS manual says: "A .bed-file containing peak regions, which are the regions of interest for doing subsequent footprinting. The peaks are used to calculate the read-in-peaks ratio for normalization." This sounds very broad. Is it the output of another tool such as MACS2/3 or Genrich?
But then again, those tools report open chromatin/accessible regions as peaks. Those cannot be used here, right, since the goal is TF footprinting? Should narrow peaks be used instead?

In that case, what should be passed to --regions in the TOBIAS FootprintScores step?

TOBIAS FootprintScores --signal test_data/Bcell_corrected.bw --regions test_data/merged_peaks.bed --output Bcell_footprints.bw --cores 8

Could you clarify this?

Hey @archieandrews10,

thank you for your question!
You are right, the peak file is usually the output of such a tool. Whether you use broad peaks (which are, for example, what the default analysis of our TOBIAS snakemake pipeline processes) or narrow peaks depends somewhat on your research question. Narrow peaks are a good choice if you want to focus on regions with a clearer signal.
You could also provide a list of regions that are of special interest to you, but if the ATAC signal there is too sparse (as it is in closed chromatin), you probably will not find any TF footprints. Only open chromatin regions provide enough signal for the analysis.

TOBIAS basically searches within these accessible regions for small areas of low accessibility. Because the region as a whole is open while these small areas, roughly the size of a transcription factor motif, are not, it can be inferred that a transcription factor is bound there.

For TOBIAS FootprintScores --regions, you want to use the same file you used for TOBIAS ATACorrect --peaks, just as you showed in your example. TOBIAS will calculate the footprinting signal in these regions only.
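
As a minimal sketch of that setup, the two commands from the question can share a single peak-file variable so that both steps are guaranteed to see the same regions. The `PEAKS` variable, the `run` helper and the `DRY_RUN` switch are illustrative assumptions and not part of TOBIAS; the flags and file paths are the ones quoted in this thread.

```shell
# Peak file shared by both steps (path taken from the example in this thread).
PEAKS="test_data/merged_peaks.bed"

# Hypothetical helper: with DRY_RUN=1 (the default here) commands are only printed.
# Set DRY_RUN=0 to actually execute them (requires TOBIAS to be installed).
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1: bias correction; --peaks receives the shared peak file.
run TOBIAS ATACorrect --bam test_data/Bcell.bam --genome test_data/genome.fa.gz \
    --peaks "$PEAKS" --blacklist test_data/blacklist.bed \
    --outdir ATACorrect_test --cores 8

# Step 2: footprint scoring; --regions receives the very same file.
run TOBIAS FootprintScores --signal test_data/Bcell_corrected.bw \
    --regions "$PEAKS" --output Bcell_footprints.bw --cores 8
```

Running it as written only prints the two commands, which makes it easy to check that `--peaks` and `--regions` point to the same file before starting the actual analysis.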

I hope this clarified your question. If you are still uncertain, we can go into more detail.

Best regards,
Moritz

No activity for at least 30 days. Marking issue as stale. Stale issues are closed after one week.