Comparability of footprint scores

Question

rejo27 opened this issue 2 months ago

Hi,
First of all, thank you for TOBIAS. It is a powerful tool.
I have time-series ATAC-seq data from multiple conditions (cond_A, cond_B, cond_C, cond_D). I ran the tools in the order ATACorrect -> ScoreBigwig -> BINDetect in single mode, using the merged peaks (cat *peak.bed > merge.bed). I have the following questions:

1. Across different conditions, is it possible to compare the footprint scores of the same TF at the same TFBS?
2. Within one condition and for the same TF, does a larger footprint score at a TFBS mean the TF is more likely to be bound at that TFBS? If not, what does a larger score mean?
3. Same as 2, but for different TFs?

Thanks.
Looking forward to your reply.

hschult · Answer 1 · Wed Jun 26 2024 18:11:15 GMT+0800 (China Standard Time)

Hi @rejo27,
and thank you for your interest in TOBIAS.
Unfortunately, I need more information to help you. For example, I'm unsure what you mean by "single mode". Did you run BINDetect for each time point individually? In general, providing the actual tool calls would help. Before you do that, however, let me recommend our TOBIAS snakemake pipeline. It runs a full TOBIAS analysis by calling the TOBIAS tools in order, so you don't have to call each tool yourself.

Regarding your questions:

1. Yes, this is what BINDetect will do when you provide multiple conditions (see bindetect_figures.pdf).
2. & 3. I am not sure what score you are referring to. You can have a look here to get an explanation of the BINDetect outputs. From a scoring perspective, BINDetect considers two things: (1) it finds possible binding locations for a TF by scoring how well the TF motif aligns to a location; (2) it predicts whether each of these candidate binding locations has a bound protein (a footprint), using the scores provided by ScoreBigwig.

I hope this answers some of your questions. Again, I want to recommend our TOBIAS pipeline, which may help you resolve some of your questions/issues.

Please let me know how it went.
Best wishes,
Hendrik

rejo27 · Answer 2 · Thu Jul 04 2024 23:35:08 GMT+0800 (China Standard Time)

I just came back from a business trip, so I'm sorry for replying so late.
I didn't explain my problem clearly, so I will use a practical example.

Yes, by "single mode" I mean running BINDetect for each time point individually, like this:

for treat in cond_A cond_B cond_C cond_D
do 
    TOBIAS BINDetect \
    --motifs ${motifMeme} \
    --signals 02-FootprintScores/${treat}_footprints.bw \
    --genome ${genome} \
    --peaks ${ConsensusPeak} \
    --peak_header 00-sample_ConsensusPeakDir/${treat}/peaks_annotated_header.txt \
    --outdir 03-BINDetect/${treat} \
    --cond_names ${treat} \
    --cores 8 > ./03-BINDetect/BINDetect.${treat}.log 2>&1
done

These are the results for two of the time points (cond_A, cond_B):

$ head AT5G67000_cond_A_bound.bed
chr1    20278   20288   AT5G67000      8.25197 -       chr1    19962   23171   1.28774
$ head AT5G67000_cond_B_bound.bed
chr1    20278   20288   AT5G67000      8.25197 -       chr1    19962   23171   1.10002

Because 1.28774 > 1.10002, does this mean that AT5G67000 binds more strongly at chr1:20278-20288 in cond_A than in cond_B?

These are the results for cond_A:

$ head AT5G67000_cond_A_bound.bed
chr1    20278   20288   AT5G67000      8.25197 -       chr1    19962   23171   1.10002
chr1    21190   21200   AT5G67000      10.5598 -       chr1    19962   23171   2.49043

In cond_A, AT5G67000 is predicted to have two binding sites, chr1:20278-20288 and chr1:21190-21200. Since 2.49043 > 1.10002, does this mean that AT5G67000 binds more strongly at chr1:21190-21200 than at chr1:20278-20288?

Thanks.
Looking forward to your reply.

hschult · Answer 3 · Mon Jul 08 2024 15:05:15 GMT+0800 (China Standard Time)

Thank you, this made it very clear!

1. No. The way you ran BINDetect, comparing scores between time points is not possible. BINDetect performs a quantile normalization to enable comparison between conditions. However, since you ran it in "single mode", each run only knows about one time point, so it cannot perform the normalization. For more details, here is the supplementary information from our paper. It describes how BINDetect (page 11) and the other TOBIAS tools work. Once you rerun BINDetect with all time points, it will be possible to compare the scores.
2. Yes, that is correct. A higher score means that this location shows a more pronounced footprint than locations with a lower score.
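A rerun with all time points in one call might look roughly like the sketch below. It reuses only the flags from the per-condition loop posted earlier in this thread and passes all four footprint bigwigs to a single `--signals`, with matching `--cond_names`. The fallback file names (`motifs.meme`, `genome.fa`, `merge.bed`) and the `03-BINDetect/all_conditions` output directory are placeholders for illustration, not values from the thread. By default the script only prints the assembled command; setting `RUN=1` executes it.

```shell
#!/usr/bin/env bash
# Sketch only: one BINDetect call that sees all four time points at once,
# so that scores can be normalized across conditions.
# Flags are the ones used in the per-condition loop earlier in this thread.
# Fallback file names and the output directory name are placeholders.
set -euo pipefail

conds=(cond_A cond_B cond_C cond_D)

# One footprint bigwig per condition, in the same order as the names.
signals=()
for treat in "${conds[@]}"; do
    signals+=("02-FootprintScores/${treat}_footprints.bw")
done

outdir="03-BINDetect/all_conditions"   # hypothetical directory name

cmd=(TOBIAS BINDetect
    --motifs "${motifMeme:-motifs.meme}"
    --signals "${signals[@]}"
    --genome "${genome:-genome.fa}"
    --peaks "${ConsensusPeak:-merge.bed}"
    --outdir "$outdir"
    --cond_names "${conds[@]}"
    --cores 8)

# Print the assembled command for inspection; set RUN=1 to execute it.
printf '%s ' "${cmd[@]}"
printf '\n'
if [[ "${RUN:-0}" == "1" ]]; then
    mkdir -p "$outdir"
    "${cmd[@]}" > "$outdir/BINDetect.log" 2>&1
fi
```

The `--peak_header` option from the original loop is left out here because the thread only shows per-condition header files; if a header matching the merged peak file exists, it can be added back in the same way.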

rejo27 · Answer 4 · Mon Jul 08 2024 16:50:45 GMT+0800 (China Standard Time)

Thank you so much!