loosolab / TOBIAS

Transcription factor Occupancy prediction By Investigation of ATAC-seq Signal

BINDetect not giving out error when the motif file is "deformed"

johannesnicolaus opened this issue

Might be a continuation of issue #78. When I tried to run BINDetect using a "pfm" motif file created by gimmemotifs, I ran into the following problem.

The pfm file looks something like:

>GM.5.0.Sox.0001
0.7213  0.0793  0.1103  0.0891
0.9259  0.0072  0.0062  0.0607
0.0048  0.9203  0.0077  0.0672
0.9859  0.0030  0.0030  0.0081
0.9778  0.0043  0.0128  0.0051
0.1484  0.0050  0.0168  0.8299
>GM.5.0.Homeodomain.0001
0.8870  0.0000  0.0178  0.0951
0.1156  0.2033  0.6629  0.0181
0.0017  0.7452  0.0809  0.1722
0.0011  0.0003  0.0003  0.9983
0.0026  0.0141  0.9721  0.0111
0.0000  0.0189  0.0054  0.9758
0.0006  0.9983  0.0006  0.0006
0.9170  0.0140  0.0046  0.0644
0.2228  0.2421  0.3300  0.2051
0.3621  0.1054  0.2208  0.3116
0.5727  0.0104  0.1741  0.2428

For example, I have 1796 motifs in the pfm file, but BINDetect reports reading 5531 motifs and gives the following warning:

2023-12-16 10:23:46 (1569572) [INFO]	Reading motifs from file
2023-12-16 10:23:47 (1569572) [INFO]	- Read 5531 motifs
2023-12-16 10:23:47 (1569572) [WARNING]	The motif output names (as given by --naming) are not unique.
2023-12-16 10:23:47 (1569572) [WARNING]	The following names occur more than once: ['_']
2023-12-16 10:23:47 (1569572) [WARNING]	These motifs will be renamed with '_1', '_2' etc. To prevent this renaming, please make the names of the input --motifs unique
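A quick way to cross-check the motif count in the log against the file is to count the `>` header lines and look for repeated names. The following is a minimal sketch, not part of TOBIAS; the function name `summarize_motif_headers` is hypothetical, and it assumes every motif starts with a `>` header line as in the excerpt above.

```python
from collections import Counter


def summarize_motif_headers(text):
    """Return (number of '>' header lines, sorted names that occur more than once)."""
    names = [line[1:].strip() for line in text.splitlines() if line.startswith(">")]
    counts = Counter(names)
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    return len(names), duplicates


# Shortened excerpt of the file shown in this issue
example = """>GM.5.0.Sox.0001
0.7213  0.0793  0.1103  0.0891
0.9259  0.0072  0.0062  0.0607
>GM.5.0.Homeodomain.0001
0.8870  0.0000  0.0178  0.0951
"""

n_motifs, duplicates = summarize_motif_headers(example)
print(f"{n_motifs} headers, duplicated names: {duplicates}")
```

If the header count matches the expected number of motifs (1796 here) while the log reports a different number, the mismatch points to the reader rather than to the names in the file.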

And I got results with the directories named as such:

__1     __1413  __1829  __2243  __2659  __3073  __3489  __541  __957

or

GM.5.0.Sox.0001_GM.5.0.Sox.0001
GM.5.0.Sox.0002_GM.5.0.Sox.0002
GM.5.0.Sox.0003_GM.5.0.Sox.0003
GM.5.0.Sox.0004_GM.5.0.Sox.0004
GM.5.0.Sox.0005_GM.5.0.Sox.0005
GM.5.0.Sox.0006_GM.5.0.Sox.0006
GM.5.0.Sox.0007_GM.5.0.Sox.0007
GM.5.0.Sox.0008_GM.5.0.Sox.0008
GM.5.0.Sox.0009_GM.5.0.Sox.0009

This may not be a standard pfm file, but it would be nice if BINDetect raised an error when the motif file does not match the expected format.

My current workaround is to convert the file with chen2meme, since it appears to be a chen-format motif file. With the converted file, BINDetect seems to work fine.
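For readers without the MEME suite at hand, the conversion idea can be sketched in plain Python. This is not chen2meme and makes several assumptions: the helper name `rows_to_meme` is hypothetical, each row is taken to be one motif position with columns in A, C, G, T order, a uniform background is written, and the optional `nsites` and `E` fields of the MEME matrix line are left out.

```python
def rows_to_meme(motifs):
    """Format {name: [[A, C, G, T], ...]} as minimal MEME text with a uniform background."""
    out = [
        "MEME version 4",
        "",
        "ALPHABET= ACGT",
        "",
        "strands: + -",
        "",
        "Background letter frequencies",
        "A 0.25 C 0.25 G 0.25 T 0.25",
        "",
    ]
    for name, rows in motifs.items():
        out.append(f"MOTIF {name}")
        out.append(f"letter-probability matrix: alength= 4 w= {len(rows)}")
        for row in rows:
            if len(row) != 4:
                raise ValueError(f"motif {name!r}: expected 4 values per row, found {len(row)}")
            total = sum(row)
            if total <= 0:
                raise ValueError(f"motif {name!r}: row sums to zero")
            # Renormalize so each position sums to 1 despite rounding in the input
            out.append(" ".join(f"{value / total:.6f}" for value in row))
        out.append("")
    return "\n".join(out)


# First two positions of the Sox motif from this issue
example = {
    "GM.5.0.Sox.0001": [
        [0.7213, 0.0793, 0.1103, 0.0891],
        [0.9259, 0.0072, 0.0062, 0.0607],
    ]
}
meme_text = rows_to_meme(example)
print(meme_text)
```

Whether a given MEME reader accepts a matrix line without `nsites` and `E` should be checked against that tool; chen2meme remains the tested route.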

Hi @johannesnicolaus

Thank you for this issue - indeed it looks related to #78. There seems to be a bug in how these files are read using biopython, which creates additional "empty" motifs with "_"-names. We have now changed the reading to parse the file manually and check the length, so that an error is written if a deformed motif is found.
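The kind of check described above can be illustrated with a small sketch. It is not the TOBIAS implementation: the function name `check_four_row_pfm` is hypothetical, and it assumes that "length" refers to the number of rows per motif in a JASPAR-style layout, where each motif has exactly four rows (A, C, G, T) of equal length.

```python
def check_four_row_pfm(text):
    """Parse '>name' blocks and require exactly four equal-length numeric rows each."""
    motifs = {}
    name = None
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            if not name or name in motifs:
                raise ValueError(f"line {lineno}: missing or duplicate motif name {name!r}")
            motifs[name] = []
        elif name is None:
            raise ValueError(f"line {lineno}: matrix row found before any '>' header")
        else:
            # float() itself raises ValueError if a field is not numeric
            motifs[name].append([float(x) for x in line.split()])

    for name, rows in motifs.items():
        if len(rows) != 4:
            raise ValueError(f"motif {name!r}: expected 4 rows (A, C, G, T), found {len(rows)}")
        if len({len(row) for row in rows}) != 1:
            raise ValueError(f"motif {name!r}: rows have unequal lengths")
    return motifs


# The Sox motif from this issue has one row per position (six rows), so it is rejected
sox = """>GM.5.0.Sox.0001
0.7213  0.0793  0.1103  0.0891
0.9259  0.0072  0.0062  0.0607
0.0048  0.9203  0.0077  0.0672
0.9859  0.0030  0.0030  0.0081
0.9778  0.0043  0.0128  0.0051
0.1484  0.0050  0.0168  0.8299
"""

try:
    check_four_row_pfm(sox)
except ValueError as err:
    print(f"Deformed motif file: {err}")
```

One limitation of a row-count check: a row-per-position motif that happens to be exactly four positions long has the same shape as a four-row matrix and would pass unnoticed.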

The code is not thoroughly tested yet, but you can already try it by installing directly from the dev branch:
pip install git+https://github.com/loosolab/TOBIAS@dev

After testing, the functionality will be included in the next version of TOBIAS. Hope that helps 🙏

Perfect, thanks so much!
