lbcb-sci / RiNALMo

RiboNucleic Acid (RNA) Language Model

Home Page:https://sikic-lab.github.io/

cluster

ylzdmm opened this issue

Hello,
I would like to ask how you used MMseqs2 to cluster the RNA sequences, for example which values you set for parameters such as sequence identity and coverage.
Thanks!

Hi,
we collected non-coding RNA sequences from the publicly available RNAcentral, nt, Rfam and Ensembl datasets. We removed duplicate sequences with seqkit rmdup, and the resulting unique sequences were clustered with mmseqs easy-linclust using the options --min-seq-id 0.7 and -c 0.8.
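
For anyone who wants to reproduce the two steps, here is a minimal sketch in Python that builds and runs the corresponding commands. It is not the exact script used for RiNALMo. The file names (ncrna.fasta, unique.fasta, clusters, tmp) are hypothetical, and the -s flag of seqkit rmdup (deduplicate by sequence rather than by ID) is an assumption, since the reply above does not state which rmdup options were used. Only --min-seq-id 0.7 and -c 0.8 come from the reply; all other MMseqs2 settings are left at their defaults.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(raw_fasta, work_dir):
    """Return the deduplication and clustering commands as argument lists."""
    work = Path(work_dir)
    unique_fasta = work / "unique.fasta"  # hypothetical output name

    # Assumption: -s removes duplicates by sequence content instead of by ID.
    dedup = ["seqkit", "rmdup", "-s", str(raw_fasta), "-o", str(unique_fasta)]

    # easy-linclust takes: <input> <output prefix> <tmp dir> [options].
    cluster = [
        "mmseqs", "easy-linclust",
        str(unique_fasta), str(work / "clusters"), str(work / "tmp"),
        "--min-seq-id", "0.7",  # minimum sequence identity, from the reply
        "-c", "0.8",            # minimum alignment coverage, from the reply
    ]
    return [dedup, cluster]


def run_pipeline(raw_fasta, work_dir):
    """Run both steps in order, stopping at the first failure."""
    Path(work_dir).mkdir(parents=True, exist_ok=True)
    for cmd in build_commands(raw_fasta, work_dir):
        if shutil.which(cmd[0]) is None:
            raise RuntimeError(f"{cmd[0]} was not found on PATH")
        subprocess.run(cmd, check=True)
```

Calling run_pipeline("ncrna.fasta", "out") requires both seqkit and mmseqs to be installed. build_commands can be used on its own to inspect the command lines first.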

I hope this will help.