nghiavtr / FuSeq

Can the k-mer value be changed based on read length (150nt)?

unique379r opened this issue

Hi
I have tested your tool (FuSeq) on 100nt sequences (the glioma dataset), as described in your paper and in the GitHub tutorial, and compared the results with SOAPfuse and FusionCatcher. FuSeq seems to work well in terms of precision, while recall was almost the same.
Taking this comparison further, I have in-house sequences with 150nt reads and a truth set of 14 validated gene fusions. Using your default parameters, FuSeq predicted only 2 of the 14 fusions, compared to 10 predicted by FusionCatcher.
Any thoughts on this? I was thinking of changing the k-mer value (-k) since my reads are 150nt long, but I am not able to: "Error: k must not be larger than 31, you chose 51".

Any insight would be appreciated.
Thanks
Rupesh.

Dear Rupesh,

Thank you for using FuSeq in your research.
I have not tested FuSeq on RNA-seq data with 150nt reads, so I am not sure of the correct answer. However, I think the k-mer length is probably not the issue; 31 should be fine. Other parameters may have an effect. If you can send me the FuSeq output for your data and the list of 14 validated fusions, I will investigate what is happening. If the data is large, you can upload it somewhere and email me the download link (nghiavtr@gmail.com).

Best,
Nghia

Thank you for your kind reply.
Please check your email; I sent you the files as a package via Aspera.

Dear Rupesh,

Thank you for your files!

So you are running FuSeq on RNA-seq data with 150nt reads, using the Homo_sapiens.GRCh38.94 annotation. Note that we have never carefully tested FuSeq on either hg38 or RNA-seq data with longer reads like these.

I investigated the FuSeq output and found that most of the missing fusions are very lowly expressed. If I change the minScore parameter from 3 (the default) to 1, I obtain 8 of the 14 true fusions:

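  # Lower the minimum score from the default (3) to 1 to keep lowly expressed fusions
  FuSeq.params$minScore=1

  # Assumes FuSeq.MR, FuSeq.MR.postPro and FuSeq.SR.postPro already exist from an earlier run
  # Final candidates from the mapped-read (MR) and split-read (SR) steps
  myFusionFinal.MR=FuSeq.MR.postPro$myFusionFinal
  myFusionFinal.SR=FuSeq.SR.postPro$myFusionFinal

  # Integrate the MR and SR candidates using the updated FuSeq.params
  fragmentInfo=FuSeq.MR$fragmentInfo
  FuSeq.integration=integrateFusion(myFusionFinal.MR, myFusionFinal.SR, FuSeq.params, fragmentInfo=fragmentInfo, paralog.fc.thres=2.0)
  myFusionFinal=FuSeq.integration$myFusionFinal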
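To illustrate why lowering minScore recovers lowly expressed fusions, here is a minimal sketch under stated assumptions: it supposes a candidate is kept when its score is at least minScore, and it uses a made-up toy table. The `candidates` data frame, its `fusion`/`score` columns and the `filter_by_score()` helper are hypothetical and do not reflect FuSeq's actual output format or internal filtering code.

```r
# Hypothetical toy candidates: names and scores are invented for illustration
# and are NOT FuSeq output (FuSeq's real column names may differ).
candidates <- data.frame(
  fusion = c("geneA-geneB", "geneC-geneD", "geneE-geneF", "geneG-geneH"),
  score  = c(5, 3, 2, 1)
)

# Assumed rule: keep a candidate when its score is at least min_score.
# filter_by_score() is a hypothetical helper, not part of the FuSeq API.
filter_by_score <- function(df, min_score) {
  df[df$score >= min_score, , drop = FALSE]
}

nrow(filter_by_score(candidates, 3))  # default minScore = 3: 2 candidates kept
nrow(filter_by_score(candidates, 1))  # minScore = 1: all 4 candidates kept
```

Lowering the threshold trades precision for recall: weakly supported candidates survive, so some of the extra calls may be false positives.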

Moreover, one fusion (TACC3-FGFR3) has a very short distance between its partner genes (48117), which is less than the default FuSeq.params$minGeneDist=1e5. If you want to detect this fusion, you could reduce the distance to 10000 (FuSeq.params$minGeneDist=1e4).
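The distance check can be sketched as simple arithmetic. This assumes a candidate is filtered out when the distance between its partner genes is below minGeneDist; `keep_by_distance()` is a hypothetical helper for illustration, not FuSeq's own code. Only the distance value 48117 comes from this thread.

```r
# Assumed rule: keep a candidate only if its partner genes are at least
# min_gene_dist apart. keep_by_distance() is hypothetical, not the FuSeq API.
keep_by_distance <- function(gene_dist, min_gene_dist) {
  gene_dist >= min_gene_dist
}

tacc3_fgfr3_dist <- 48117  # distance reported for TACC3-FGFR3 above

keep_by_distance(tacc3_fgfr3_dist, 1e5)  # default minGeneDist: FALSE (filtered out)
keep_by_distance(tacc3_fgfr3_dist, 1e4)  # reduced minGeneDist: TRUE (kept)
```

Since 48117 lies between 1e4 and 1e5, any threshold in that range would keep this fusion; a smaller threshold may also admit more read-through artifacts between neighbouring genes.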

I hope this helps.

Best,
Nghia