Thoughts of the STAR development team on spliced/unspliced/ambiguous read classification
mortunco opened this issue · comments
Dear @alexdobin
This is not an issue but rather a request for your thoughts on recent manuscripts about spliced/unspliced/ambiguous read classification. I am sure you are following this literature, but I haven't seen a discussion in the repository. If there has been one before, I apologize for the duplicate.
The chronology of my references might be wrong (some preprints got updated), but let's use them as sources for the general concepts.
- (Soneson et al 2021) https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008585
  This publication demonstrated that Kallisto/Alevin/STARsolo generate different count tables and therefore different velocity estimates. The problem is that short read lengths make ambiguous reads very hard to classify.
- (He et al, 2023) https://www.biorxiv.org/content/10.1101/2023.01.04.522742v1.full.pdf and (Hjörleifsson and Sullivan et al, 2024) www.biorxiv.org/content/10.1101/2022.12.02.518832v3.full.pdf
  These utilized flanking k-mers to rescue a read from ambiguity and assign it to spliced or unspliced.
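To make the three categories concrete, here is a toy sketch of the basic idea. It is only an illustration under simplifying assumptions (a single transcript, non-overlapping exons, half-open coordinates) and is not how STARsolo, velocyto, or the tools in the papers above implement it. The function name `classify_read` and the interval representation are hypothetical.

```python
from typing import List, Tuple

# Half-open interval [start, end) in gene coordinates (assumption of this sketch).
Interval = Tuple[int, int]


def classify_read(blocks: List[Interval], exons: List[Interval]) -> str:
    """Classify one aligned read against a single-transcript gene model.

    blocks: contiguous aligned segments of the read; more than one block
            is treated here as a gapped (junction-spanning) alignment.
    exons:  non-overlapping exon intervals of the transcript.
    """

    def inside_one_exon(block: Interval) -> bool:
        # True if the whole block lies within a single exon.
        return any(es <= block[0] and block[1] <= ee for es, ee in exons)

    # Any aligned base outside the exons means intronic sequence was read.
    if not all(inside_one_exon(b) for b in blocks):
        return "unspliced"
    # All blocks are exonic and the read is gapped: treated as a junction.
    # (Toy simplification: the gap is not checked against exon boundaries.)
    if len(blocks) > 1:
        return "spliced"
    # One block fully inside one exon fits both mature and nascent RNA.
    return "ambiguous"


exons = [(0, 100), (200, 300)]
print(classify_read([(10, 60)], exons))               # read within one exon
print(classify_read([(80, 100), (200, 230)], exons))  # read across a junction
print(classify_read([(90, 140)], exons))              # read running into the intron
```

With short reads, most reads fall entirely inside one exon and land in the last branch, which is why the ambiguous class is so large.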
The reason I am asking is that both publications include STAR in their comparisons, so I would like to hear your opinion on full transcriptome quantification. What are your thoughts on this issue? Are you planning an update to the algorithm to address it? The Velocyto option in STARsolo is my go-to for spliced/unspliced counts, but do you have suggestions for improving the accuracy of the quantification with STAR?
Thank you for maintaining STAR. It's very easy to use, which makes it extremely useful. Thank you very much for your time,
Best,
T.
Hello! I am a primary author on one of the listed preprints (and am an avid STARsolo user+fan too despite much of my work being done using pseudoalignment) -- thanks for reading my preprint! :) And thanks to Dr. Dobin for his great work on STAR of course. I am posting here to bookmark this discussion and to potentially engage in it further.