alexdobin / STAR

RNA-seq aligner

Counting intron reads in bulk RNA-seq

jcm6t opened this issue

We have an RNA-seq data set generated with the Illumina TruSeq Stranded Total RNA (rRNA depletion) kit and, not surprisingly, a large proportion of the reads (50-60%) map to introns when using a recent standard GENCODE annotation. Without getting into a debate about the merits of the assay, I would really like to use the intronic data somehow.

There are suggestions that counting intron reads might be useful, although it would measure a different facet of transcription:
https://academic.oup.com/nargab/article/2/3/lqaa073/5910008

I think a comparison of exon vs. exon + intron counts could be useful, and it would avoid simply throwing reads out.

We could go to bwa, kallisto, or salmon, but since we used STAR for the spliced bulk alignment, I feel that a STAR vs. STAR comparison might be better, although maybe the mapping is so different that it doesn't matter.

Is there a principled way to count intron + exon reads to give total transcription gene counts? We can create a simplified GTF with gene boundaries, but I am worried that the STAR (bulk) algorithm fundamentally assumes a splicing model tuned to ligate out large chunks of sequence, and that negating it will be difficult.

Suggestions? E.g. don't bother with STAR? Tuning tweaks?

Read counting also possibly becomes an issue, but I haven't thought about that or RSEM yet.
Thanks, Joe.

Hi Joe,

If you want to perform simple counting of all reads within gene boundaries, you can do it by supplying STAR with a modified GTF. You can add transcripts with gene start/end positions for each gene to the standard GTF. This will count intronic reads while also keeping the splice junctions that are used for mapping.
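For illustration, here is a minimal Python sketch of one way the GTF modification described above might be done. It is only an interpretation of the suggestion, not a tested recipe: the function name `add_gene_span_transcripts`, the `_genespan` transcript ID suffix, and the choice to emit a single exon record covering the whole gene are all assumptions made for this example. Gene boundaries are taken as the minimum start and maximum end over all records sharing a `gene_id`.

```python
import re

GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def add_gene_span_transcripts(gtf_lines, suffix="_genespan"):
    """Return the input GTF lines plus one synthetic gene-spanning
    transcript/exon pair per gene (hypothetical naming scheme)."""
    spans = {}  # gene_id -> [chrom, source, start, end, strand]
    out = []
    for line in gtf_lines:
        line = line.rstrip("\n")
        out.append(line)  # keep every original record unchanged
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 9:
            continue
        match = GENE_ID_RE.search(fields[8])
        if match is None:
            continue
        gene_id = match.group(1)
        start, end = int(fields[3]), int(fields[4])
        if gene_id not in spans:
            spans[gene_id] = [fields[0], fields[1], start, end, fields[6]]
        else:
            # widen the span to cover every record of this gene
            spans[gene_id][2] = min(spans[gene_id][2], start)
            spans[gene_id][3] = max(spans[gene_id][3], end)
    # append one transcript and one exon record spanning each gene
    for gene_id, (chrom, source, start, end, strand) in spans.items():
        attrs = f'gene_id "{gene_id}"; transcript_id "{gene_id}{suffix}";'
        for feature in ("transcript", "exon"):
            out.append("\t".join(
                [chrom, source, feature, str(start), str(end),
                 ".", strand, ".", attrs]
            ))
    return out
```

The modified annotation would then be supplied to STAR in place of the standard GTF (for example via `--sjdbGTFfile` at genome generation), with per-gene counts requested through `--quantMode GeneCounts`. Because the original transcripts are left untouched, the annotated splice junctions stay available for mapping.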

Alex,
Thanks for confirming this.

Thanks also for continuing to support STAR and for your willingness to share knowledge and ideas to improve transcriptome practices generally.

Closing this now.