High percentage of reads not mapped, considered "too short"

Question

thallinger opened this issue 4 months ago

Gerhard Thallinger commented 4 months ago

I am aligning human stranded RNA-seq data (PE 101) from ~30 samples to the chm13v2.0 assembly with the command below,
and 15-28% of the reads are reported as unmapped, "too short".

STAR --runThreadN 6 --runMode alignReads --genomeDir ${refdir} --outFileNamePrefix ${outputdir} \
     --readFilesIn ${readfiles} --readFilesCommand zcat --outSAMunmapped Within KeepPairs \
     --outFilterMatchNminOverLread 0.0 --outFilterMatchNmin 26 \
     --outSAMtype BAM SortedByCoordinate --quantMode GeneCounts

I applied different values for the --outFilterMatchNminOverLread and --outFilterMatchNmin parameters (0.66/0, 0.40/0, 0.20/0 and finally 0.0/26), but none of the combinations had any influence on the number of "too short" unmapped reads.
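
One way to compare such runs is to read the percentage directly from each run's Log.final.out. The following is a minimal sketch, assuming the file contains the usual `% of reads unmapped: too short |` line; the `runs/*/` directory layout and the `too_short_pct` helper name are hypothetical and not part of STAR.

```shell
# Print the "% of reads unmapped: too short" value from one STAR Log.final.out.
# Helper name is hypothetical; it only parses text, it does not call STAR.
too_short_pct() {
    awk -F'|' '/% of reads unmapped: too short/ {
        gsub(/[ \t%]/, "", $2)   # strip whitespace and the percent sign
        print $2
    }' "$1"
}

# Assumed layout: one output directory per sample or parameter combination.
for log in runs/*/Log.final.out; do
    [ -e "$log" ] || continue    # skip if the glob matched nothing
    printf '%s\t%s\n' "$log" "$(too_short_pct "$log")"
done
```

If the value is identical across parameter combinations, the thresholds that were changed are probably not the ones rejecting the reads.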

Am I using the right parameters with the wrong values, or are there other parameters that control the filtering of "too short" reads?

Execution environment:

more /etc/debian_version
10.1
STAR --version
2.7.11a

Alexander Dobin · Answer 1 · Sat Mar 09 2024 02:30:04 GMT+0800 (China Standard Time)

Hi @thallinger

You also need to set --outFilterScoreMinOverLread 0, since the outFilter parameters are combined with AND logic.
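
The AND logic can be illustrated with a small sketch. This is a simplified model, not STAR's actual code: it assumes an alignment must satisfy all four thresholds (--outFilterScoreMin, --outFilterScoreMinOverLread, --outFilterMatchNmin, --outFilterMatchNminOverLread), and that the OverLread values are multiplied by the read length, which for paired-end reads is the sum of both mates' lengths. The `passes_filters` function and the example score and match counts are hypothetical.

```shell
# Simplified model of STAR's score/match output filters (illustration only).
# Args: LREAD SCORE NMATCH SCORE_MIN SCORE_MIN_OVER_LREAD MATCH_NMIN MATCH_NMIN_OVER_LREAD
passes_filters() {
    awk -v lread="$1" -v score="$2" -v nmatch="$3" \
        -v smin="$4" -v sfrac="$5" -v mmin="$6" -v mfrac="$7" 'BEGIN {
        # Every condition must hold; relaxing only some of them is not enough.
        ok = (score  >= smin) && (score  >= sfrac * lread) &&
             (nmatch >= mmin) && (nmatch >= mfrac * lread)
        print (ok ? "pass" : "too short")
    }'
}

# Hypothetical 2x101 pair (Lread = 202) with 100 matched bases and score 98.
passes_filters 202 98 100 0 0.66 0  0.66   # defaults            -> too short
passes_filters 202 98 100 0 0.66 26 0.0    # only Match relaxed  -> too short
passes_filters 202 98 100 0 0.0  26 0.0    # Score relaxed too   -> pass
```

In the second call the match thresholds are satisfied, but the score still has to reach 0.66 × 202 ≈ 133, so the pair is rejected regardless of the MatchNmin settings.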

lxy04 · Answer 2 · Mon Jun 10 2024 22:29:44 GMT+0800 (China Standard Time)

Hi @thallinger

You also need to set --outFilterScoreMinOverLread 0, since the outFilter parameters are combined with AND logic.

I had thought this parameter was there to filter out low-quality alignments. If it is set to 0, won't that let through a lot of low-quality alignments? I ran STAR on two ChIP-seq datasets and got >30% "too short" reads, yet the data performed well in downstream use.