Reads spanning multiple genes in bam file


ireferraris opened this issue 4 months ago

Hi,

I have a question about 3' UTR RNA-seq samples, specifically those generated with the Alithea technology. In the coordinate-sorted BAMs that STAR produces for the pools, about 40% of the reads contain segments that map to different genes located far apart. The effect is smaller in human samples, but the proportion of such reads is very high in plant samples (bean and chickpea).
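One way to put a number on these reads is sketched below. It assumes that, because the command sets no --chim* options (so STAR's chimeric detection stays off), such reads appear in the main BAM as spliced alignments whose CIGAR contains a very long N (skipped region). The function names and the 100 kb threshold are hypothetical choices for illustration, not STAR settings.

```python
import re

# A CIGAR string is a series of <length><operation> pairs.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def longest_gap(cigar: str) -> int:
    """Return the length of the longest N (skipped region) in a CIGAR, or 0."""
    if cigar == "*":
        return 0
    return max((int(n) for n, op in CIGAR_RE.findall(cigar) if op == "N"), default=0)


def spanning_fraction(sam_lines, min_gap: int = 100_000) -> float:
    """Fraction of mapped primary alignments whose longest N gap is >= min_gap.

    min_gap is a hypothetical threshold; choose it from the intron-length
    distribution of the organism being analysed.
    """
    mapped = spanning = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        flag, cigar = int(fields[1]), fields[5]
        if flag & (0x4 | 0x100 | 0x800):
            continue  # skip unmapped, secondary and supplementary records
        mapped += 1
        if longest_gap(cigar) >= min_gap:
            spanning += 1
    return spanning / mapped if mapped else 0.0
```

The input can be the text output of `samtools view STARAligned.sortedByCoord.out.bam`, passed in as `sys.stdin`; comparing the fraction before and after adapter trimming shows whether trimming removes these reads.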

The command I used is:

STAR --runMode alignReads --outSAMmapqUnique 60 --runThreadN 16 \
    --outSAMunmapped Within --limitBAMsortRAM 400274367879 \
    --soloStrand Forward --quantMode GeneCounts --outBAMsortingThreadN 16 \
    --genomeDir ../new_ref_genome --soloType CB_UMI_Simple \
    --soloCBstart 1 --soloCBlen 14 --soloUMIstart 15 --soloUMIlen 14 \
    --soloUMIdedup NoDedup 1MM_All --soloCellFilter None \
    --soloCBwhitelist barcode.txt --soloBarcodeReadLength 0 --soloFeatures Gene \
    --outSAMattributes NH HI nM AS CR UR CB UB GX GN sS sQ sM \
    --outFilterMultimapNmax 1 --readFilesCommand zcat \
    --outSAMtype BAM SortedByCoordinate --outFileNamePrefix STAR \
    --readFilesIn R2_001.fastq.gz R1_001.fastq.gz
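With --soloType CB_UMI_Simple, STARsolo reads the barcode from the last file in --readFilesIn (here R1_001.fastq.gz, the assumed barcode read). The options above place a 14-nt cell barcode at positions 1 to 14 and a 14-nt UMI at positions 15 to 28, both 1-based, and --soloBarcodeReadLength 0 turns off the barcode-read length check. The sketch below only illustrates that slicing; the function name is hypothetical, and it does not reproduce STARsolo's whitelist matching, which by default tolerates one mismatch.

```python
# Barcode-read layout taken from the command above (1-based starts, as in STAR).
CB_START, CB_LEN = 1, 14
UMI_START, UMI_LEN = 15, 14


def split_barcode_read(seq: str) -> tuple[str, str]:
    """Return (cell barcode, UMI) from a barcode-read sequence."""
    cb = seq[CB_START - 1 : CB_START - 1 + CB_LEN]
    umi = seq[UMI_START - 1 : UMI_START - 1 + UMI_LEN]
    return cb, umi
```

Any bases after position 28 are ignored by this slicing, which is consistent with disabling the length check.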

However, I found that trimming adapters before mapping reduces the number of these "spanning" reads, even though this step is generally considered unnecessary and the manual advises skipping adapter trimming.
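For reference, the core idea of 3' adapter trimming can be sketched as follows: cut the read at the first exact occurrence of the adapter, or of an adapter prefix at least `min_overlap` bases long that runs off the read's 3' end. This is a simplified, exact-match illustration rather than the error-tolerant algorithm of a dedicated trimmer. The sequence AGATCGGAAGAGC (the common Illumina adapter prefix) is only an example; the adapter actually present in these libraries should be taken from the kit documentation.

```python
def trim_3prime_adapter(read: str, adapter: str, min_overlap: int = 5) -> str:
    """Trim the read at the first exact adapter match (full or 3'-partial).

    Scans left to right; at each position compares the rest of the read with
    the start of the adapter. Partial matches shorter than min_overlap at the
    very end of the read are left untrimmed.
    """
    n = len(read)
    for i in range(n - min_overlap + 1):
        overlap = min(len(adapter), n - i)
        if read[i : i + overlap] == adapter[:overlap]:
            return read[:i]
    return read


EXAMPLE_ADAPTER = "AGATCGGAAGAGC"  # example only; use the kit's real adapter
```

For instance, `trim_3prime_adapter("ACGTACGTAGATCGGAAGAGCTTT", EXAMPLE_ADAPTER)` keeps only the first eight bases, `ACGTACGT`.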

Thanks in advance,
Irene

Alexander Dobin · Answer 1 · Fri Mar 29 2024 05:10:53 GMT+0800 (China Standard Time)

Hi Irene,

Adapter trimming never hurts, and it helps if the reads have long adapter tails.
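To check whether the reads do carry long tails, one rough approach is to look at soft clipping in the untrimmed alignments. STAR's default --alignEndsType Local soft-clips read ends that do not align, so a tail that STAR did not align shows up as an S operation at the read's 3' end. The sketch below assumes that interpretation; the function names and the 20-base cutoff are hypothetical, and hard clips are ignored because STAR does not produce them.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def three_prime_soft_clip(flag: int, cigar: str) -> int:
    """Soft-clipped length at the read's 3' end, or 0 if none.

    SAM CIGARs follow the forward reference strand, so the read's 3' end is
    the last operation for forward reads and the first for reverse (0x10) reads.
    """
    ops = CIGAR_RE.findall(cigar)
    if not ops:
        return 0  # e.g. "*" for unmapped records
    length, op = ops[0] if flag & 0x10 else ops[-1]
    return int(length) if op == "S" else 0


def long_tail_fraction(records, min_clip: int = 20) -> float:
    """Fraction of (flag, cigar) records with a 3' soft clip >= min_clip."""
    records = list(records)
    if not records:
        return 0.0
    long_tail = sum(1 for flag, cigar in records if three_prime_soft_clip(flag, cigar) >= min_clip)
    return long_tail / len(records)
```

A high fraction suggests that trimming is worthwhile for these libraries.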