alexdobin / STAR

RNA-seq aligner

Reads spanning multiple genes in bam file

ireferraris opened this issue

Hi,

I have a question about 3' UTR RNA-seq samples, specifically those prepared with Alithea technology. In the sorted BAMs that STAR produces for the pools, about 40% of the reads have portions that map to different genes far apart. The effect is smaller in human samples, but the number of such reads is very high in plant samples (bean and chickpea).

The command I used is:

STAR --runMode alignReads --outSAMmapqUnique 60 --runThreadN 16 --outSAMunmapped Within --limitBAMsortRAM 400274367879 --soloStrand Forward --quantMode GeneCounts --outBAMsortingThreadN 16 --genomeDir ../new_ref_genome --soloType CB_UMI_Simple --soloCBstart 1 --soloCBlen 14 --soloUMIstart 15 --soloUMIlen 14 --soloUMIdedup NoDedup 1MM_All --soloCellFilter None --soloCBwhitelist barcode.txt --soloBarcodeReadLength 0 --soloFeatures Gene --outSAMattributes NH HI nM AS CR UR CB UB GX GN sS sQ sM --outFilterMultimapNmax 1 --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --outFileNamePrefix STAR --readFilesIn R2_001.fastq.gz R1_001.fastq.gz

However, I verified that performing adapter trimming before mapping reduces the number of such "spanning" reads, even though this step is generally considered unnecessary and the manual advises skipping it.

Thanks in advance,
Irene

Hi Irene,

Adapter trimming never hurts, and it helps if reads have long adapter tails.
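
To compare the "spanning" fraction before and after trimming, one option is to count mapped reads whose CIGAR string contains a very long skipped region (`N` operation). The following is only a minimal sketch, not part of STAR: the function names and the 100,000-base `min_gap` threshold are assumptions chosen for illustration, and a long gap is used here as a rough proxy for a read whose portions land in distant genes.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def max_gap(cigar):
    """Return the longest N (skipped region) length in a CIGAR string, or 0."""
    return max((int(n) for n, op in CIGAR_OP.findall(cigar) if op == "N"), default=0)


def fraction_spanning(sam_lines, min_gap=100_000):
    """Fraction of mapped reads whose longest N gap is at least min_gap bases.

    min_gap is a hypothetical threshold; pick one suited to the genome's
    typical intron lengths.
    """
    mapped = spanning = 0
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue
        cigar = fields[5]  # column 6 of a SAM record is the CIGAR string
        if cigar == "*":  # unmapped reads carry no CIGAR
            continue
        mapped += 1
        if max_gap(cigar) >= min_gap:
            spanning += 1
    return spanning / mapped if mapped else 0.0
```

To use it, pipe the output of `samtools view` on the sorted BAM into a script that calls `fraction_spanning(sys.stdin)`, once for the untrimmed run and once for the trimmed run, and compare the two fractions.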