alexdobin / STAR

RNA-seq aligner

Best settings for mapping fusion genes

ahwanpandey opened this issue · comments

Hello,

I have some paired-end fusion data from a Qiagen fusion plex panel. R1 is longer at 230bp and R2 is shorter at about 45bp. I have generated a genome index with --sjdbOverhang 229, and I am using the following settings at the moment.

STAR VERSION 2.7.11b

--runMode alignReads \
--runThreadN 16 \
--genomeDir /references/Homo_sapiens/RNAseq/GRCh38.111/STAR/sjdbOverhang229 \
--genomeLoad LoadAndKeep \
--readFilesIn R1.fastq.gz R2.fastq.gz \
--readFilesCommand zcat \
--limitBAMsortRAM 16000000000 \
--outFileNamePrefix Aligned. \
--outReadsUnmapped Fastq \
--outSAMtype BAM Unsorted \
--outSAMmode Full \
--outSAMattributes All \
--outSAMunmapped Within \
--outSAMprimaryFlag AllBestScore \
--outFilterScoreMinOverLread 0.1 \
--outFilterMatchNminOverLread 0.1 \
--alignIntronMax 200000 \
--alignMatesGapMax 200000 \
--alignSJoverhangMin 9 \
--alignSJDBoverhangMin 3 \
--alignSplicedMateMapLmin 12 \
--alignSplicedMateMapLminOverLmate 0 \
--chimSegmentMin 12 \
--chimJunctionOverhangMin 12 \
--chimOutType WithinBAM \
--chimSegmentReadGapMax 0 \
--twopassMode None

Checking IGV, I can see that there are still quite a few reads that don't get split and mapped to their fusion partner.

Gene1 - the blue highlighted region in the genome track matches with the blue curly bracketed set of some portion of the reads in Gene2
image

Gene2 - the yellow highlighted region in the genome track matches with the yellow curly bracketed set of some portion of the reads in Gene1
image

The distance between the two fusion partners is about 275kb. Are there any settings that would help these reads get properly mapped to the partner?

Thanks so much!
Ahwan

I just saw that a lot of those reads are actually also part of:

overlapping paired end reads.

image
image

Or where the mate is unmapped

image

Using the following settings I could increase the coverage at the fusion sites, but not by much. There are still lots of reads in the data that look like the screenshots above.

--peOverlapNbasesMin 5
--peOverlapMMp 0.01

@alexdobin Could you advise me on how I should optimise the following parameters?

--alignIntronMax 
--alignMatesGapMax 

The distance between two fusion genes that are well known in our cohort is about 275kb. I guess I'm trying to understand how these parameters actually work and what they mean. Sure, we know of a fusion event where the partners are 275kb apart, but what about fusions we don't know about, which could be closer together or further apart on the same chromosome? How do these parameters affect them?

Thanks so much.
Ahwan

Hi Ahwan,

These parameters control the maximum intron and maximum distance between mates for normal (non-chimeric) alignments. If you make them larger than the fusion length, the fusion alignment will be reported as normal, otherwise it will be reported as chimeric.
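To make that rule concrete, here is a minimal sketch of the decision as described above. It is not STAR code: `classify_fusion` is a hypothetical helper, and it models only the distance comparison, assuming both partners are on the same chromosome and strand and that all other filters (for example --chimSegmentMin) pass. How STAR treats a distance exactly equal to the limit is not stated here, so the boundary behaviour below is an assumption.

```shell
# Hypothetical helper, not part of STAR: applies only the distance rule
# described in the reply above.
#   $1 = genomic distance spanned by the fusion (bases)
#   $2 = value given to --alignIntronMax / --alignMatesGapMax
classify_fusion() {
    local distance=$1 align_max=$2
    if [ "$align_max" -gt "$distance" ]; then
        # Limit is larger than the fusion distance: treated as a long intron
        # or mate gap, so it is reported as a normal alignment.
        echo "normal"
    else
        # Fusion distance exceeds the limit: reported as chimeric.
        echo "chimeric"
    fi
}

classify_fusion 275000 200000   # limit from the command above: prints "chimeric"
classify_fusion 275000 300000   # a limit raised above 275kb: prints "normal"
```

Under this reading, with the limits at 200000 the known 275kb fusion should be reported through the chimeric output, while any unknown same-chromosome fusion spanning less than 200kb would appear as an ordinary spliced alignment.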