Role of aligner in determining HLA type

Question

alvinwt opened this issue 8 years ago

Hi,

I have used both RazerS3 and the Yara mapper as the aligner for OptiType, and I am wondering whether it would work with other aligners. Is there a specific setting or sweet spot for mismatches or clipping?

It seems that OptiType is sensitive to spurious alignments, i.e. badly aligned reads that lead to a false HLA type once OptiType counts the reads. On the other hand, it can benefit from aligners that include more reads and therefore more information for discerning the HLA types.

Would there be any issues if BWA-MEM were used to align the reads instead, and would the scripts have a problem since they are designed to work with RazerS3 and Yara?

I have seen some differences in the HLA types predicted with different aligners, and I think comparing them is a good way to show the robustness of OptiType.

andras86 · Answer 1 · Mon Mar 07 2016 19:50:24 GMT+0800 (China Standard Time)

Hi,

97% identity is what we're using; don't go below that. Neither Yara nor RazerS does clipping, so I can't comment on that. I wouldn't clip reads under 50bp though. OT works fine with 50bp reads.
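For anyone applying the 97% cutoff to alignments from another mapper, here is a minimal sketch of a post-filter. It assumes identity can be approximated from the SAM `NM` tag (edit distance) and the CIGAR string; RazerS3 and Yara may define identity somewhat differently, and the function names are made up for illustration.

```python
import re

# One CIGAR element: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def alignment_columns(cigar: str) -> int:
    """Count alignment columns (matches, mismatches, insertions, deletions).

    Soft/hard clips (S, H), skips (N) and padding (P) are not counted.
    """
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "M=XID")


def passes_identity(cigar: str, nm: int, min_identity_pct: int = 97) -> bool:
    """Return True if (columns - NM) / columns >= min_identity_pct / 100.

    Assumption: NM (edit distance) is a reasonable proxy for the number of
    erroneous columns. Integer arithmetic avoids float rounding at the cutoff.
    """
    columns = alignment_columns(cigar)
    if columns == 0:
        return False
    return (columns - nm) * 100 >= min_identity_pct * columns
```

With these definitions a 100bp read with 3 edits passes and one with 4 edits does not. Note that clipped bases are ignored here, so a heavily soft-clipped read can still pass; that may or may not be what you want.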

Spurious alignments indeed cause some headache under our model, but we're building OT2 to be more robust with respect to one-off hits.

I'm not sure if BWA-mem's bam output would be plain compatible, but you can always try. Just all-map your reads against the reference fasta in the data folder (each end separately), and feed the bam files to OptiType in place of the fastq files, and see what happens. Let me know too.
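A rough sketch of the "all-map each end separately" step with BWA-MEM, driven from Python. Assumptions: `bwa` and `samtools` are on the PATH, `-a` (report all found alignments for single-end reads) is an acceptable stand-in for all-mapping, and the reference path and sample file names are placeholders to be adjusted to your checkout. Whether OptiType accepts the resulting BAM files is exactly the untested part.

```python
import subprocess


def build_commands(reference, fastq, out_bam, threads=4):
    """Build the command lines for indexing, mapping and BAM conversion."""
    index_cmd = ["bwa", "index", str(reference)]
    # -a: output all found alignments for single-end reads
    map_cmd = ["bwa", "mem", "-a", "-t", str(threads), str(reference), str(fastq)]
    # Read SAM from stdin ("-") and write BAM to out_bam
    bam_cmd = ["samtools", "view", "-b", "-o", str(out_bam), "-"]
    return index_cmd, map_cmd, bam_cmd


def map_one_end(reference, fastq, out_bam, threads=4, build_index=True):
    """Map a single FASTQ file (one end of a pair) and write a BAM file."""
    index_cmd, map_cmd, bam_cmd = build_commands(reference, fastq, out_bam, threads)
    if build_index:
        subprocess.run(index_cmd, check=True)
    mapper = subprocess.Popen(map_cmd, stdout=subprocess.PIPE)
    subprocess.run(bam_cmd, stdin=mapper.stdout, check=True)
    mapper.stdout.close()
    if mapper.wait() != 0:
        raise RuntimeError("bwa mem exited with a non-zero status")


if __name__ == "__main__":
    # Placeholder paths: point these at the reference FASTA in OptiType's
    # data folder and at your own reads.
    ref = "data/hla_reference_dna.fasta"
    map_one_end(ref, "sample_1.fastq", "sample_1.bam")
    map_one_end(ref, "sample_2.fastq", "sample_2.bam", build_index=False)
```

Each end is mapped as single-end data on purpose, following the suggestion above; the two BAM files would then be passed to OptiType where the two FASTQ files normally go.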

Alvin Ng · Answer 2 · Thu Mar 10 2016 10:28:33 GMT+0800 (China Standard Time)

Hi @andras86 ,

Thanks for the reply. I tested 53 random samples from the 1000 Genomes dataset and found that for good-quality reads the results from Yara and RazerS are similar, with BWA having the benefit of being able to align more samples. I haven't figured out why certain FASTQ files just won't align with Yara, but I saw a bigger difference in my own data, which is of worse quality.

I ran some older genomes and found that the coverage was good, but some samples simply wouldn't align. More samples could be typed with BWA because their reads could actually be aligned. The results from the two aligners are mostly concordant, but BWA aligns more reads. The only thing that seemed to cause problems was using discordant reads: huge blue peaks appeared, giving false positives.

Overall I find that BWA works better, especially with noisy datasets whose reads might not align so well, or when there are a lot of unpaired reads. OptiType still looks very robust after all the tests. I ran the files as you suggested and the results look very good.

On a side note, how does OptiType choose when intronic reads (inferred from another HLA type) make up the majority of the reads supporting an HLA type, e.g. HLA-A_1101 intronic reads for another HLA type, HLA-A_11:04, with the number of remaining reads being similar? It is a fine-grained point, but I tried looking at reads mapped to regions that are unique between the two HLA types, and that might be useful for you too.

Thanks for the help!

Sergey Mitrofanov · Answer 3 · Tue Jul 02 2019 16:11:19 GMT+0800 (China Standard Time)

Hello!

I'd like to use bwa mem instead of RazerS, because the latter fails to align genome reads. Could you please tell me which options to use?
And is it really possible to pass BAM files to OptiType without converting them with bam2fastq?