Optitype crashed when using BAM from other aligner

Question

MartinPersida opened this issue 6 years ago

Hello,
I tried to run Optitype with a BAM produced with BWA-mem.
I ran into the following error:
  File "/usr/local/bin/OptiType/OptiTypePipeline.py", line 358, in
    pos, read_details = ht.pysam_to_hdf(bam_paths[0])
  File "/usr/local/bin/OptiType/hlatyper.py", line 219, in pysam_to_hdf
    read_details[aln.qname] = (aln.get_tag('NM'), aln.query_length) # aln.reference_length it used to be. Soft-trimming is out of question now.
  File "pysam/libcalignedsegment.pyx", line 2392, in pysam.libcalignedsegment.AlignedSegment.get_tag
  File "pysam/libcalignedsegment.pyx", line 2434, in pysam.libcalignedsegment.AlignedSegment.get_tag
KeyError: "tag 'NM' not present"

I found that the reads lacking the NM tag are most likely unmapped reads (their count exactly matches the number of unmapped reads reported by samtools stats).
Are there any constraints on how a BAM should be filtered before running it through OptiType?
It would be good to have error handling for this case, or a usage note for BAMs produced by other aligners.
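
The kind of guard asked for here could look roughly like the sketch below. This is not OptiType's actual code: `collect_read_details` and `read_details_from_bam` are hypothetical names, and the sketch assumes that silently skipping unmapped or NM-less records is acceptable for typing. It relies only on existing pysam `AlignedSegment` members (`is_unmapped`, `has_tag`, `get_tag`, `query_name`, `query_length`).

```python
def collect_read_details(alignments):
    """Return ({read_name: (NM, query_length)}, skipped_count).

    Hypothetical helper, not part of OptiType. `alignments` is any iterable
    of pysam.AlignedSegment-like objects.
    """
    read_details = {}
    skipped = 0
    for aln in alignments:
        # Unmapped reads (FLAG bit 0x4) typically carry no NM tag, so skip
        # them instead of letting get_tag('NM') raise KeyError.
        if aln.is_unmapped or not aln.has_tag('NM'):
            skipped += 1
            continue
        read_details[aln.query_name] = (aln.get_tag('NM'), aln.query_length)
    return read_details, skipped


def read_details_from_bam(bam_path):
    """Open a BAM with pysam and collect details for the usable reads."""
    import pysam  # imported lazily so the helper above has no hard dependency

    with pysam.AlignmentFile(bam_path, 'rb') as bam:
        # until_eof=True iterates over every record, including unmapped
        # ones, and does not require an index.
        return collect_read_details(bam.fetch(until_eof=True))
```

Reporting the skipped count rather than dropping reads silently would make it easier to spot a BAM in which most reads are unmapped.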

Regards

Martin

andras86 · Answer 1 · Thu Oct 04 2018 20:08:50 GMT+0800 (China Standard Time)

Hi Martin,
Yes, that must be it. Pipe your bam file through samtools view -F 4 to remove unmapped reads. Let me know how it went. Oh, and can you post your bwa-mem command line call?
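
For scripted runs, the filtering step could be wrapped roughly as follows. This is only a sketch: the function names and file paths are made up for illustration, and it assumes `samtools` is available on the PATH.

```python
import subprocess


def build_filter_command(in_bam, out_bam):
    """Build a samtools command that drops unmapped reads (FLAG bit 0x4)."""
    # -b writes BAM output, -F 4 excludes records with the unmapped flag set.
    return ['samtools', 'view', '-b', '-F', '4', '-o', out_bam, in_bam]


def filter_unmapped(in_bam, out_bam):
    """Run the filter; raises CalledProcessError if samtools fails."""
    subprocess.run(build_filter_command(in_bam, out_bam), check=True)
```

Calling `filter_unmapped('input.bam', 'mapped_only.bam')` (placeholder file names) would then produce a BAM containing only mapped reads, which can be passed to OptiType.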

rdbremel · Answer 2 · Fri Nov 08 2019 06:15:11 GMT+0800 (China Standard Time)

I have the same problem with TCGA BAM files from the Genomic Data Commons repository.
Here are their BWA command lines:

Step 2: BWA Alignment - bwa 0.7.15 - samtools 1.3.1
If mean read length is greater than or equal to 70bp:
bwa mem -t 8 -T 0 -R <read_group> <fastq_1.fq.gz> <fastq_2.fq.gz> |
samtools view -Shb -o <output.bam> -

If mean read length is less than 70bp:

bwa aln -t 8 <fastq_1.fq.gz> > <sai_1.sai> &&
bwa aln -t 8 <fastq_2.fq.gz> > <sai_2.sai> &&
bwa sampe -r <read_group> <sai_1.sai> <sai_2.sai> <fastq_1.fq.gz> <fastq_2.fq.gz> | samtools view -Shb -o <output.bam> -
If the quality scores are encoded as Illumina 1.3 or 1.5, use BWA aln with the “-l” flag