OptiType crashes when using a BAM from another aligner
MartinPersida opened this issue · comments
Hello,
I tried to run OptiType with a BAM produced by BWA-MEM.
I ran into the following error:
File "/usr/local/bin/OptiType/OptiTypePipeline.py", line 358, in
pos, read_details = ht.pysam_to_hdf(bam_paths[0])
File "/usr/local/bin/OptiType/hlatyper.py", line 219, in pysam_to_hdf
read_details[aln.qname] = (aln.get_tag('NM'), aln.query_length) # aln.reference_length it used to be. Soft-trimming is out of question now.
File "pysam/libcalignedsegment.pyx", line 2392, in pysam.libcalignedsegment.AlignedSegment.get_tag
File "pysam/libcalignedsegment.pyx", line 2434, in pysam.libcalignedsegment.AlignedSegment.get_tag
KeyError: "tag 'NM' not present"
I found that the reads lacking the NM tag are most likely the unmapped reads (their count matches the unmapped count reported by samtools stats exactly).
Are there any constraints on how the BAM should be filtered before running it through OptiType?
It would be good to have error handling for this case, or a usage description for BAMs from other aligners.
Regards
Martin
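The failing line in the traceback calls aln.get_tag('NM') on every read. One way this could be guarded is sketched below. This is not OptiType's actual code: the function name collect_read_details and the decision to skip (rather than abort on) reads without NM are assumptions made for illustration. It relies only on the pysam AlignedSegment attributes is_unmapped, has_tag, get_tag, query_name and query_length.

```python
def collect_read_details(alignments):
    """Map read name -> (NM, query length), skipping reads that cannot supply NM.

    `alignments` is any iterable of objects exposing the pysam AlignedSegment
    attributes used below. Returns the details dict and the skipped read names.
    """
    read_details = {}
    skipped = []
    for aln in alignments:
        # Unmapped reads have no alignment, so NM is usually absent for them.
        if aln.is_unmapped or not aln.has_tag("NM"):
            skipped.append(aln.query_name)
            continue
        read_details[aln.query_name] = (aln.get_tag("NM"), aln.query_length)
    return read_details, skipped
```

Returning the skipped names lets the caller report how many reads were dropped instead of failing with a bare KeyError.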
Hi Martin,
Yes, that must be it. Pipe your BAM file through samtools view -F 4 to remove unmapped reads. Let me know how it goes. Oh, and can you post your bwa mem command line?
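For readers unfamiliar with the flag: -F 4 tells samtools view to exclude every record whose FLAG field has bit 0x4 ("segment unmapped") set. The sketch below reproduces that rule on plain SAM text lines so the logic is visible. It is an illustration only; the function name drop_unmapped is made up here, and in practice the samtools command is the tool to use.

```python
UNMAPPED = 0x4  # SAM FLAG bit meaning "segment unmapped"


def drop_unmapped(sam_lines):
    """Yield header lines and mapped records, mirroring `samtools view -h -F 4`."""
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines carry no FLAG field; pass them through unchanged.
            yield line
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second tab-separated field
        if not flag & UNMAPPED:
            yield line
```

Note that a mapped read whose mate is unmapped (bit 0x8) is kept, because only bit 0x4 is tested.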
I have the same problem with TCGA BAM files from the Genomic Data Commons repository.
Here are their BWA command lines:
Step 2: BWA Alignment - bwa 0.7.15 - samtools 1.3.1
If mean read length is greater than or equal to 70bp:
bwa mem
-t 8
-T 0
-R <read_group>

<fastq_1.fq.gz>
<fastq_2.fq.gz> |
samtools view
-Shb
-o <output.bam> -
If mean read length is less than 70bp:
bwa aln -t 8 <fastq_1.fq.gz> > <sai_1.sai> &&
bwa aln -t 8 <fastq_2.fq.gz> > <sai_2.sai> &&
bwa sampe -r <read_group> <sai_1.sai> <sai_2.sai> <fastq_1.fq.gz> <fastq_2.fq.gz> | samtools view -Shb -o <output.bam> -
If the quality scores are encoded as Illumina 1.3 or 1.5, use BWA aln with the “-l” flag
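The read-length rule quoted above can be written as a small helper. This is only a sketch of the branching: the function name pick_bwa_workflow is hypothetical, and it returns a label for the workflow rather than building the commands themselves.

```python
def pick_bwa_workflow(read_lengths, threshold=70):
    """Return which BWA workflow the quoted rule selects for these read lengths."""
    if not read_lengths:
        raise ValueError("need at least one read length")
    mean_len = sum(read_lengths) / len(read_lengths)
    # Mean length >= 70 bp uses bwa mem; shorter reads use bwa aln plus bwa sampe.
    return "bwa mem" if mean_len >= threshold else "bwa aln + bwa sampe"
```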