alexdobin / STAR

RNA-seq aligner

CIGAR and query sequence are of different length (samtools after STAR)

vitoriastavis opened this issue · comments

Hello, apologies in advance for the long issue. I'm using STAR and get the following error when converting its SAM output to BAM with samtools:

$ samtools view -b Aligned.out.sam > out.bam

[E::sam_parse1] CIGAR and query sequence are of different length
[W::sam_read1_sam] Parse error at line 711
samtools view: error reading file "Aligned.out.sam"

These are the commands I used to generate the genome index and to run the mapping, respectively:

nohup STAR --runMode genomeGenerate --genomeDir . --genomeFastaFiles ../GCF_000001405.40_GRCh38.p14_genomic.fna --sjdbGTFfile ../genomic.gtf --runThreadN 12 > Log.out 2>&1 &

nohup STAR --genomeDir ../star_genome/ --readFilesIn ../hsa_rna.fna --sjdbGTFfile ../genomic.gtf --runThreadN 12 > Log.out 2>&1 &

This is line 711 out of 744 lines of the .sam file:

NM_001286270.2 0 NC_000016.10 4495858 3 170M118N121M6477N216M2524N127M1284N15M * 0 0 ATTGTCCACTAAGGTCTGGCAGGTCTGATTGCCTCTTTTCAGGCACTGAGTGGTGGGGTATGCCATCCTCCCCTGCTGGAACCAGCCTTGGCCTGCCCTGTTAGTCATCAAAAATAGATCTCACCAGGGAACAATCTTCTCAGGTTGTTGTGTAATTTGAGTGAGCCAAGATGGAGTCTCGCTCTGTTGCCCAGGCTGGAGTGCAGTGGATCAGTCTAGCTCATTGCAGCCTCCACCTCCTGGGTTCAAGAGATTCTCCTGCCTCAGCCTCCTGTGTAGCTGGGATTACAGAGTCTTACTTTGTCGGCCAGGCTGGAGTGCAGTGGCATGATCTCGACTCACTGCAACCTCTGTCTCCCAGGCTCAAGAAATCCTCCTACATCAGCCTCCCAAGTAGCTGGGATTACAGGCTGGAGTGCAGTGGCTCCATCTCGGCTCACTGCAACCTCCGCCTCCCAGGTTCAAGCGATTCTCCTGCCTCAGCCTCCTGAGTAGCTGGGATTACAGGACCAGAGGAGCGAGAGCAGCAAGAACCACACCCAGCAGCAATGTCAGCGGAAGTGGAAACCTCAGAGGGGGTAGACGAGTCAGAAAAAAAGAACTCTGGGGCCCTAGAAAAGGAGAACCAAATGAGAATGGCTGACCTCTCG * NH:i:2 HI:i:1 AS:i:654 nM:i:0

If I delete this line, the error just points to the next line.
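Since deleting one line only moves the error to the next, it may help to list every affected record at once. samtools raises this error when the number of read bases implied by the CIGAR differs from the length of the SEQ field. Below is a minimal Python sketch of that check, assuming a tab-delimited SAM file; the function names `cigar_query_length` and `find_length_mismatches` are hypothetical helpers, not part of STAR or samtools.

```python
import re

# CIGAR operations that consume read (query) bases, per the SAM specification.
QUERY_CONSUMING = set("MIS=X")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_query_length(cigar):
    """Return the number of read bases the CIGAR string accounts for."""
    return sum(int(count) for count, op in CIGAR_OP.findall(cigar)
               if op in QUERY_CONSUMING)


def find_length_mismatches(sam_lines):
    """Yield (line_number, read_name, cigar_length, seq_length) for each
    alignment record whose CIGAR and SEQ lengths disagree."""
    for line_number, line in enumerate(sam_lines, start=1):
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        name, cigar, seq = fields[0], fields[5], fields[9]
        if cigar == "*" or seq == "*":
            continue  # nothing to compare
        expected = cigar_query_length(cigar)
        if expected != len(seq):
            yield line_number, name, expected, len(seq)
```

Passing an open handle on `Aligned.out.sam` to `find_length_mismatches` and printing each tuple would show how many records are affected and by how much. For reference, the M operations in the CIGAR of line 711 add up to 649 bases, and N (skipped reference) operations do not count toward the read length.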

Then I've seen this issue

I tried reducing the number of threads to 8 and running the STAR binary from Linux_x86_64_static. The only difference is that the error now points to line 710.

In this case, this is line 710 out of 745 lines of the .sam file:

NR_026717.1 0 NC_000006.12 31971175 0 100M594N549M * 00 GTCTGACACAAGCATTAGTGAGATGCTCCCCTCGAAGAATAGTCTTGTTTCTTCTAAGGACTGATTCTCACCCCGGCTTTGGCTCTCCTAATTTTAGAGGGTCCTCCAAATGCAGTGAGGTTAGGAAGGACGTCTGCGCTCAGATCAAGAATCCAGTTACCTCAAAGCTCCCCAACTTCCACCTCCGCAGAGCTATGACGTCATGGCAGGCACGCCAGAGGCCGAAGGATGCAAAAGTGGTTTTCTGCTTTCGATGATGCAATCATTCAGCGACAGTGGCGGGCAAACCCCTCCCGGGGCGGGGGAGGTGTGAGCTTCACGAAGGAGGTTGACACCAACGTGGCCACCGGCGCCCCTCCACGCCGCCAACGAGTCCCCGGGCGTGCGTGCCCTTGGAGGGAGCCAATCCGCGGCCGGCGTGGGGCCCGGCCTGGCGGAGGTGATGCTGGTATGTGCGTCGCCACCGCCCCTCCCAGCACTGACGGGCCTGAGGGACGACAAGTTGACGCTCCTTTCGTCATCACCTGGTCTAGGAGGGACGCCCGGGGAGACCGTACGTCACTGCTCTGCGCCGGAAGACCCTATTTTCAGGTTCTCTTCCCTCCATTCCTACCCCTTCCCCGGTACCATAAAATCCCGGGATATGAGCT * NH:i:6 HI:i:1 AS:i:648 nM:i:0

When running STAR from Linux_x86_64, I got this error: version `GLIBC_2.29' not found

You mentioned 'Cut a few thousands reads around the problematic read and run mapping.' I'm not sure how to do that.
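One possible reading of that suggestion is to extract the problematic read together with its neighbours from the input FASTA and map only that subset. The following is a rough Python sketch under that assumption; `read_fasta_records` and `records_around` are hypothetical helper names, and the sketch loads all records into memory, which may or may not be acceptable for a large input file.

```python
def read_fasta_records(lines):
    """Group FASTA lines into (header, [sequence lines]) records."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, seq
            header, seq = line, []
        elif header is not None and line:
            seq.append(line)
    if header is not None:
        yield header, seq


def records_around(lines, read_name, flank=1000):
    """Return the record named read_name plus up to `flank` records on
    each side, or an empty list if the name is not found."""
    records = list(read_fasta_records(lines))
    for index, (header, _) in enumerate(records):
        # The read name is the first whitespace-delimited token after '>'.
        if header[1:].split()[:1] == [read_name]:
            return records[max(0, index - flank):index + flank + 1]
    return []
```

Writing each returned header followed by its sequence lines to a new file would give a small FASTA that could be passed to `--readFilesIn` in place of the full input, for example with `read_name="NM_001286270.2"`, the read reported on line 711.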

I'm mapping the Homo sapiens transcripts to the RefSeq genome and annotations. All of them can be found here

I'm using:

Ubuntu 18.04.3 LTS
STAR 2.7.11b
samtools 1.20

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
CPU(s): 12
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 1
NUMA node(s): 1
CPU family: 6
Model name: Intel(R) Core(TM) i7-5930K CPU @ 3.50GHz

39G RAM, 8G swap

I appreciate any insight!