brentp / somalier

fast sample-swap and relatedness checks on BAMs/CRAMs/VCFs/GVCFs... "like damn that is one smart wine guy"

Read group (RG:Z:) position in tags seems to matter and will not run without it being at the end

JustinChu opened this issue

When I run somalier on this dataset, I get a read-group error:

somalier extract -d extracted/ --sites sites.hg38.vcf.gz -f hs38.fa HG00733_hic.bam
somalier version: 0.2.15
somalier.nim(28)         get_sample_name
Error: unhandled exception: [somalier] no read-group in bam file [ValueError]
samtools view HG00733_hic.bam | head -1
NB551675:7:HHLMHBGX9:1:22205:10514:9656	16	chr1	9998	39	67S84M	*	0	0	CCCCCCCCCCCCCCCCCCCCCCCCCCCCACCCCCCACCCCCCACCCTAACCCTATCTCTAATCTTTACGATAACCCTAACCCTAACCCTAACACTAACC
CTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAA	EEEE6/AEEE///E<</////6///A/////</////////////////////E////A///////////E/////////////A////A////E/////A////E//A//EE/A/EEEE/E/EAEEEAEEEEEEEEEEEEEEEEEA
AAAA	NM:i:1	MD:Z:25C58	AS:i:79	XS:i:64	RG:Z:HG00733_hic	SA:Z:chr2,32916254,+,109S42M,0,2;	XA:Z:chr20,+64287312,50M2D32M69S,4;

However, it seems to work when I move the read-group tag to the end:

somalier extract -d extracted/ --sites  sites.hg38.vcf.gz -f hs38.fa HG00733_hic_moveRG.bam
samtools view HG00733_hic_moveRG.bam | head -1
NB551675:7:HHLMHBGX9:1:22205:10514:9656	16	chr1	9998	39	67S84M	*	0	0	CCCCCCCCCCCCCCCCCCCCCCCCCCCCACCCCCCACCCCCCACCCTAACCCTATCTCTAATCTTTACGATAACCCTAACCCTAACCCTAACACTAACC
CTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAA	EEEE6/AEEE///E<</////6///A/////</////////////////////E////A///////////E/////////////A////A////E/////A////E//A//EE/A/EEEE/E/EAEEEAEEEEEEEEEEEEEEEEEA
AAAA	NM:i:1	MD:Z:25C58	AS:i:79	XS:i:64	SA:Z:chr2,32916254,+,109S42M,0,2;	XA:Z:chr20,+64287312,50M2D32M69S,4;	RG:Z:HG00733_hic

As per the SAM/BAM specification, tag position should not matter. It is quite tedious to alter and reindex all of the alignments, so if you have any suggestions for an easy fix, that would be great.

Can you show the SAM header (with grep RG) of both files?

samtools view -H HG00733_hic.bam | grep @RG | grep -v @PG
@RG	ID:HG00733_hic
samtools view -H HG00733_hic_RG.bam | grep @RG | grep -v @PG
@RG	ID:HG00733_hic	SM:HG00733_hic

I'll try swapping the headers to see if the additional SM entry makes a difference.

Yes, the SM tag is required to get the sample name.
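
For anyone hitting the same error, one possible workaround is to patch only the header rather than rewrite every alignment. The following is a minimal sketch, assuming the read-group ID is an acceptable sample name (as in the working file above, where SM equals the ID). The `add_missing_sm` helper name is hypothetical and is not part of somalier or samtools.

```shell
# Hypothetical helper: add SM:<ID> to every @RG header line that has no SM field.
# Reads a SAM header on stdin and writes the patched header to stdout.
# Assumption: the read-group ID is a suitable sample name.
add_missing_sm() {
  awk 'BEGIN { FS = OFS = "\t" }
       $1 == "@RG" {
         id = ""; has_sm = 0
         for (i = 2; i <= NF; i++) {
           if ($i ~ /^ID:/) id = substr($i, 4)   # remember the read-group ID
           if ($i ~ /^SM:/) has_sm = 1           # an SM field already exists
         }
         if (!has_sm && id != "") $0 = $0 OFS "SM:" id
       }
       { print }'
}
```

Assuming a samtools build in which `samtools reheader` accepts `-` to read the new header from stdin, the patched header could then be applied with something like `samtools view -H HG00733_hic.bam | add_missing_sm | samtools reheader - HG00733_hic.bam > HG00733_hic.sm.bam` (the output name is only an example). The new file would likely need to be re-indexed afterwards.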