brentp / somalier

fast sample-swap and relatedness checks on BAMs/CRAMs/VCFs/GVCFs... "like damn that is one smart wine guy"

bam files without `@RG` tag

mschubert opened this issue

Thanks a lot for the tool. It's really easy to use, and on the first try it already found a sample swap in our setup!

I'm now trying to confirm sample matches between WES and RNA-seq data from an external sequencing provider. I ran into an issue with the RNA-seq BAM files they provided, where I get the following output from somalier:

Error: unhandled exception: [somalier] no read-group in bam file [ValueError]

The @RG field is indeed missing from the headers of these BAM files.

Is there a way to manually supply the sample ID (e.g. via a command-line argument)?

Hi Michael, always nice to hear that a tool is useful!

I added a way to do this via an environment variable. Could you give this binary a try (gunzip it, then chmod +x) and run it as:

SOMALIER_SAMPLE_NAME=my_sample somalier_dev extract ...

where my_sample is the name you want to use?
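For several BAM files that all lack `@RG`, the same environment variable can be set per file. The following is only a minimal sketch, assuming the sample name can be taken from each file name. The helper names `sample_name_from_bam` and `extract_all`, the default paths, and the output directory are hypothetical; `SOMALIER_SAMPLE_NAME` and `somalier_dev` are the ones mentioned above.

```shell
#!/usr/bin/env bash
# Sketch: run `somalier extract` on BAMs without an @RG header line,
# supplying the sample name through SOMALIER_SAMPLE_NAME.

# Hypothetical helper: derive a sample name from the file name,
# e.g. /data/rna/S1.bam -> S1
sample_name_from_bam() {
    basename "$1" .bam
}

# Assumed defaults; override them in the environment as needed.
SITES="${SITES:-sites.vcf.gz}"          # sites VCF passed to somalier
FASTA="${FASTA:-reference.fa}"          # reference FASTA
OUTDIR="${OUTDIR:-extracted}"           # where the extracted files go
SOMALIER="${SOMALIER:-somalier_dev}"    # binary (or function) to call

# Hypothetical helper: extract every BAM given as an argument.
extract_all() {
    local bam
    for bam in "$@"; do
        # The variable is set only for this one invocation.
        SOMALIER_SAMPLE_NAME="$(sample_name_from_bam "$bam")" \
            "$SOMALIER" extract -d "$OUTDIR" --sites "$SITES" -f "$FASTA" "$bam" \
            || return 1
    done
}
```

After sourcing the script, a call such as `extract_all rna/*.bam` would process every file in turn. Setting the variable as a prefix to the command keeps it from leaking into later invocations.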

somalier_dev.gz

I'll get a release out soon.

Wow, that was quick, thank you! 🎉

I can confirm that the binary works as expected and solves my issue.

Just a note that I ran into the same issue and this worked wonderfully. I did see there is a --sample-prefix option; is that also meant for setting the sample name?

Glad to hear it works.
--sample-prefix is for when you have multiple samples with the same ID, for example when the same sample has both RNA-Seq and DNA-Seq data. In that case the user can specify a sample prefix so that they (and the hashtable in somalier) can tell the two apart.
A release with this change is on my TODO list.