brentp / somalier

fast sample-swap and relatedness checks on BAMs/CRAMs/VCFs/GVCFs... "like damn that is one smart wine guy"

Comparing Results with Multiple Samples with Same Name

brcopeland opened this issue

I have a WGS pipeline, and when a sample has multiple read groups (one BAM per read group), I want to confirm they all correspond to the same individual before merging them. I tried following the instructions here and found that somalier kept overwriting the same output file, which I realized was because every BAM carries the same SM tag in the @RG line of its header. I worked around this by writing the somalier extract output for each BAM to a separate directory and renaming the resulting files. When I run somalier relate, however, every comparison in, for example, somalier.pairs.tsv references the same sample name again. If there is a relatedness problem, this makes it difficult to tell which BAM(s) were the problem.

I could of course reheader the BAMs to give them distinct SMs, but I would prefer not to do that just for this step. Do you have any suggestions for how to accomplish this?

Hi, you can use the --sample-prefix argument to somalier extract for this. Just give each file a unique --sample-prefix and then you'll be able to distinguish them in the output.
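A minimal Python sketch of that approach, assuming somalier is on PATH and that `somalier extract` accepts `-d` (output directory), `--sites` and `-f` (reference FASTA) as in the somalier README, plus the `--sample-prefix` option mentioned above. The helper names (`build_extract_commands`, `build_relate_command`, `run_all`) and the prefix scheme (BAM file stem plus `_`) are hypothetical choices for illustration, not part of somalier.

```python
import subprocess
from pathlib import Path


def build_extract_commands(bams, sites, fasta, outdir="extracted"):
    """Return one `somalier extract` argument list per BAM.

    Each BAM gets a unique --sample-prefix (hypothetical scheme: the BAM's
    file stem plus "_"), so read-group BAMs that share the same SM tag no
    longer overwrite each other and stay distinguishable in the relate output.
    """
    prefixes = [Path(bam).stem + "_" for bam in bams]
    # Two BAMs with the same file name in different directories would collide.
    if len(set(prefixes)) != len(prefixes):
        raise ValueError("BAM file stems must be unique to serve as prefixes")
    return [
        ["somalier", "extract",
         "-d", outdir,
         "--sites", sites,
         "-f", fasta,
         "--sample-prefix", prefix,
         bam]
        for bam, prefix in zip(bams, prefixes)
    ]


def build_relate_command(outdir="extracted"):
    """Return a `somalier relate` argument list covering every extracted file."""
    files = sorted(str(p) for p in Path(outdir).glob("*.somalier"))
    return ["somalier", "relate", *files]


def run_all(bams, sites, fasta, outdir="extracted"):
    """Extract every BAM, then relate them all (requires somalier on PATH)."""
    Path(outdir).mkdir(parents=True, exist_ok=True)
    for cmd in build_extract_commands(bams, sites, fasta, outdir):
        subprocess.run(cmd, check=True)
    subprocess.run(build_relate_command(outdir), check=True)
```

Because all files land in one directory with distinct prefixes, no manual renaming is needed, and each pair in somalier.pairs.tsv should then carry the per-file prefix, so an unexpectedly low relatedness value points at a specific read-group BAM.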