brentp / somalier

fast sample-swap and relatedness checks on BAMs/CRAMs/VCFs/GVCFs... "like damn that is one smart wine guy"


show warning when .somalier files have no sites with depth > 0

taytayp opened this issue

I am trying to use somalier to confirm matches between tumor and normal samples from the same patient. somalier extract works fine for both .bam files using sites.hg19.vcf.

The trouble is with relate, which I can't seem to figure out the parameters for. I have tried:

  • somalier relate cohort/*somalier
  • somalier relate -p pedigree.txt cohort/*somalier
  • somalier relate -p pedigree.txt -g group.txt cohort/*somalier

I used some simple group and pedigree files (shown below), but the output .tsv files only contain empty rows.

This feels like it should be a simple use-case, but it is fairly befuddling. Any pointers?


cat group.txt
normal0,tumor0

cat pedigree.txt
fam	normal0	0	0	0	0
fam	tumor0	 0	0	0	0

Can you show the stdout and stderr when you run:

somalier relate -o test -g group.txt cohort/*somalier
head test.samples.tsv

Sure thing.

/somalier # somalier relate -o test -g group.txt cohort/*somalier
somalier version: 0.2.6
[somalier] time to read files and get per-sample stats for 2 samples: 0.00
[somalier] time to get expected relatedness from pedigree graph: 0.00
[somalier] time to calculate all vs all relatedness for all 1 combinations: 0.00
[somalier] wrote interactive HTML output for 1 pairs to: test.html
[somalier] wrote groups to: test.groups.tsv
[somalier] wrote samples to: test.samples.tsv
[somalier] wrote pair-wise relatedness metrics to: test.pairs.tsv
/somalier # head test.samples.tsv 
#sample	pedigree_sex	gt_depth_mean	gt_depth_sd	depth_mean	depth_sd	ab_mean	ab_std	n_hom_ref	n_het	n_hom_alt	n_unknown	p_middling_ab	X_depth_mean	X_n	X_hom_ref	X_het	X_hom_alt	Y_depth_mean	Y_n
TP19-09N_N		0.0	-nan	0.0	0.0	0.00	-nan	0	0	0	17384	0.000	0.00	0	0	0	0	0.00	0
TP19-09T_T		0.0	-nan	0.0	0.0	0.00	-nan	0	0	0	17384	0.000	0.00	0	0	0	0	0.00	0

It looks like you don't have any data in the .somalier files: every site is counted in n_unknown and the depths are all 0.
Can you re-extract the samples and show the output? Perhaps you extracted with GRCh37 sites and your data is aligned to GRCh38, or vice versa?
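
The check described above can be automated on test.samples.tsv. The following is a minimal sketch, not part of somalier itself: the function name `find_empty_samples` is hypothetical, and the example text is trimmed to the columns the check needs (names and values taken from the output shown above). It flags samples with no called genotypes and a mean depth of 0.

```python
import csv
import io


def find_empty_samples(tsv_text):
    """Return names of samples that have no called sites and zero mean depth.

    Hypothetical helper: expects the tab-separated text of a
    somalier *.samples.tsv file, using the column names shown in this thread.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    empty = []
    for row in reader:
        # Sites with any genotype call (hom-ref, het, or hom-alt).
        called = int(row["n_hom_ref"]) + int(row["n_het"]) + int(row["n_hom_alt"])
        if called == 0 and float(row["depth_mean"]) == 0.0:
            empty.append(row["#sample"])
    return empty


# Example rows trimmed to the columns used by the check.
example = "\n".join([
    "\t".join(["#sample", "depth_mean", "n_hom_ref", "n_het", "n_hom_alt", "n_unknown"]),
    "\t".join(["TP19-09N_N", "0.0", "0", "0", "0", "17384"]),
    "\t".join(["TP19-09T_T", "0.0", "0", "0", "0", "17384"]),
])

for name in find_empty_samples(example):
    print(f"warning: {name} has no sites with depth > 0; "
          "check that the sites file matches the genome build of the alignments")
```

If every sample is flagged, a mismatch between the sites file and the reference build is a likely cause, as suggested above.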

Ah, looks like that did it! Is it expected that the graphs are relatively unimpressive? I assume that with two samples there is just one comparison point.

#sample_a	sample_b	relatedness	hom_concordance	hets_a	hets_b	shared_hets	hom_alts_a	hom_alts_b	shared_hom_alts	ibs0	ibs2	n	x_ibs0	x_ibs2	expected_relatedness
TP19-09N_N	TP19-09T_T	0.999	1.000	5616	6333	5609	4791	5405	4789	0	14229	14235	0	282	-1.0
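
For a tumor/normal check, the pairs file can also be read programmatically. This is a rough sketch under stated assumptions: the helper name `same_individual_pairs` and the 0.9 cutoff are illustrative choices, not somalier defaults; the column names and the example row are copied from the output above.

```python
import csv
import io


def same_individual_pairs(tsv_text, min_relatedness=0.9):
    """Return (sample_a, sample_b, relatedness) for pairs at or above the cutoff.

    Hypothetical helper; min_relatedness=0.9 is an assumed threshold for
    "same individual" and should be tuned for the data at hand.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    matches = []
    for row in reader:
        relatedness = float(row["relatedness"])
        if relatedness >= min_relatedness:
            matches.append((row["#sample_a"], row["sample_b"], relatedness))
    return matches


header = ["#sample_a", "sample_b", "relatedness", "hom_concordance", "hets_a",
          "hets_b", "shared_hets", "hom_alts_a", "hom_alts_b", "shared_hom_alts",
          "ibs0", "ibs2", "n", "x_ibs0", "x_ibs2", "expected_relatedness"]
pair = ["TP19-09N_N", "TP19-09T_T", "0.999", "1.000", "5616", "6333", "5609",
        "4791", "5405", "4789", "0", "14229", "14235", "0", "282", "-1.0"]
pairs_text = "\t".join(header) + "\n" + "\t".join(pair)

for a, b, rel in same_individual_pairs(pairs_text):
    print(f"{a} and {b} look like the same individual (relatedness={rel})")
```

With the row above, the normal and tumor samples pass the cutoff with a relatedness of 0.999.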

Thanks for the help. Maybe a good error or warning would be one indicating that the *somalier inputs are empty? It might save users like me a headache.