show warning when .somalier files have no sites with depth > 0
taytayp opened this issue
I am trying to use somalier to confirm matches between tumor and normal samples from the same patient. somalier extract works fine for both .bam files using sites.hg19.vcf.
The trouble is with relate, which I can't seem to figure out the parameters for. I have tried:
somalier relate cohort/*somalier
somalier relate -p pedigree.txt cohort/*somalier
somalier relate -p pedigree.txt -g group.txt cohort/*somalier
I used some simple group and pedigree files (shown below), but the only output I get is .tsv files with empty rows.
This feels like it should be a simple use-case, but it is fairly befuddling. Any pointers?
cat group.txt
normal0,tumor0
cat pedigree.txt
fam normal0 0 0 0 0
fam tumor0 0 0 0 0
can you show the stdout and stderr when you run:
somalier relate -o test -g group.txt cohort/*somalier
head test.samples.tsv
Sure thing.
/somalier # somalier relate -o test -g group.txt cohort/*somalier
somalier version: 0.2.6
[somalier] time to read files and get per-sample stats for 2 samples: 0.00
[somalier] time to get expected relatedness from pedigree graph: 0.00
[somalier] time to calculate all vs all relatedness for all 1 combinations: 0.00
[somalier] wrote interactive HTML output for 1 pairs to: test.html
[somalier] wrote groups to: test.groups.tsv
[somalier] wrote samples to: test.samples.tsv
[somalier] wrote pair-wise relatedness metrics to: test.pairs.tsv
/somalier # head test.samples.tsv
#sample pedigree_sex gt_depth_mean gt_depth_sd depth_mean depth_sd ab_mean ab_std n_hom_ref n_het n_hom_alt n_unknown p_middling_ab X_depth_mean X_n X_hom_ref X_het X_hom_alt Y_depth_mean Y_n
TP19-09N_N 0.0 -nan 0.0 0.0 0.00 -nan 0 0 0 17384 0.000 0.00 0 0 0 0 0.00 0
TP19-09T_T 0.0 -nan 0.0 0.0 0.00 -nan 0 0 0 17384 0.000 0.00 0 0 0 0 0.00 0
It looks like you don't have any data in the .somalier files.
Can you re-extract the samples and show the output? Perhaps you extracted with GRCh37 sites and your data is in GRCh38, or vice versa?
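For anyone hitting the same symptom, here is a minimal sketch of a check you could run on the samples output yourself. It is not part of somalier. It assumes the samples file is tab-delimited with the header shown above, and the function name find_empty_samples is made up for illustration. A sample is reported when it has no hom-ref, het, or hom-alt calls at all, which is what the two rows above look like.

```python
import csv
import sys


def find_empty_samples(samples_tsv):
    """Return names of samples with zero genotyped sites.

    Assumes a tab-delimited file with the column names seen in the
    thread (#sample, n_hom_ref, n_het, n_hom_alt). Hypothetical helper,
    not a somalier API.
    """
    empty = []
    with open(samples_tsv, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            # Sum the three called-genotype counts for this sample.
            called = sum(int(row[k]) for k in ("n_hom_ref", "n_het", "n_hom_alt"))
            if called == 0:
                empty.append(row["#sample"])
    return empty


if __name__ == "__main__" and len(sys.argv) > 1:
    for name in find_empty_samples(sys.argv[1]):
        print(
            f"warning: {name} has no called sites; "
            "check that the sites file matches the genome build",
            file=sys.stderr,
        )
```

Saved under a file name of your choice and run with the path to test.samples.tsv as its only argument, it prints one warning per empty sample to stderr.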
Ah, looks like that did it! I assume it's expected that the graphs are relatively unimpressive, since there's just one comparison point between two samples?
#sample_a sample_b relatedness hom_concordance hets_a hets_b shared_hets hom_alts_a hom_alts_b shared_hom_alts ibs0 ibs2 n x_ibs0 x_ibs2 expected_relatedness
TP19-09N_N TP19-09T_T 0.999 1.000 5616 6333 5609 4791 5405 4789 0 14229 14235 0 282 -1.0
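For the tumor/normal use case, the pairs output can also be read directly instead of relying on the plot. The sketch below is not part of somalier. It assumes a tab-delimited pairs file with the header shown above; the function name pairs_above_cutoff and the 0.9 cutoff are illustrative assumptions, not recommended values.

```python
import csv


def pairs_above_cutoff(pairs_tsv, min_relatedness=0.9):
    """Return (sample_a, sample_b, relatedness) for pairs at or above the cutoff.

    Assumes a tab-delimited file with the columns #sample_a, sample_b and
    relatedness. The default cutoff is an arbitrary illustrative value.
    """
    matches = []
    with open(pairs_tsv, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            relatedness = float(row["relatedness"])
            if relatedness >= min_relatedness:
                matches.append((row["#sample_a"], row["sample_b"], relatedness))
    return matches
```

With the single row shown above, the pair has a relatedness of 0.999, so it would be returned under the 0.9 cutoff.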
Thanks for the help. Maybe a good error or warning would be one indicating that the .somalier inputs are empty? It might save some users like me a headache.