brentp / somalier

fast sample-swap and relatedness checks on BAMs/CRAMs/VCFs/GVCFs... "like damn that is one smart wine guy"

SIGSEGV error for somalier ancestry

nswh opened this issue · comments

commented

I get the following error from both the Docker image and the static build:

somalier ancestry --labels ancestry-labels-1kg.tsv 1kg-somalier/*.somalier ++ query/*.somalier
somalier version: 0.2.13
SIGSEGV: Illegal storage access. (Attempt to read from nil?)

query/*.somalier was successfully generated from a VCF with the following command:

somalier extract -d query/ --sites sites.hg38.vcf.gz -f GRCh38Decoy_genome.fa query.vcf.gz
somalier version: 0.2.13
[somalier] found 17608 sites

A small portion of the VCF is attached here:
query.vcf.gz

Can you share, for example, 1 or 2 of your somalier files?

This actually works for me, with the output below. Do you see the issue with the VCF you sent? It could be that your machine does not support the CPU instructions needed, but that would usually give an unknown instruction signal.

$ somalier extract --sites ~/src/somalier/sites.hg38.vcf.gz -f /data/human/Homo_sapiens_assembly38.fasta query.vcf.gz
somalier version: 0.2.13
[somalier] found 138 sites

$ somalier ancestry --labels scripts/ancestry-labels-1kg.tsv 1kg-somalier/*.somalier ++ ~/Downloads/NA12878_566_20210429_A00712.somalier 
somalier version: 0.2.13
[somalier] subset from 17384 to 123 high call-rate sites (removed 99.29%)
[somalier] time for dimensionality reduction to shape [2496, 5]: 0.25 seconds
[somalier] Epoch:0. loss: 1.10086. accuracy on unseen data: 0.594.  total-time: 0.00
[somalier] Epoch:500. loss: 0.48224. accuracy on unseen data: 0.822.  total-time: 1.82
[somalier] Epoch:1000. loss: 0.52429. accuracy on unseen data: 0.772.  total-time: 3.68
[somalier] Epoch:1500. loss: 0.47193. accuracy on unseen data: 0.832.  total-time: 5.56
[somalier] Epoch:2000. loss: 0.39615. accuracy on unseen data: 0.842.  total-time: 7.89
[somalier] Epoch:2500. loss: 0.55570. accuracy on unseen data: 0.792.  total-time: 10.05
[somalier] Epoch:3000. loss: 0.47538. accuracy on unseen data: 0.802.  total-time: 12.01
[somalier] Epoch:3500. loss: 0.45755. accuracy on unseen data: 0.832.  total-time: 14.17
[somalier] Epoch:4000. loss: 0.54101. accuracy on unseen data: 0.802.  total-time: 16.69
[somalier] Epoch:4500. loss: 0.42020. accuracy on unseen data: 0.822.  total-time: 19.09
[somalier] Epoch:5000. loss: 0.58802. accuracy on unseen data: 0.792.  total-time: 21.47
[somalier] Epoch:5500. loss: 0.48210. accuracy on unseen data: 0.812.  total-time: 23.76
[somalier] Epoch:6000. loss: 0.51912. accuracy on unseen data: 0.792.  total-time: 26.20
[somalier] Epoch:6500. loss: 0.46640. accuracy on unseen data: 0.832.  total-time: 29.54
[somalier] Epoch:7000. loss: 0.45954. accuracy on unseen data: 0.822.  total-time: 31.81
[somalier] Epoch:7500. loss: 0.46691. accuracy on unseen data: 0.772.  total-time: 34.08
[somalier] Epoch:8000. loss: 0.47362. accuracy on unseen data: 0.822.  total-time: 36.42
[somalier] Epoch:8500. loss: 0.45352. accuracy on unseen data: 0.822.  total-time: 38.97
[somalier] Epoch:9000. loss: 0.49716. accuracy on unseen data: 0.782.  total-time: 41.30
[somalier] Epoch:9500. loss: 0.50170. accuracy on unseen data: 0.812.  total-time: 43.58
[somalier] Epoch:10000. loss: 0.52464. accuracy on unseen data: 0.772.  total-time: 45.74
[somalier] reduced query set to: [1, 5]
[somalier] wrote text file to somalier-ancestry.somalier-ancestry.tsv
[somalier] wrote html file to somalier-ancestry.somalier-ancestry.html
commented

Same error, unfortunately. The machine is an Amazon Ubuntu instance and the CPU model is Intel(R) Xeon(R) Platinum. Could that be an issue? The storage is an Amazon EBS volume attached to the instance.

somalier ancestry --labels ancestry-labels-1kg.tsv 1kg-somalier/*.somalier ++ NA12878_566_20210429_A00712.somalier
somalier version: 0.2.13
SIGSEGV: Illegal storage access. (Attempt to read from nil?)

Attached is the full somalier file I generated from the full VCF file.
NA12878_566_20210429_A00712.somalier.zip

I have also attached the full VCF file here.
NA12878_566_20210429_A00712.hard-filtered.vcf.gz

Can you show the output of head ancestry-labels-1kg.tsv?
Also, here is a debug build of somalier. It is unchanged from the release, but it should give more information about where it is crashing. Can you give it a try and let me know the result?
somalier_debug.gz

commented

Hmm, the problem was with ancestry-labels-1kg.tsv. It works now. Thanks.
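
The thread does not say what was actually wrong with the labels file, so the following is only a rough sketch of a sanity check one might run on it before calling somalier ancestry. The function name check_labels_tsv, the min_columns default of 2, and the specific checks (file missing or empty, an HTML page saved in place of the raw TSV, blank lines, inconsistent column counts) are all assumptions made for illustration; none of them come from somalier itself.

```python
import csv
from pathlib import Path


def check_labels_tsv(path, min_columns=2):
    """Return a list of problems found in a tab-separated labels file.

    Hypothetical helper: an empty list only means these basic checks
    passed, not that somalier is guaranteed to accept the file.
    """
    p = Path(path)
    if not p.is_file():
        return [f"{path}: file not found"]
    if p.stat().st_size == 0:
        return [f"{path}: file is empty"]

    problems = []
    expected = None  # column count taken from the first non-blank line
    with p.open(newline="", encoding="utf-8", errors="replace") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for lineno, row in enumerate(reader, start=1):
            # A saved web page instead of the raw file is an easy mistake to make.
            if lineno == 1 and row:
                first = row[0].lstrip().lower()
                if first.startswith(("<!doctype", "<html")):
                    return [f"{path}: looks like an HTML page, not a TSV"]
            if not row or all(not field.strip() for field in row):
                problems.append(f"line {lineno}: blank line")
                continue
            if expected is None:
                expected = len(row)
                # min_columns=2 is an assumption, not a documented requirement.
                if expected < min_columns:
                    problems.append(
                        f"line {lineno}: only {expected} tab-separated column(s)"
                    )
            elif len(row) != expected:
                problems.append(
                    f"line {lineno}: {len(row)} columns, expected {expected}"
                )
    return problems
```

Calling check_labels_tsv("ancestry-labels-1kg.tsv") and printing each returned string gives a quick first look at the file, much like the head command requested earlier in the thread, before reaching for a debug build.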