brentp / somalier

fast sample-swap and relatedness checks on BAMs/CRAMs/VCFs/GVCFs... "like damn that is one smart wine guy"

Failing to run find-sites

johanneskoester opened this issue · comments

I get the following error:

$ somalier find-sites
...
[somalier] af not found, using 0
[somalier] af not found, using 0
[somalier] af not found, using 0
[somalier] af not found, using 0
[somalier] af not found, using 0
[somalier] af not found, using 0
fatal.nim(49)            sysFatal
Error: unhandled exception: index out of bounds, the container is empty [IndexDefect]

The test file is too large to upload here, but I am happy to send it via gdrive if you need it.

It contains the AF tag, but not for all records. The error happens after many records have been processed.
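To quantify "not for all records", one could count how many data lines lack the tag before running find-sites. The following is a minimal sketch in plain Python, not part of somalier; the function names `info_keys` and `count_missing_tag` are hypothetical, and it assumes uncompressed, tab-separated VCF lines with INFO in the eighth column.

```python
def info_keys(vcf_line):
    """Return the set of INFO keys on one tab-separated VCF data line."""
    info = vcf_line.rstrip("\n").split("\t")[7]
    if info == ".":
        # "." marks an empty INFO column
        return set()
    # Entries look like KEY=VALUE or a bare flag KEY
    return {entry.split("=", 1)[0] for entry in info.split(";")}


def count_missing_tag(lines, tag="AF"):
    """Return (records without `tag`, total records), skipping header lines."""
    missing = 0
    total = 0
    for line in lines:
        if line.startswith("#"):
            continue
        total += 1
        if tag not in info_keys(line):
            missing += 1
    return missing, total
```

For a bgzipped file, the lines could be read with `gzip.open(path, "rt")` and passed straight to `count_missing_tag`.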

Hi @johanneskoester , would you run with the attached debug binary and let me know the error?
somalier_debug.gz
find-sites is less widely used, so you're likely hitting something I haven't considered.

Do note that there is a --min-AN argument which defaults to 115000. You may need to lower that if you have a smaller cohort.

/home/brentp/src/somalier/src/somalier.nim(276) somalier
/home/brentp/src/somalier/src/somalier.nim(263) main
/home/brentp/src/somalier/src/somalierpkg/findsites.nim(162) findsites_main
/nim-1.6.6/lib/system/fatal.nim(53) sysFatal
Error: unhandled exception: index out of bounds, the container is empty [IndexDefect]

The --min-AN option has no influence, but looking at the help, maybe my input VCF does not satisfy the requirements. It has the AF field but, for example, no samples. It is the "known variation VCF" from ensembl.org (https://ftp.ensembl.org/pub/release-110/variation/vcf/homo_sapiens/, with the individual chromosome files merged together), which I modified with bcftools annotate to rename the MAF field to AF (bcftools annotate -c INFO/AF:=INFO/MAF).

ok. I see the problem, it's a classic :( . I am checking variant.ALT[0] and you have a variant without an alternate allele. I will check for this.
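The failure mode described here can be illustrated outside of Nim. The following is a Python analogue, not somalier's actual code: a VCF record whose ALT column is "." has no alternate alleles, so indexing the first one fails unless the empty case is checked first. The helper names `alt_alleles` and `first_alt_or_none` are hypothetical.

```python
def alt_alleles(vcf_line):
    """Return the list of ALT alleles on one tab-separated VCF data line."""
    alt = vcf_line.rstrip("\n").split("\t")[4]
    # "." in the ALT column means the record has no alternate allele
    return [] if alt == "." else alt.split(",")


def first_alt_or_none(vcf_line):
    """Return the first ALT allele, or None when the record has none."""
    alts = alt_alleles(vcf_line)
    if not alts:
        # Without this guard, alts[0] raises IndexError on an empty list
        return None
    return alts[0]
```

A caller would then skip records for which `first_alt_or_none` returns `None` instead of crashing partway through the file.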

Here is a debug build with a fix for that if you'd like to try it.
somalier_debug.gz

I will also run it on chr1 from your link and ensure that it works.

I run:

/somalier_debug find-sites --AF-field MAF homo_sapiens-chr1.vcf.gz --min-AN 0

and see:

[somalier] af not found, using 0 # many times!!!
121649 candidate variants
sorted and filtered to 14385 autosomal variants. now dropping INFOs and writing
[somalier] wrote 14385 variants to:sites.vcf.gz

So I think that change should resolve your issue. I'll make a new release and try to reduce the number of times we see that message.

This is out in v0.2.18: https://github.com/brentp/somalier/releases/tag/v0.2.18

Thanks for reporting, and let me know if you have any more issues.

Thanks a lot!!! Super quick!