chrchang / plink-ng

A comprehensive update to the PLINK association analysis toolset. Beta testing of the first new version (1.90), focused on speed and memory efficiency improvements, is finishing up. Development is now focused on building out support for multiallelic, phased, and dosage data in PLINK 2.0.

Home Page: https://www.cog-genomics.org/plink/2.0/

Possible to encode a bed file with no variants?

CreRecombinase opened this issue

I have a fairly large dataset that I have manually split into ~1000 bcf files. When I select a subset of the data (for example, the variants above a given allele frequency), some of those bcf files end up with 0 variants. This is no problem for the vcf/bcf format, but if I then try to convert the files to plink format, I get "Error: No variants in .bcf file."

From reading the bim/bed spec, it seems like the spec should permit a fileset with no variants (i.e., an empty bim file and a bed file containing only the three magic bytes).
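To make the layout concrete, here is a minimal sketch of writing such a zero-variant fileset by hand. The function name `write_empty_fileset` and the placeholder .fam columns are assumptions made for illustration; the three header bytes are the standard PLINK 1 variant-major .bed header. Whether plink or plink2 will actually load the result is a separate question, and this thread suggests it currently will not.

```python
from pathlib import Path

# PLINK 1 .bed header: two magic bytes, then 0x01 for variant-major layout.
BED_MAGIC = bytes([0x6C, 0x1B, 0x01])


def write_empty_fileset(prefix, samples):
    """Write prefix.bed/.bim/.fam with the given samples and zero variants.

    `samples` is a list of (family_id, sample_id) pairs. This is a
    hypothetical helper, not part of plink.
    """
    prefix = str(prefix)
    # .bed: header only, because no variant records follow it.
    Path(prefix + ".bed").write_bytes(BED_MAGIC)
    # .bim: one line per variant, so zero variants means an empty file.
    Path(prefix + ".bim").write_text("")
    # .fam: one line per sample; unknown parents (0), sex (0), phenotype (-9).
    lines = [f"{fid} {iid} 0 0 0 -9\n" for fid, iid in samples]
    Path(prefix + ".fam").write_text("".join(lines))
```

Calling `write_empty_fileset("chunk_0042", [("F1", "S1"), ("F2", "S2")])` would produce a 3-byte .bed, a 0-byte .bim, and a two-line .fam.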

It looks like --allow-no-vars was retired from plink2 with this commit from 2018. Would you be open to a pull request restoring this functionality?

In principle, yes, but realistically you are better off saving yourself several months of work and instead checking for plink2 error code 13 ("DegenerateData").

I am open to a much-more-manageable pull request that tweaks the VCF/BCF(/other?) import functions so that they return that error code on an empty input file; I'll try to take care of that myself this weekend.
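As a rough sketch of what checking for that error code could look like in a per-chunk conversion script: the helper names `classify_exit` and `convert_bcf` are hypothetical, and the sketch assumes the import functions already return code 13 on an empty input file, which is the change proposed above rather than confirmed behavior.

```python
import subprocess

# Exit code this thread associates with "DegenerateData".
DEGENERATE_DATA = 13


def classify_exit(returncode):
    """Map a plink2 exit code to 'converted', 'empty', or 'failed'."""
    if returncode == 0:
        return "converted"
    if returncode == DEGENERATE_DATA:
        return "empty"
    return "failed"


def convert_bcf(bcf_path, out_prefix, plink2="plink2"):
    """Convert one BCF chunk to a bed/bim/fam fileset.

    Returns 'converted' or 'empty'; raises on any other plink2 error.
    """
    result = subprocess.run(
        [plink2, "--bcf", bcf_path, "--make-bed", "--out", out_prefix],
        capture_output=True,
        text=True,
    )
    status = classify_exit(result.returncode)
    if status == "failed":
        raise RuntimeError(
            f"plink2 exited with code {result.returncode}: {result.stderr}"
        )
    return status
```

A driver loop could then record chunks that come back as "empty" as having zero variants and leave them out of later steps, instead of treating them as failures.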

I'm not too familiar with plink/plink2 internals, but it was my understanding that the sample size is computed from the fam file (or equivalent), and the number of variants is computed from the bim file (or equivalent). If anything, it seems like it would take extra work to write code that doesn't work in the case where there are no variants.

I can check for error code 13, but then what do I do?