limix / bgen

A BGEN file format reader.



Magic number?

CHPGenetics opened this issue.

Hello there!
I used this library in my code for reading bgen files.
But then I got a magic number mismatch error. So I was wondering how the magic number "1852139362" was generated and why my bgen file was not recognized. Many thanks!

Hello! Every bgen file has that specific number in its header, so the error means that the file you are using is not a bgen file. Do you have more info about it?

Thank you, Danilo. The input file I used is bgen v1.1. Here is the qctool output from simply reading this bgen file and its sample file:


Welcome to qctool
(version: 2.0.1, revision ba5eaa4)

(C) 2009-2017 University of Oxford

Opening genotype files                                      : [******************************] (1/1,0.0s,47.7/s)
========================================================================

Input SAMPLE file(s):           "/XXXXXX/chr22_impute4_full.sample"
Output SAMPLE file:             "(n/a)".
Sample exclusion output file:   "(n/a)".

Input GEN file(s):
                                                    (1110217 snps)  "/XXXXXX/chr22_impute4_full.bgen (bgen v1.1; 463 unnamed samples; zlib compression)"
                                         (total 1110217 snps in 1 sources).
                      Number of samples: 463
Output GEN file(s):             (n/a)
Output SNP position file(s):    (n/a)
Sample filter:                  .
# of samples in input files:    463.
# of samples after filtering:   463 (0 filtered out).

========================================================================

SNPs do not need to be visited -- skipping.
========================================================================

Number of SNPs:
                     -- in input file(s):                 1110217.
 -- in output file(s):                0

Number of samples in input file(s):   463.

========================================================================


Thank you for using qctool.

I was also able to work with this bgen v1.1 file using PLINK 1.9, PLINK 2, and qctool v2.0.1, so I am pretty sure the file itself is legit.

Then I converted it to bgen v1.2, and interestingly my code can now read the file with your bgen library. So I guess the way I use your library is not buggy either, at least not for reading in the bgen file.

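Ah, I see where the problem is: the file I have seems to be a legit bgen file, except that it does NOT have a magic number. The magic number (written 0x6e656762 in PLINK2 and 1852139362 in your library; both are the same value) is the bytes "bgen", visible at offset 0x10 in the v1.2 and v1.1 dumps below. The questionable file has four zero bytes there instead: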

The questionable bgen
00000000  14 00 00 00 14 00 00 00  c9 f0 10 00 cf 01 00 00  |................|
00000010  00 00 00 00 05 00 00 00  cf 01 00 00 0f 00 32 32  |..............22|
00000020  3a 31 36 30 35 30 30 37  35 3a 41 3a 47 0f 00 32  |:16050075:A:G..2|

v1.2
00000000  f6 0a 00 00 14 00 00 00  c9 f0 10 00 cf 01 00 00  |................|
00000010  62 67 65 6e 09 00 00 80  e2 0a 00 00 cf 01 00 00  |bgen............|
00000020  04 00 34 30 36 34 04 00  34 35 31 34 04 00 34 31  |..4064..4514..41|

v1.1
00000000  14 00 00 00 14 00 00 00  c9 f0 10 00 cf 01 00 00  |................|
00000010  62 67 65 6e 05 00 00 00  cf 01 00 00 0f 00 32 32  |bgen..........22|
00000020  3a 31 36 30 35 30 30 37  35 3a 41 3a 47 0f 00 32  |:16050075:A:G..2|

qctool, PLINK 1.9, and PLINK 2 are able to work with that questionable file because they accept an empty (all-zero) magic number. So I made a small change to your library locally, and now it works.

    if (magic_number != 1852139362) {
        fprintf(stderr, "This is not a BGEN file: magic number mismatch.\n");
        return 1;
    }

is now

    /* Accept "bgen" (1852139362) or an all-zero magic field, as qctool and PLINK do. */
    if (magic_number && magic_number != 1852139362) {
        fprintf(stderr, "This is not a BGEN file: magic number mismatch.\n");
        return 1;
    }

Just FYI. Thank you so much!

Thanks a lot for debugging it. The soon-to-be-released bgen version will include that fix =)