FRBs / sigpyproc3

Python3 version of Ewan Barr's sigpyproc library

Home page: https://sigpyproc3.readthedocs.io

Header backend default incompatible with machine_ids

kmjc opened this issue

A friend of mine ran into a Header issue:

The backend default

backend: str = "Fake"

is incompatible with the machine_ids dict

(I'd have made a PR but I was unsure whether it's better to change the default or the dictionary key)
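A minimal sketch of the mismatch, assuming the header writer looks the backend up with an exact, case-sensitive dict lookup. MACHINE_IDS and machine_id_for are hypothetical stand-ins, not sigpyproc's actual code, and the value 0 is only illustrative; the maintainer's fix later in this thread confirms the key is spelled "FAKE".

```python
# Hypothetical one-entry stand-in for sigpyproc's machine_ids dict.
# Only the key spelling matters here; the value 0 is illustrative.
MACHINE_IDS = {"FAKE": 0}


def machine_id_for(backend: str) -> int:
    """Return the machine_id for a backend name (exact, case-sensitive match)."""
    try:
        return MACHINE_IDS[backend]
    except KeyError:
        raise ValueError(
            f"unknown backend {backend!r}; known backends: {sorted(MACHINE_IDS)}"
        ) from None


print(machine_id_for("FAKE"))  # 0
try:
    machine_id_for("Fake")  # the old Header default
except ValueError as err:
    print(err)
```

Either side of the mismatch could be changed; a case-insensitive lookup (e.g. upper-casing the name first) would be another option, but that is a design choice, not what the thread settled on.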

Hi,

I tried the fix @kmjc suggested to me, i.e. putting backend='CHIME' in the header, but it still doesn't work. Here is my code:

import numpy as np
from sigpyproc.header import Header

header = Header(filename="/home/sujay/local/data/chime/test.fil",
    data_type="filterbank",
    nsamples=1024,
    nchans=1024,
    fch1=800,
    foff=-0.390625,
    nbeams=1,
    ibeam=122,
    nifs=1,
    tsamp=0.001,
    tstart=59599.83921,
    telescope="CHIME",
    backend="CHIME",
    nbits=32,
    source="Fake")

#filfile = open("/home/sujay/local/data/chime/test.fil", "w")
fil_file_writer = header.prep_outfile(filename="/home/sujay/local/data/chime/test.fil")

#for i in range(10):
data = np.random.normal(200, 10, size=(1024, 1024))
fil_file_writer.write(data)
fil_file_writer.close()

I get the following error when I read back the test.fil :

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-7-dfc296d5361d> in <module>
----> 1 FilReader("/home/sujay/local/data/chime/test.fil")

~/.virtualenvs/chime/lib/python3.8/site-packages/sigpyproc/readers.py in __init__(self, filenames, check_contiguity)
     38             filenames = [filenames]
     39         self._filenames = filenames
---> 40         self._header = Header.from_sigproc(
     41             self._filenames, check_contiguity=check_contiguity
     42         )

~/.virtualenvs/chime/lib/python3.8/site-packages/sigpyproc/header.py in from_sigproc(cls, filenames, check_contiguity)
    444 
    445         """
--> 446         header = sigproc.parse_header_multi(filenames, check_contiguity=check_contiguity)
    447         frame = "pulsarcentric" if header.get("pulsarcentric") else "topocentric"
    448         frame = "barycentric" if header.get("barycentric") else "topocentric"

~/.virtualenvs/chime/lib/python3.8/site-packages/sigpyproc/io/sigproc.py in parse_header_multi(filenames, check_contiguity)
    126         filenames = [filenames]
    127 
--> 128     header = parse_header(filenames[0])
    129     # Set multifile header values
    130     header["hdrlens"] = [header["hdrlen"]]

~/.virtualenvs/chime/lib/python3.8/site-packages/sigpyproc/io/sigproc.py in parse_header(filename)
    183                 break
    184 
--> 185             key_fmt = header_keys[key]
    186             if key_fmt == "str":
    187                 header[key] = _read_string(fp)

KeyError: '\n\x00\x00\x00machine_id\x14\x00\x00\x00\r\x00'

Note that even PRESTO's readfile is not able to read the file. It throws up ERROR: read_filterbank_header - unknown parameter: ERROR
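For context on the odd-looking key in the first KeyError: SIGPROC headers store each keyword as a 4-byte integer length followed by its characters, and integer values as 4 raw bytes. The sketch below assumes little-endian byte order (as on x86 machines) and uses hypothetical helper names. It shows that the "key" in the message is really an encoded machine_id field, which suggests the reader had lost its place in the byte stream.

```python
import struct


def encode_sigproc_string(text: str) -> bytes:
    """4-byte little-endian length prefix followed by the ASCII characters."""
    raw = text.encode("ascii")
    return struct.pack("<i", len(raw)) + raw


def encode_sigproc_int(key: str, value: int) -> bytes:
    """Length-prefixed keyword followed by a 4-byte little-endian integer value."""
    return encode_sigproc_string(key) + struct.pack("<i", value)


# len("machine_id") == 10 and chr(10) == "\n", so the prefix is b"\n\x00\x00\x00".
# The value 20 (0x14) mirrors the \x14\x00\x00\x00 bytes in the KeyError message.
field = encode_sigproc_int("machine_id", 20)
print(field)  # b'\n\x00\x00\x00machine_id\x14\x00\x00\x00'
```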

So I tried something simple to check whether the writing is self-consistent. I took the tutorial.fil file from the repo, read a small chunk from it, dumped that chunk using the to_file method, and tried to read back the smaller file, but it still gives a KeyError. Here is the code and the error message:

from sigpyproc.readers import FilReader

filfile = FilReader("/home/sujay/local/data/chime/tutorial.fil")
data = filfile.read_block(0, 1024)
data.to_file("/home/sujay/local/data/chime/test_tutorial.fil")
filfile1 = FilReader("/home/sujay/local/data/chime/test_tutorial.fil")

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-4-d353997bff09> in <module>
      2 data = filfile.read_block(0, 1024)
      3 data.to_file("/home/sujay/local/data/chime/test_tutorial.fil")
----> 4 filfile1 = FilReader("/home/sujay/local/data/chime/test_tutorial.fil")

~/.virtualenvs/chime/lib/python3.8/site-packages/sigpyproc/readers.py in __init__(self, filenames, check_contiguity)
     38             filenames = [filenames]
     39         self._filenames = filenames
---> 40         self._header = Header.from_sigproc(
     41             self._filenames, check_contiguity=check_contiguity
     42         )

~/.virtualenvs/chime/lib/python3.8/site-packages/sigpyproc/header.py in from_sigproc(cls, filenames, check_contiguity)
    444 
    445         """
--> 446         header = sigproc.parse_header_multi(filenames, check_contiguity=check_contiguity)
    447         frame = "pulsarcentric" if header.get("pulsarcentric") else "topocentric"
    448         frame = "barycentric" if header.get("barycentric") else "topocentric"

~/.virtualenvs/chime/lib/python3.8/site-packages/sigpyproc/io/sigproc.py in parse_header_multi(filenames, check_contiguity)
    126         filenames = [filenames]
    127 
--> 128     header = parse_header(filenames[0])
    129     # Set multifile header values
    130     header["hdrlens"] = [header["hdrlen"]]

~/.virtualenvs/chime/lib/python3.8/site-packages/sigpyproc/io/sigproc.py in parse_header(filename)
    183                 break
    184 
--> 185             key_fmt = header_keys[key]
    186             if key_fmt == "str":
    187                 header[key] = _read_string(fp)

KeyError: ' '

Please let me know what the issue is. I will be happy to help in coding up the fix :)

Hi,

Thanks for reporting this bug. It bypassed all my tests!
The issue was with the default value of the header keyword rawdatafile:
rawdatafile: str | None = None
It should not have been None, since a None value causes parsing errors when the file is read back. I have fixed this error,
as well as the backend typo ("Fake" -> "FAKE"), in the latest commit 0327f2e.
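The reply does not spell out how the None value broke the byte stream, so the following is only a self-contained illustration of the general failure mode, not sigpyproc's code. It assumes a writer that emits a keyword but drops its None value: every later length prefix is then read out of place, and the reader ends up looking up a garbage key, much like the KeyErrors above. HEADER_KEYS and all function names are hypothetical.

```python
import io
import struct

# Hypothetical two-entry keyword table (keyword -> value type).
HEADER_KEYS = {"rawdatafile": "str", "nchans": "int"}


def write_string(buf: io.BytesIO, text: str) -> None:
    raw = text.encode("ascii")
    buf.write(struct.pack("<i", len(raw)) + raw)


def read_string(buf: io.BytesIO) -> str:
    (length,) = struct.unpack("<i", buf.read(4))
    return buf.read(length).decode("ascii", errors="replace")


def write_header(buf: io.BytesIO, header: dict, skip_none: bool = False) -> None:
    write_string(buf, "HEADER_START")
    for key, value in header.items():
        write_string(buf, key)
        if value is None and skip_none:
            continue  # faulty writer: the key is written but its value is not
        if HEADER_KEYS[key] == "str":
            write_string(buf, "" if value is None else value)
        else:
            buf.write(struct.pack("<i", value))
    write_string(buf, "HEADER_END")


def read_header(buf: io.BytesIO) -> dict:
    if read_string(buf) != "HEADER_START":
        raise ValueError("missing HEADER_START")
    header = {}
    while (key := read_string(buf)) != "HEADER_END":
        # An unknown key here means the reader has lost its place in the stream.
        if HEADER_KEYS[key] == "str":
            header[key] = read_string(buf)
        else:
            (header[key],) = struct.unpack("<i", buf.read(4))
    return header


def round_trip(header: dict, skip_none: bool) -> dict:
    buf = io.BytesIO()
    write_header(buf, header, skip_none=skip_none)
    buf.seek(0)
    return read_header(buf)


hdr = {"rawdatafile": None, "nchans": 1024}
print(round_trip(hdr, skip_none=False))  # None is stored as "" and reads back cleanly
try:
    round_trip(hdr, skip_none=True)
except KeyError as err:
    print("KeyError:", err)  # a garbage key, similar to the tracebacks above
```

Writing an empty string instead of skipping the value keeps every length prefix aligned, which is the property the reporter's round-trip test was checking.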

Hi, thanks for fixing it so fast. It now works within Python, but PRESTO's readfile still throws an error while reading the header. Any idea what is causing this?

sujay@sujay-Latitude-5420:~/local/data/chime$ readfile test.fil 
Assuming the data is a SIGPROC filterbank file.

ERROR: read_filterbank_header - unknown parameter: refdm

Okay, I took a look at the source. I think the issue is that, while writing the file, the header key refdm is used instead of dm, on this line:

"refdm": self.dm,

Should I correct it and make a PR?

Actually, refdm is the valid sigproc keyword.

PRESTO's readfile (function read_filterbank_header) recognizes neither the keyword dm nor refdm, which causes the parsing error.

Earlier versions of sigpyproc were very dynamic in how they read and wrote headers. I intentionally made this strict, so that all sigproc-defined keywords (with some default values) are used when writing to a file.

Maybe we can request PRESTO to add these keywords!

If you only need a tool to print the header information, you can use sigpyproc's spp_header utility or the original header utility from sigproc.

I see. When I changed refdm to dm in my local copy of the code, readfile was able to read the file. We want to use sigpyproc to dump filterbank files and then process them with PRESTO, so if PRESTO can't read them, that is an issue. I'm not sure how to go about this.

Another issue I just found is that the number of samples read by PRESTO is nsamples - 1 (see the readfile output below). This happens only when the file is dumped by sigpyproc.

Code:

import numpy as np
from sigpyproc.header import Header

header = Header(filename="/home/sujay/local/data/chime/test.fil",
    data_type="filterbank",
    nsamples=1024,
    nchans=1024,
    fch1=800,
    foff=-0.390625,
    nbeams=1,
    ibeam=122,
    nifs=1,
    tsamp=0.001,
    tstart=59599.83921,
    telescope="CHIME",
    backend="CHIME",
    nbits=32,
    source="Fake")

#filfile = open("/home/sujay/local/data/chime/test.fil", "w")
fil_file_writer = header.prep_outfile(filename="/home/sujay/local/data/chime/test.fil")

data = np.random.normal(200, 10, size=(1024, 1024))
fil_file_writer.cwrite(data.astype(np.float32))
fil_file_writer.close()

readfile output (spectra per file are 1023 instead of 1024) :

sujay@sujay-Latitude-5420:~/local/data/chime$ readfile test.fil 
Assuming the data is a SIGPROC filterbank file.


1: From the SIGPROC filterbank file 'test.fil':
                  Telescope = CHIME
                Source Name = Fake
            Obs Date String = 2022-01-20T20:08:27.744
             MJD start time = 59599.83920999999827
                   RA J2000 = 00:00:00.0000
             RA J2000 (deg) = 0                
                  Dec J2000 = 00:00:00.0000
            Dec J2000 (deg) = 0                
                  Tracking? = True
              Azimuth (deg) = 0
           Zenith Ang (deg) = 0
            Number of polns = 2 (summed)
           Sample time (us) = 1000             
         Central freq (MHz) = 600.1953125      
          Low channel (MHz) = 400.390625       
         High channel (MHz) = 800              
        Channel width (MHz) = 0.390625         
         Number of channels = 1024
      Total Bandwidth (MHz) = 400              
                       Beam = 122 of 1
            Beam FWHM (deg) = 1.717
         Spectra per subint = 2400
           Spectra per file = 1023
      Time per subint (sec) = 2.4
        Time per file (sec) = 1.023
            bits per sample = 32
          bytes per spectra = 4096
        samples per spectra = 1024
           bytes per subint = 9830400
         samples per subint = 2457600
                zero offset = 0                
           Invert the band? = False
       bytes in file header = 392

Changing refdm to dm

It works because, before writing to file, the function sigproc.encode_header() checks for undefined keywords. Since dm is not a defined keyword, it is removed from the list and never written, so readfile is able to read the file.
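A sketch of that behaviour, assuming encode_header() simply skips any keyword missing from its keyword table. KNOWN_KEYS and keep_known_keywords are hypothetical names, not sigpyproc's API. It shows why renaming the key "fixed" readfile: the DM is no longer written at all, rather than being written under a name PRESTO understands.

```python
# Hypothetical subset of the SIGPROC keywords the writer knows how to encode.
KNOWN_KEYS = frozenset({"nchans", "nbits", "tsamp", "refdm", "signed"})


def keep_known_keywords(header: dict) -> dict:
    """Drop keywords that are not in KNOWN_KEYS, keeping the original order."""
    return {key: value for key, value in header.items() if key in KNOWN_KEYS}


# With "refdm" renamed to "dm", the keyword is unknown and is silently dropped,
# so the written header simply has no DM entry at all.
print(keep_known_keywords({"nchans": 1024, "nbits": 32, "dm": 0.0}))
```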

Spectra per file

This is probably happening because of the keyword signed. Spectra per file is calculated as
(filelen - headerlen) / (nchans * nbits / 8), and PRESTO gets headerlen wrong.
The readfile output reports a 392-byte header, whereas it should be 389 bytes as per sigpyproc. Those 3 extra bytes make the data look slightly shorter than 1024 spectra, hence the offset of 1 sample in the nsamples calculation.
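A worked version of that calculation using the numbers from this thread. It assumes PRESTO truncates the division to whole spectra (consistent with readfile reporting 1023) and that the file holds nothing after the data.

```python
# Values taken from this thread; assumes the file holds exactly the header
# followed by nsamples spectra, with no trailing bytes.
nchans, nbits, nsamples = 1024, 32, 1024
bytes_per_spectrum = nchans * nbits // 8  # 4096, the "bytes per spectra" line
header_len_sigpyproc = 389  # header size as written by sigpyproc
header_len_presto = 392  # "bytes in file header" reported by readfile
file_len = header_len_sigpyproc + nsamples * bytes_per_spectrum


def spectra_per_file(file_len: int, header_len: int, bytes_per_spectrum: int) -> int:
    """Number of whole spectra after the header; a partial spectrum is dropped."""
    return (file_len - header_len) // bytes_per_spectrum


print(spectra_per_file(file_len, header_len_sigpyproc, bytes_per_spectrum))  # 1024
print(spectra_per_file(file_len, header_len_presto, bytes_per_spectrum))  # 1023
```

Even a 1-byte overcount would lose a whole spectrum here, because the last spectrum then appears to be incomplete and is discarded.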

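I think this might be a bug in PRESTO's header parsing (char vs int). The value of signed is read as a single char (1 byte), but totalbytes is incremented by sizeof(int) (4 bytes), so the header length is overcounted by 3 bytes (392 instead of 389):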

        } else if (strings_equal(string, "signed")) {
            char tmp;
            chkfread(&tmp, sizeof(char), 1, inputfile);
            fb->signedints = tmp;
            totalbytes += sizeof(int);

Solution

A quick fix would be to simply not write problematic keywords like refdm and signed to filterbanks.
Anyway, I will try to raise these issues with PRESTO and attempt a PR there first.

Also tagging @telegraphic for suggestions.

Okay, thanks for the explanation. About your solution: I am not defining the refdm and signed keywords; I think they are defined internally. How can I force the code not to write them?

Yeah, the quick fix would require changing the internals (removing signed and refdm from Header).

For a long-term solution and compatibility with PRESTO, I have raised the issue and opened a PR scottransom/presto#164.

Also, if you are using cwrite to write 2D data to filterbanks, first convert the data to a 1D array in frequency-major order (all channels of one time sample stored contiguously).
So if data has shape (nchans, nsamples), use file.cwrite(data.transpose().ravel()).
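A small NumPy sketch of that ordering on a toy 2-channel, 3-sample array (the sizes are arbitrary). It only shows the element order that would be handed to cwrite, not the writer itself.

```python
import numpy as np

nchans, nsamples = 2, 3
# Toy block in (nchans, nsamples) layout: row = channel, column = time sample.
data = np.arange(nchans * nsamples, dtype=np.float32).reshape(nchans, nsamples)

# Filterbank order: all channels of sample 0, then all channels of sample 1, ...
stream = data.transpose().ravel()
print(stream)  # [0. 3. 1. 4. 2. 5.]

# Same result without an explicit transpose: flatten in column-major order.
assert np.array_equal(stream, data.ravel(order="F"))

# A plain data.ravel() would instead write channel 0's whole time series first.
print(data.ravel())  # [0. 1. 2. 3. 4. 5.]
```

If the data is already shaped (nsamples, nchans), a plain ravel() gives the same channel-fastest order without transposing.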
Another option would be

from sigpyproc.block import FilterbankBlock
block = FilterbankBlock(data, header)
block.to_file(filename)