UCBerkeleySETI / blimpy

Breakthrough Listen I/O Methods for Python

Home Page:https://blimpy.readthedocs.io

Gist for mem-mapped filterbank and xarray

telegraphic opened this issue

Memory-mapped filterbank reading and xarray came up in #196. It's still unclear whether this is the 'future of blimpy' or belongs in another project (I am leaning toward another project), but here's a gist for getting data into an xarray DataArray:

import xarray as xr
import dask.array as da
import numpy as np
from blimpy.io import sigproc
import pylab as plt
import os
from astropy import units as u
from astropy.coordinates import SkyCoord

filpath = '/home/dancpr/blimpy/tests/test_data/Voyager1.single_coarse.fine_res.fil'

hdr    = sigproc.read_header(filpath)
hdrlen = sigproc.len_header(filpath)
n_int  = sigproc.calc_n_ints_in_file(filpath)
shape  = (n_int,  hdr[b'nbeams'], hdr[b'nchans'])
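# mode='r' maps the file read-only (np.memmap defaults to 'r+', which needs write access)
data = np.memmap(filename=filpath, dtype='float32', mode='r', offset=hdrlen, shape=shape)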

dask_data = da.from_array(data, chunks=(1, 1, hdr[b'nchans'] // 64), name=os.path.basename(filpath))
xr_data   = xr.DataArray(dask_data, dims=('time', 'pol', 'frequency'))                        

xr_data.attrs['t0'] = hdr[b'tstart'] * u.s
xr_data.attrs['dt'] = hdr[b'tsamp'] * u.s
xr_data.attrs['f0'] = hdr[b'fch1']  * u.MHz
xr_data.attrs['df'] = hdr[b'foff'] * u.MHz
xr_data.attrs['skycoord'] = SkyCoord(hdr[b'src_raj'], hdr[b'src_dej'])
xr_data.attrs['source']   = hdr[b'source_name'].decode('ascii')
xr_data

Example output in jupyter notebook: [image not preserved]
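The memory-mapping step in the gist can be tried without blimpy or a real filterbank file. The following is a minimal sketch, assuming a file laid out as an opaque header followed by float32 samples ordered (time, pol, frequency); the fake header bytes, the file name and the tiny dimensions are all made up for illustration, and the size arithmetic only mimics what a helper like `calc_n_ints_in_file` has to work out.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for a filterbank file: a 64-byte opaque header
# followed by float32 samples ordered as (time, pol, frequency).
header = b"HEADER_START" + b"\x00" * 42 + b"HEADER_END"
n_pol, n_chans = 1, 8
payload = np.arange(4 * n_pol * n_chans, dtype="float32")

path = os.path.join(tempfile.mkdtemp(), "fake.fil")
with open(path, "wb") as fh:
    fh.write(header)
    fh.write(payload.tobytes())

# Derive the number of integrations from the file size and header length.
hdrlen = len(header)
bytes_per_int = n_pol * n_chans * np.dtype("float32").itemsize
n_int = (os.path.getsize(path) - hdrlen) // bytes_per_int

# mode="r" maps the file read-only; data is only paged in when indexed.
data = np.memmap(path, dtype="float32", mode="r", offset=hdrlen,
                 shape=(n_int, n_pol, n_chans))

# Copy just the first integration out of the mapping.
first_row = np.array(data[0, 0, :])
print(n_int, data.shape, first_row)
```

Only the slice that is indexed gets copied into memory, which is the property the dask and xarray wrapping relies on.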

I believe HDF5 can be used with xarray too:

import h5py
h5path = 'test_data/Voyager1.single_coarse.fine_res.h5'
data = h5py.File(h5path, 'r')['data']

dask_data = da.from_array(data, name=os.path.basename(h5path))
xr_data   = xr.DataArray(dask_data, dims=('time', 'pol', 'frequency'))   
xr_data

I'm optimistic that h5pyd support would also be easy to add.

It would be great to use xarray coordinates for frequency and time, but as far as I can tell each coordinate needs an array whose length matches its dimension. blimpy avoids generating the frequency axis until it is needed: computing np.arange(0, n_chans) * df + f0 is slow for large n_chans and can take up loads of space!
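One way to keep the frequency axis lazy is to compute it only for the channel range being looked at. This is a rough sketch, assuming a strictly linear axis of the form f0 + i * df; `freq_slice` is a hypothetical helper, not a blimpy or xarray function, and the f0 and df values are the ones from the Voyager test file header printed further down this thread.

```python
import numpy as np


def freq_slice(f0, df, start, stop):
    """Frequencies (same unit as f0) for channels start..stop-1 only."""
    return f0 + df * np.arange(start, stop, dtype="float64")


f0 = 8421.386717353016         # MHz, frequency of channel 0
df = -2.7939677238464355e-06   # MHz, channel width (negative: descending axis)
n_chans = 1_048_576

# Cost of materialising the whole axis as float64, in bytes (8 MiB here).
full_axis_bytes = n_chans * np.dtype("float64").itemsize

# Frequencies for a four-channel window, computed on demand.
window = freq_slice(f0, df, 1000, 1004)
print(full_axis_bytes, window)
```

The same slice indices can be applied to the data array, so the axis values never need to exist for channels that are not being inspected.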

This needs a side-by-side performance comparison on a few different .fil and .h5 datasets:

  • traditional blimpy
  • something in the style of this gist
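A comparison like this could start from a small timing harness. The sketch below uses only the standard library; `time_call` and `compare` are hypothetical helper names, and the two lambdas are placeholder workloads that would need to be swapped for real readers (for example a traditional blimpy load versus the memmap/dask path from the gist, each reading the same slice of the same file).

```python
import time


def time_call(fn, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` calls to fn."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - t0)
    return best


def compare(loaders, repeats=3):
    """Time each named zero-argument loader and return {name: best_seconds}."""
    return {name: time_call(fn, repeats) for name, fn in loaders.items()}


# Placeholder workloads standing in for the two reading strategies.
results = compare({
    "traditional": lambda: sum(range(10_000)),
    "gist_style": lambda: sum(range(1_000)),
})
for name, seconds in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name}: {seconds:.6f} s")
```

Taking the best of several repeats reduces noise from disk caching, although for a fair I/O comparison the first (cold-cache) read is worth recording separately.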

Your gist does not run as posted: the b'...' byte-string header keys again! The header keys are plain str here, and source_name is already a str, which has no .decode(), so:
xr_data.attrs['source'] = hdr['source_name']

import xarray as xr
import dask.array as da
import numpy as np
from blimpy.io import sigproc
###import pylab as plt
import os
from astropy import units as u
from astropy.coordinates import SkyCoord

DIR = "/home/elkins/BASIS/seti_data/voyager/"
filpath = os.path.join(DIR, "Voyager1.single_coarse.fine_res.fil")

hdr    = sigproc.read_header(filpath)
hdrlen = sigproc.len_header(filpath)
n_int  = sigproc.calc_n_ints_in_file(filpath)
shape  = (n_int,  hdr['nbeams'], hdr['nchans'])
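# mode='r' maps the file read-only (np.memmap defaults to 'r+', which needs write access)
data = np.memmap(filename=filpath, dtype='float32', mode='r', offset=hdrlen, shape=shape)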

dask_data = da.from_array(data, chunks=(1, 1, hdr['nchans'] // 64), name=os.path.basename(filpath))
xr_data   = xr.DataArray(dask_data, dims=('time', 'pol', 'frequency'))                        

xr_data.attrs['t0'] = hdr['tstart'] * u.s
xr_data.attrs['dt'] = hdr['tsamp'] * u.s
xr_data.attrs['f0'] = hdr['fch1']  * u.MHz
xr_data.attrs['df'] = hdr['foff'] * u.MHz
xr_data.attrs['skycoord'] = SkyCoord(hdr['src_raj'], hdr['src_dej'])
xr_data.attrs['source']   = hdr['source_name']
print(xr_data)

stdout:

<xarray.DataArray 'Voyager1.single_coarse.fine_res.fil' (time: 16, pol: 1, frequency: 1048576)>
dask.array<Voyager1.single_coarse.fine_res.fil, shape=(16, 1, 1048576), dtype=float32, chunksize=(1, 1, 16384), chunktype=numpy.ndarray>
Dimensions without coordinates: time, pol, frequency
Attributes:
    t0:        57650.78209490741 s
    dt:        18.253611008 s
    f0:        8421.386717353016 MHz
    df:        -2.7939677238464355e-06 MHz
    skycoord:  <SkyCoord (ICRS): (ra, dec) in deg\n    (257.5166, 12.183)>
    source:    Voyager1
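The two versions of the gist in this thread differ only in whether the header uses bytes or str for its keys and string values. A script that has to cope with both could normalise the header first. This is a sketch only: `normalize_header` is a hypothetical helper, not part of blimpy, and it assumes the byte strings are ASCII, as the original gist's `.decode('ascii')` did.

```python
def normalize_header(hdr):
    """Return a copy of hdr with bytes keys and bytes values decoded to str."""
    def to_str(x):
        return x.decode("ascii") if isinstance(x, bytes) else x
    return {to_str(k): to_str(v) for k, v in hdr.items()}


# Toy headers in both styles; the field names follow the gist above.
old_style = {b"nchans": 1048576, b"source_name": b"Voyager1"}
new_style = {"nchans": 1048576, "source_name": "Voyager1"}

hdr = normalize_header(old_style)
print(hdr["nchans"], hdr["source_name"])
```

After normalisation, lookups such as `hdr['nchans']` work for either header style and no `.decode()` call is needed on the source name.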

Closing this, as some development will progress in hyperseti (e.g. see UCBerkeleySETI/hyperseti#11) and there are no immediate plans for a blimpy overhaul.