Gist for mem-mapped filterbank and xarray
telegraphic opened this issue
Memory-mapped filterbank reading and xarray were mentioned in #196. I think it's still unclear whether this is the 'future of blimpy' or another project (I am leaning toward another project), but here's a gist on getting data into an xarray DataArray:
import xarray as xr
import dask.array as da
import numpy as np
from blimpy.io import sigproc
import pylab as plt
import os
from astropy import units as u
from astropy.coordinates import SkyCoord
filpath = '/home/dancpr/blimpy/tests/test_data/Voyager1.single_coarse.fine_res.fil'
hdr = sigproc.read_header(filpath)
hdrlen = sigproc.len_header(filpath)
n_int = sigproc.calc_n_ints_in_file(filpath)
shape = (n_int, hdr[b'nbeams'], hdr[b'nchans'])
data = np.memmap(filename=filpath, dtype='float32', mode='r', offset=hdrlen, shape=shape)  # mode='r': map read-only (np.memmap defaults to 'r+')
dask_data = da.from_array(data, chunks=(1, 1, hdr[b'nchans'] // 64), name=os.path.basename(filpath))
xr_data = xr.DataArray(dask_data, dims=('time', 'pol', 'frequency'))
xr_data.attrs['t0'] = hdr[b'tstart'] * u.s
xr_data.attrs['dt'] = hdr[b'tsamp'] * u.s
xr_data.attrs['f0'] = hdr[b'fch1'] * u.MHz
xr_data.attrs['df'] = hdr[b'foff'] * u.MHz
xr_data.attrs['skycoord'] = SkyCoord(hdr[b'src_raj'], hdr[b'src_dej'])
xr_data.attrs['source'] = hdr[b'source_name'].decode('ascii')
xr_data
Example output in jupyter notebook:
I believe HDF5 can be used with xarray too:
import h5py
h5path = 'test_data/Voyager1.single_coarse.fine_res.h5'
data = h5py.File(h5path, 'r')['data']  # open read-only
dask_data = da.from_array(data, name=os.path.basename(h5path))  # name the dask array after the HDF5 file, not the .fil file
xr_data = xr.DataArray(dask_data, dims=('time', 'pol', 'frequency'))
xr_data
I'm optimistic that h5pyd support would also be easy to add.
It would be great to use xarray coordinates for frequency and time, but as far as I can tell each coordinate needs an array whose length matches its dimension. blimpy avoids generating frequency data until needed (using np.arange(0, n_chans) * df + f0 is slow for large n_chans and can take up loads of space!)
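One way to keep the frequency axis cheap is to compute it only for the channel range actually being read. The snippet below is a minimal sketch of that idea, not blimpy's implementation: `channel_frequencies` is a hypothetical helper, and the `f0`, `df` and `n_chans` values are copied from the Voyager1 output printed later in this thread.

```python
import numpy as np


def channel_frequencies(f0, df, start, stop):
    """Frequencies for channels start..stop-1 only (hypothetical helper).

    f0 is the frequency of channel 0 and df the channel width,
    both in the same unit (MHz here).
    """
    return f0 + df * np.arange(start, stop, dtype=np.float64)


# Values copied from the Voyager1 header output in this thread.
f0 = 8421.386717353016          # MHz
df = -2.7939677238464355e-06    # MHz (negative: frequency decreases with channel)
n_chans = 1048576

# Frequencies for one 16384-channel chunk instead of the full axis.
chunk_freqs = channel_frequencies(f0, df, 0, 16384)

# Bytes needed to hold the full axis as float64, for comparison.
full_axis_bytes = n_chans * np.dtype(np.float64).itemsize
print(chunk_freqs.shape, full_axis_bytes)
```

The same helper could feed `assign_coords` on an already-sliced DataArray, so the coordinate array is only ever as long as the selection.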
This needs a side-by-side performance comparison on a few different .fil and .h5 datasets:
- traditional blimpy
- something in the style of this gist
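A comparison like this could start from a small timing harness. The following is only a sketch: `time_loader` and `compare_loaders` are hypothetical names, and the two callables are trivial stand-ins so the snippet runs anywhere. For a real benchmark they would be replaced with functions that read the same .fil or .h5 file through each code path, and the timings printed by the stand-ins mean nothing.

```python
import time


def time_loader(load, repeats=3):
    """Best wall-clock time in seconds over `repeats` calls to load()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        load()
        best = min(best, time.perf_counter() - start)
    return best


def compare_loaders(loaders, repeats=3):
    """Time every callable in `loaders` (a name -> function mapping)."""
    return {name: time_loader(fn, repeats) for name, fn in loaders.items()}


# Stand-in workloads only; swap in the real readers being compared.
loaders = {
    "placeholder_traditional": lambda: sum(range(100_000)),
    "placeholder_gist_style": lambda: sum(range(1_000)),
}

timings = compare_loaders(loaders)
for name, seconds in timings.items():
    print(f"{name}: {seconds:.6f} s")
```

Taking the best of several repeats reduces noise from disk caching, although for a memory-mapped reader the cold-cache first read is arguably the number that matters, so it may be worth recording both.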
Your gist does not run as written.....the Bs again (the header keys are str, not bytes)! Also, you cannot decode a str, so:
xr_data.attrs['source'] = hdr['source_name']
import xarray as xr
import dask.array as da
import numpy as np
from blimpy.io import sigproc
###import pylab as plt
import os
from astropy import units as u
from astropy.coordinates import SkyCoord
DIR = "/home/elkins/BASIS/seti_data/voyager/"
filpath = os.path.join(DIR, "Voyager1.single_coarse.fine_res.fil")
hdr = sigproc.read_header(filpath)
hdrlen = sigproc.len_header(filpath)
n_int = sigproc.calc_n_ints_in_file(filpath)
shape = (n_int, hdr['nbeams'], hdr['nchans'])
data = np.memmap(filename=filpath, dtype='float32', mode='r', offset=hdrlen, shape=shape)  # mode='r': map read-only (np.memmap defaults to 'r+')
dask_data = da.from_array(data, chunks=(1, 1, hdr['nchans'] // 64), name=os.path.basename(filpath))
xr_data = xr.DataArray(dask_data, dims=('time', 'pol', 'frequency'))
xr_data.attrs['t0'] = hdr['tstart'] * u.s
xr_data.attrs['dt'] = hdr['tsamp'] * u.s
xr_data.attrs['f0'] = hdr['fch1'] * u.MHz
xr_data.attrs['df'] = hdr['foff'] * u.MHz
xr_data.attrs['skycoord'] = SkyCoord(hdr['src_raj'], hdr['src_dej'])
xr_data.attrs['source'] = hdr['source_name']
print(xr_data)
stdout:
<xarray.DataArray 'Voyager1.single_coarse.fine_res.fil' (time: 16, pol: 1, frequency: 1048576)>
dask.array<Voyager1.single_coarse.fine_res.fil, shape=(16, 1, 1048576), dtype=float32, chunksize=(1, 1, 16384), chunktype=numpy.ndarray>
Dimensions without coordinates: time, pol, frequency
Attributes:
    t0: 57650.78209490741 s
    dt: 18.253611008 s
    f0: 8421.386717353016 MHz
    df: -2.7939677238464355e-06 MHz
    skycoord: <SkyCoord (ICRS): (ra, dec) in deg\n (257.5166, 12.183)>
    source: Voyager1
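The memory-mapping step used in both listings can be tried without blimpy or a real filterbank file. The sketch below writes a synthetic file (a 16-byte dummy header followed by float32 samples; the header is a stand-in, not a valid sigproc header) and maps the payload with the same `offset` and `shape` arguments. It assumes 32-bit float data, as the listings above do.

```python
import os
import tempfile

import numpy as np

n_int, n_beams, n_chans = 4, 1, 8
header = b"H" * 16  # dummy header bytes, NOT a real sigproc header
payload = np.arange(n_int * n_beams * n_chans, dtype="float32")

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "synthetic.fil")
    with open(path, "wb") as fh:
        fh.write(header)
        fh.write(payload.tobytes())

    # Sanity check: file size = header length + samples * 4 bytes (float32).
    expected_size = len(header) + n_int * n_beams * n_chans * 4
    size_matches = os.path.getsize(path) == expected_size

    # Map only the payload, read-only; nothing is read until it is indexed.
    data = np.memmap(path, dtype="float32", mode="r",
                     offset=len(header), shape=(n_int, n_beams, n_chans))

    first_spectrum = np.array(data[0, 0])  # copy one spectrum into memory
    last_value = float(data[-1, 0, -1])

    del data  # drop the mapping before the temporary directory is removed

print(size_matches, first_spectrum, last_value)
```

The size check is a cheap guard against a wrong header length or data type, either of which would otherwise produce a silently misaligned array.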
Closing this, as some development will progress in hyperseti (e.g. see UCBerkeleySETI/hyperseti#11) and there are no immediate plans for a blimpy overhaul.