ratt-ru / dask-ms

Implementation of a dask/xarray dataset backed by a CASA MS

Home Page: https://dask-ms.readthedocs.io

How to map correlations to Stokes parameters without looping the correlations?

miguelcarcamov opened this issue

  • dask-ms version: 0.2.6
  • Python version: 3.9.7
  • Operating System: Manjaro

Hello, I am trying to calculate the PSF (dirty beam) analytically with dask-ms and also to grid the data. However, to calculate the PSF or the dirty image for each Stokes parameter, I have had to loop over the correlations in order to match each correlation's data to the corresponding Stokes parameter. This additional loop, nested inside the loop over the list of subms, makes the code considerably slower. Has anyone found a way to map the correlations to each Stokes parameter without looping over the correlations? Is there a way to do this with dask?

If what I just wrote above does not make sense to you, please ask :).

Hi @miguelcarcamov! Apologies for the delay - I was on vacation. I am not entirely sure what you are trying to accomplish. Could you possibly provide more details/a code snippet?

Are you trying to map [XX, XY, YX, YY] to [I, Q, U, V] on the xarray datasets?

Hi @JSKenyon, well, it really depends on the feed. If you look at this code, which tries to make a dirty map from the data using dask, you can see that in line 120 I loop over the correlations in order to map them to I, Q, U, V depending on the feed. In the code, gridded_data and gridded_weights have a shape of (m, n, ncorrs), and summing them into the I, Q, U, V uv-grids according to the feed costs me a loop through all correlations for each one of the subms. I want to get rid of that for loop, but I am not yet sure how to do it. This might be difficult to follow, so let me know if you have questions :)

If you are doing all your operations on dask arrays, I am not sure why the loop itself would be slow (unless you have a huge number of datasets). You can likely simplify the code by just having a mapping stored somewhere so that you don't have to check so many conditions.
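As a rough illustration of the stored-mapping idea, here is a minimal sketch, not taken from the code in question. It assumes the correlations sit on the last axis in the order [XX, XY, YX, YY] or [RR, RL, LR, LL], and it assumes the common convention XX = I + Q, XY = U + iV, YX = U - iV, YY = I - Q (and RR = I + V, RL = Q + iU, LR = Q - iU, LL = I - V). Sign and normalisation conventions vary, so check them against your own. The names `CORR_TO_STOKES` and `corrs_to_stokes` are hypothetical.

```python
import numpy as np

# Rows are Stokes (I, Q, U, V); columns are correlations in the assumed order.
# Assumed convention: XX = I + Q, XY = U + iV, YX = U - iV, YY = I - Q.
LINEAR = 0.5 * np.array([
    [1, 0, 0, 1],       # I = (XX + YY) / 2
    [1, 0, 0, -1],      # Q = (XX - YY) / 2
    [0, 1, 1, 0],       # U = (XY + YX) / 2
    [0, -1j, 1j, 0],    # V = -i (XY - YX) / 2
])

# Assumed convention: RR = I + V, RL = Q + iU, LR = Q - iU, LL = I - V.
CIRCULAR = 0.5 * np.array([
    [1, 0, 0, 1],       # I = (RR + LL) / 2
    [0, 1, 1, 0],       # Q = (RL + LR) / 2
    [0, -1j, 1j, 0],    # U = -i (RL - LR) / 2
    [1, 0, 0, -1],      # V = (RR - LL) / 2
])

# Hypothetical lookup table keyed by feed type.
CORR_TO_STOKES = {"linear": LINEAR, "circular": CIRCULAR}


def corrs_to_stokes(gridded, feed):
    """Map an array of shape (..., ncorr) to (..., 4) Stokes in one contraction."""
    # Matrix product over the last (correlation) axis; no per-correlation loop
    # and no feed-dependent branching beyond the dictionary lookup.
    return gridded @ CORR_TO_STOKES[feed].T
```

With a (m, n, ncorrs) grid, `corrs_to_stokes(gridded_data, "linear")` returns a (m, n, 4) array ordered as I, Q, U, V. The sketch uses NumPy inputs; the same `@` contraction should also work lazily when `gridded` is a dask array, but that is untested here.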

If your code is pure dask, that loop over correlations isn't doing any real work - it is just setting up a graph. If, however, your arrays have already been reified to numpy at that point, I can imagine that that is slow.
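To make the point about graph construction concrete, here is a small sketch with placeholder data (the array shape and chunking are made up for illustration). The Python loop over correlations only adds tasks to the dask graph; the arithmetic runs when `compute()` is called.

```python
import dask.array as da

# Placeholder visibilities with shape (row, chan, corr); not real MS data.
vis = da.ones((1000, 64, 4), chunks=(250, 64, 4))

total = 0
for c in range(vis.shape[-1]):
    # Each iteration only extends the task graph; nothing is evaluated here.
    total = total + vis[..., c]

# `total` is still a lazy dask array at this point.
is_lazy = isinstance(total, da.Array)

# The actual work happens here, once, across all chunks.
result = total.compute()
```

If the inputs were NumPy arrays instead, every iteration of the loop would do the full computation eagerly, which is the case where the loop can become expensive.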

Would you be willing to run line_profiler on the function? That may make it a bit clearer to me.