abacusorg / abacusutils

Python code to interface with halo catalogs and other Abacus N-body data products

Home Page: https://abacusutils.readthedocs.io

A and B subsample differentiation when reading rv lightcone files with read_asdf()

boryanah opened this issue · comments

Currently, when I read the rv light cone outputs with read_asdf, I get no error messages, but I also can't tell whether I am loading A, B, or both (there might be some options I haven't considered).

I had to remind myself, but the LightCone0_rv_Step*.asdf files are indeed all A+B, mentioned here: https://abacussummit.readthedocs.io/en/latest/data-products.html#light-cones

That makes sense -- I think that's the text I used in the light cone paper as well. I just wonder if there is a way to make that more explicit to the user, and even to allow them to load only A or only B, though I realize the latter may not be possible.

Maybe something like:

if verbose and header['OutputType'] == 'LightCone':
    print(f'Loading "{fn.name}", which contains the A and B subsamples (10% total)')

We could also populate a new header field like "SubsampleFraction = 0.1".

I don't think we can load A and B separately; I think they're mixed together in the files. It's probably possible to read the PID files and eliminate the B particles, which would save memory, but not IO time. We could open that as a TODO issue if you're finding that your light cone analyses are memory-constrained.

I see -- I think a message like that is helpful!

BTW, in the documentation, we should change the origin of the observer to (-990, -990, -990) rather than (-950, -950, -950).

Great! Please go ahead and fix that in the documentation, and if you'd like to PR the message to the user, that would be great.

A more robust solution would also check header['SimSet'] == 'AbacusSummit' and use ParticleSubsampleA + ParticleSubsampleB instead of 10%.
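A minimal sketch of that check, assuming the header behaves like a plain dictionary. The helper name `subsample_fraction` is hypothetical, and the example header values are illustrative stand-ins rather than values read from a file; only the keys OutputType, SimSet, ParticleSubsampleA, and ParticleSubsampleB come from this discussion.

```python
def subsample_fraction(header):
    """Return the A+B subsample fraction for an AbacusSummit light cone header.

    Hypothetical helper: returns None when the header does not describe an
    AbacusSummit light cone, so the caller can skip the message.
    """
    if header.get('OutputType') != 'LightCone':
        return None
    if header.get('SimSet') != 'AbacusSummit':
        return None
    # Sum the two fractions stored in the header instead of hard-coding 10%.
    return header['ParticleSubsampleA'] + header['ParticleSubsampleB']


# Illustrative stand-in header; the values are assumptions for this example.
example_header = {
    'OutputType': 'LightCone',
    'SimSet': 'AbacusSummit',
    'ParticleSubsampleA': 0.03,
    'ParticleSubsampleB': 0.07,
}

frac = subsample_fraction(example_header)
if frac is not None:
    print(f'File contains the A and B subsamples ({frac:.0%} total)')
```

Returning None for anything that is not an AbacusSummit light cone keeps the message from making claims about simulation sets whose subsampling conventions may differ.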

I created a new PR with the SubsampleFraction fix. My preference would be to always output that message regardless of verbosity, since verbose is passed through kwargs and so might not be obvious to the end user. Let me know what you think!

Also, I fixed the origin location in the readthedocs documentation.

Also, as an end user, it's not obvious to me that the way to read the header dictionary is to do table.meta on the output from read_abacus().

If you want to "unhide" the verbose kwarg by making it a proper parameter that's fine with me! We can even set the default to True if you like. I think we ought to keep a way to toggle it off, though; we don't want it cluttering script output if people are running this as part of a pipeline where they might call this function many times.
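To illustrate the difference, here is a rough sketch contrasting a verbose flag hidden in kwargs with an explicit keyword-only parameter. Both functions (`read_hidden`, `read_explicit`) are hypothetical stand-ins that do no file reading, and the hidden default of False is an assumption made only for this example.

```python
import inspect


def read_hidden(fn, **kwargs):
    # 'verbose' is accepted, but nothing in the signature advertises it.
    verbose = kwargs.pop('verbose', False)
    return verbose


def read_explicit(fn, *, verbose=True):
    # 'verbose' is a visible keyword-only parameter, on by default;
    # pipeline callers can still pass verbose=False to silence it.
    if verbose:
        print(f'Loading "{fn}", which contains the A and B subsamples')
    return verbose


# The explicit version shows up in help() and in the signature.
print(inspect.signature(read_hidden))
print(inspect.signature(read_explicit))

# A pipeline that calls the reader many times can opt out of the message.
quiet = read_explicit('example.asdf', verbose=False)
```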

And I agree, table.meta is not immediately obvious. It is a standard Astropy Table feature (i.e. we aren't inventing this name), but we could definitely document it better.