A and B subsample differentiation when reading rv lightcone files with read_asdf()

Question


boryanah opened this issue 2 years ago · comments

Currently, when I read the rv light cone outputs with read_asdf(), I get no error messages, but I also can't tell whether I am loading A, B, or both (there may be options I haven't considered).

Lehman Garrison · Answer 1 · Sat Feb 05 2022 00:36:56 GMT+0800 (China Standard Time)

I had to remind myself, but the LightCone0_rv_Step*.asdf files are indeed all A+B, mentioned here: https://abacussummit.readthedocs.io/en/latest/data-products.html#light-cones

boryanah · Answer 2 · Sat Feb 05 2022 02:01:46 GMT+0800 (China Standard Time)

That makes sense; I think that's the text I used in the light cone paper as well. I just wonder if there is a way to make that more explicit to the user, and even allow them to load only A or only B, though I realize the latter may not be possible.

Lehman Garrison · Answer 3 · Sat Feb 05 2022 02:17:24 GMT+0800 (China Standard Time)

Maybe something like:

if verbose and header['OutputType'] == 'LightCone':
    print(f'Loading "{fn.name}", which contains the A and B subsamples (10% total)')

We could also populate a new header field like "SubsampleFraction = 0.1".

I don't think we can load A and B separately; I think they're mixed together in the files. It's probably possible to read the PID files and eliminate the B particles, which would save memory, but not IO time. We could open that as a TODO issue if you're finding that your light cone analyses are memory-constrained.

boryanah · Answer 4 · Sat Feb 05 2022 03:02:40 GMT+0800 (China Standard Time)

I see -- I think a message like that is helpful!

BTW, in the documentation, we should change the origin of the observer to -990, -990, -990 rather than -950, -950, -950

Lehman Garrison · Answer 5 · Sat Feb 05 2022 03:09:59 GMT+0800 (China Standard Time)

Great! Please go ahead and fix that in the documentation, and if you'd like to PR the message to the user, that would be great.

A more robust solution would also check header['SimSet'] == 'AbacusSummit' and use ParticleSubsampleA + ParticleSubsampleB instead of 10%.
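A rough sketch of that check, for illustration only. The helper name `subsample_fraction` is hypothetical, and returning None when the fraction can't be determined is an assumption rather than anything decided in this thread; the header keys are the ones named above.

```python
def subsample_fraction(header):
    """Return the fraction of particles stored in a light cone file.

    Hypothetical helper: returns None whenever the fraction cannot be
    determined from the header (an assumption made for this sketch).
    """
    # Only light cone outputs mix the A and B subsamples in one file
    if header.get('OutputType') != 'LightCone':
        return None

    # Only trust the subsample fields for the simulation set we know about
    if header.get('SimSet') != 'AbacusSummit':
        return None

    try:
        # Use the values recorded in the header instead of hard-coding 10%
        return header['ParticleSubsampleA'] + header['ParticleSubsampleB']
    except KeyError:
        return None
```

The result could then be stored in a new header field such as SubsampleFraction and used in the message to the user, so nothing has to hard-code "10%".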

boryanah · Answer 6 · Wed Feb 09 2022 17:00:35 GMT+0800 (China Standard Time)

Created a new PR with the SubsampleFraction fix. My preference would be to always print that message regardless of verbosity (since verbose is passed through kwargs, it might not be obvious to the end user). Let me know what you think!

Also, I fixed the documentation in the readthedocs for the origin location

boryanah · Answer 7 · Wed Feb 09 2022 17:01:43 GMT+0800 (China Standard Time)

Also, as an end user, it's not obvious to me that the way to read the header dictionary is to use table.meta on the output from read_abacus()

Lehman Garrison · Answer 8 · Thu Feb 10 2022 05:33:44 GMT+0800 (China Standard Time)

If you want to "unhide" the verbose kwarg by making it a proper parameter that's fine with me! We can even set the default to True if you like. I think we ought to keep a way to toggle it off, though; we don't want it cluttering script output if people are running this as part of a pipeline where they might call this function many times.

And I agree, table.meta is not immediately obvious. It is a standard Astropy Table feature (i.e. we aren't inventing this name), but we could definitely document it better.
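For the verbose toggle, one possible shape is sketched below. It is a standalone illustration, not the actual reader code: `announce_subsamples` is a hypothetical name, and the message text is the one proposed earlier in the thread.

```python
from pathlib import Path


def announce_subsamples(fn, header, verbose=True):
    """Print a note that a light cone file holds both A and B subsamples.

    Hypothetical helper: verbose is an explicit parameter that defaults
    to True, so pipelines calling this many times can pass verbose=False.
    Returns the printed message, or None if nothing was printed.
    """
    if not verbose or header.get('OutputType') != 'LightCone':
        return None

    msg = f'Loading "{Path(fn).name}", which contains the A and B subsamples (10% total)'
    print(msg)
    return msg
```

Making verbose a named parameter means it shows up in the function signature and docstring, which addresses the concern that a kwargs-only option is easy to miss.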