Peaks in halo mass distribution when cutting on stellar mass



JulienPeloton opened this issue 5 years ago

Following the second part of the discussion in #55, I found an interesting feature in the halo mass distribution. The distribution of halo masses in cosmoDC2 looks fine on its own, but after filtering objects by stellar mass, peaks start to appear. These peaks are regularly spaced in log mass, and their positions are independent of the stellar-mass cut applied:

This plot is generated by following these steps:

Take the cosmoDC2 catalog.
Select only non-synthetic halos (halo_id > 0).
Select only halos whose central galaxy has stellar_mass > {1e9, 1e10, 5e10, 1e11} $M_\odot$.
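The selection steps above can be sketched in NumPy as follows. This is a minimal illustration, assuming the per-galaxy columns `halo_id`, `is_central`, `stellar_mass` and `halo_mass` have already been read into parallel arrays (loading them from the catalog is not shown) and that the central galaxy is identified by an `is_central` flag; the function names and the 0.01 dex histogram bin width are hypothetical choices.

```python
import numpy as np

def host_halo_masses_above_cut(halo_id, is_central, stellar_mass, halo_mass, ms_cut):
    """Host halo masses of non-synthetic halos whose central has stellar_mass > ms_cut.

    All inputs are parallel per-galaxy arrays; ms_cut is in M_sun.
    """
    mask = (
        (np.asarray(halo_id) > 0)                   # drop synthetic halos
        & np.asarray(is_central, dtype=bool)        # keep one galaxy (the central) per halo
        & (np.asarray(stellar_mass) > ms_cut)       # stellar-mass cut
    )
    return np.asarray(halo_mass)[mask]

def log_mass_histogram(masses, lo=10.5, hi=15.0, width=0.01):
    """Counts of log10(mass) in bins of `width` dex, fine enough to resolve 0.15 dex features."""
    edges = np.arange(lo, hi + width / 2, width)
    return np.histogram(np.log10(masses), bins=edges)
```

Looping `ms_cut` over `(1e9, 1e10, 5e10, 1e11)` and plotting the counts for each cut should give the kind of comparison described above, provided the assumed column names match the catalog.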

The peak positions do not depend on the stellar-mass cut, and they seem to follow

$\log_{10}(M_{h,p}) = 0.15\,p$, with $p$ an integer,

i.e. the peaks are spaced by 0.15 dex.
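As a quick check of the empirical formula, the predicted peak locations in a given mass range, and the offset of a measured peak from the nearest predicted one, can be computed as below. This is a sketch that simply assumes the 0.15 dex spacing and a zero point at $\log_{10} M = 0$, as in the formula; the function names are hypothetical.

```python
import math

SPACING_DEX = 0.15  # empirical peak spacing assumed from the formula above

def predicted_peaks(log_m_lo, log_m_hi, spacing=SPACING_DEX):
    """log10(M_h) = spacing * p for every integer p with log_m_lo <= value <= log_m_hi."""
    p_min = math.ceil(log_m_lo / spacing - 1e-9)   # small tolerance for float rounding
    p_max = math.floor(log_m_hi / spacing + 1e-9)
    return [round(spacing * p, 10) for p in range(p_min, p_max + 1)]

def offset_from_nearest_peak(log_m, spacing=SPACING_DEX):
    """Signed distance (dex) from log_m to the nearest predicted peak position."""
    return log_m - spacing * round(log_m / spacing)
```

For example, `predicted_peaks(12.0, 12.5)` returns `[12.0, 12.15, 12.3, 12.45]`.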

Thanks @rmandelb for suggesting the 0.15! It seems to match empirically, but a more careful analysis is probably needed.

A dedicated notebook can be found here (from #54).

Julien · Answer 1 · Sat Dec 29 2018 06:09:45 GMT+0800 (China Standard Time)

@dkorytov I just saw your reply in the previous thread. Could you copy/paste it here so we can move the discussion to this issue? Thanks!

Dan Korytov · Answer 2 · Sat Dec 29 2018 06:55:17 GMT+0800 (China Standard Time)

Copied from #55:

I was able to reproduce the effect. The source of it is from how we sample Universe Machine (UM) galaxies into the Outer Rim (OR) halo light cone.

OR is a gravity-only simulation with a 3 Gpc/h box length. UM is a model on top of MultiDark that produces an accurate galaxy population. To populate the OR lightcone with galaxies, we match halos from UM and OR and copy all the galaxies in a UM halo onto an OR lightcone halo. The matching is done by binning both halo populations by mass and randomly assigning a UM halo to an OR halo within the same bin. You can see the bins if you plot host halo mass vs. central stellar mass (2nd figure). I guess the effect jumps out more with the Mstar > 5e10 cut.

[Figure 1: same cuts as in #55]

[Figure 2: host halo mass vs. central stellar mass, all central galaxies]
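The bin-based matching described above can be sketched in NumPy as follows. This is a minimal illustration, not the cosmoDC2 pipeline code: it assumes halo masses are given as log10 values, uses the 0.15 dex width discussed later in the thread, and the function name is hypothetical.

```python
import numpy as np

def binned_random_match(log_mh_target, log_mh_source, width=0.15, rng=None):
    """Index of a randomly chosen source halo in the same log10-mass bin as each target.

    Returns -1 for targets whose bin contains no source halo.
    """
    rng = np.random.default_rng() if rng is None else rng
    tgt_bin = np.floor(np.asarray(log_mh_target, dtype=float) / width).astype(np.int64)
    src_bin = np.floor(np.asarray(log_mh_source, dtype=float) / width).astype(np.int64)
    match = np.full(tgt_bin.shape, -1, dtype=np.int64)
    for b in np.unique(tgt_bin):
        candidates = np.flatnonzero(src_bin == b)
        if candidates.size == 0:
            continue  # nothing to copy from in this bin
        targets = np.flatnonzero(tgt_bin == b)
        # Draw with replacement: several target halos may receive copies of the same source halo.
        match[targets] = rng.choice(candidates, size=targets.size)
    return match
```

Given `match`, the galaxies of source (UM-like) halo `match[i]` would be copied onto target (OR-like) halo `i`. Every target halo in a bin draws from the same pool, regardless of where it sits inside the bin, which is the property behind the discreteness discussed in this thread.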

Julien · Answer 3 · Sat Dec 29 2018 06:59:11 GMT+0800 (China Standard Time)

Thanks! I'm glad you could reproduce it. The details of the simulation are beyond my understanding, but could you comment on the choice of bin size? Is it something you tested and are confident about? In other words, how would the mass bin size affect other analyses? Thanks!

Salman Habib · Answer 4 · Sat Dec 29 2018 07:44:47 GMT+0800 (China Standard Time)

I assume smoothing out this sort of behavior is a fairly straightforward exercise. Right, @dkorytov?

Dan Korytov · Answer 5 · Sat Dec 29 2018 07:48:30 GMT+0800 (China Standard Time)

Sawtooth shape

The sawtooth shape comes from two properties:

Within a mass bin, galaxies are sampled from a fixed population
The shape of the halo mass function

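Within each mass bin, all the galaxies come from the same galaxy population. So any selection we apply to galaxies acts uniformly within a mass bin, and the distribution of selected halos within a bin reflects the halo mass distribution (more low-mass halos than high-mass halos). Because the fraction of galaxies passing a stellar-mass cut increases from one bin to the next, the counts jump up at each bin edge and then decline across the bin, producing the sawtooth.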
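The argument above can be made concrete with a small deterministic toy model. This sketch assumes a hypothetical halo mass function $dn/d\log_{10}M \propto M^{-1}$ and a hypothetical log-normal stellar-to-halo mass relation (mean $10.5 + 0.5(\log_{10}M_h - 12)$, 0.2 dex scatter); neither is the cosmoDC2 model. It compares the expected density of selected halos under binned matching with the unbinned case.

```python
import math
import numpy as np

WIDTH = 0.15                    # dex; matching bin width discussed in this thread
LOG_MS_CUT = math.log10(5e10)   # stellar-mass cut in M_sun, one of the cuts used above

def hmf(log_mh):
    # Toy halo mass function, dn/dlog10(M) proportional to M^-1 (hypothetical slope and normalisation).
    return 10.0 ** (-(np.asarray(log_mh, dtype=float) - 11.0))

def p_pass(log_mh, scatter=0.2):
    # Probability that a halo's central passes the cut, for a hypothetical log-normal
    # stellar-to-halo mass relation with mean 10.5 + 0.5 * (log10 Mh - 12).
    mean = 10.5 + 0.5 * (np.atleast_1d(np.asarray(log_mh, dtype=float)) - 12.0)
    z = (LOG_MS_CUT - mean) / scatter
    return np.array([0.5 * math.erfc(v / math.sqrt(2.0)) for v in z])

def bin_pass_fraction(k, n_sub=200):
    # Fraction of source halos in bin k whose central passes the cut,
    # weighted by the mass function (midpoint rule over the bin).
    x = WIDTH * (k + (np.arange(n_sub) + 0.5) / n_sub)
    w = hmf(x)
    return float(np.sum(w * p_pass(x)) / np.sum(w))

def selected_density(log_mh):
    # Expected density of selected target halos after binned matching: every target
    # halo in bin k passes with the same probability, so within a bin the shape
    # follows the mass function, and it jumps wherever that probability changes.
    x = np.atleast_1d(np.asarray(log_mh, dtype=float))
    bins = np.floor(x / WIDTH).astype(int)
    frac = np.array([bin_pass_fraction(b) for b in bins])
    return hmf(x) * frac
```

In this toy model the local maxima sit at the lower edge of each bin, $\log_{10} M = 0.15\,p$, which is consistent with the empirical formula in the opening post; whether the actual cosmoDC2 bin edges are aligned this way is an assumption here. Without binning, `hmf(x) * p_pass(x)` is smooth.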

Binning

As far as the choice of binning goes, there is a trade-off between narrow and wide mass bins. Narrower bins give a tighter correlation with halo mass but sample from a smaller population. In the extreme case of a very narrow mass bin, all halos near a given mass would have the same galaxies copied into them. @aphearin could comment more on the exact choice of 0.15 in log space.

In terms of science analysis, stellar mass isn't directly observable; it would be derived from the galaxy's light. I would imagine most analyses would apply some sort of selection criteria on either observer-frame or rest-frame magnitudes/colors instead of on the stellar mass directly. The distributions of magnitudes and colors are fairly smooth: https://portal.nersc.gov/project/lsst/descqa/v2/?run=2018-11-28_5&catalog=cosmoDC2_v1.1.4_small, https://portal.nersc.gov/project/lsst/descqa/v2/?run=2018-11-28_4&catalog=cosmoDC2_v1.1.4_small, https://portal.nersc.gov/project/lsst/descqa/v2/?run=2018-11-28_1&test=Color_Dist_SDSS. So, as far as I understand, I don't think the discreteness in central galaxy stellar mass vs. host halo mass should be a big issue for other analyses.

Dan Korytov · Answer 6 · Sat Dec 29 2018 07:58:06 GMT+0800 (China Standard Time)

@salmanhabib, if we wanted to fix this in the catalog, it would of course require a complete rerun. I don't think it would be hard to fix algorithmically, but it might not be easy to implement, depending on the structure of the code. Of course, it would require a very close look to make sure there are no new bugs, etc.

I don't think it would be that hard an exercise to shuffle the stellar masses into a smooth distribution after the fact. If it really handicaps some analysis, I think we should be able to produce an add-on catalog that gives a smooth central galaxy stellar mass vs. host halo mass distribution.

Julien · Answer 7 · Sun Dec 30 2018 03:50:09 GMT+0800 (China Standard Time)

Thanks @dkorytov for your thorough explanations!

"In terms of science analysis, stellar mass isn't directly observable; it would be derived from the galaxy's light. I would imagine most analyses would apply some sort of selection criteria on either observer-frame or rest-frame magnitudes/colors instead of on the stellar mass directly."

You are right, I could replace the cut on the stellar mass with a cut on the magnitudes/colors instead and see if I get consistent results. I will give it a try!

Finally, I agree with @aphearin (from thread #55):

"Nice work reproducing the effect, @dkorytov, and especially to @JulienPeloton for discovering this feature. I guess this will put a limit on the accuracy with which cosmoDC2 observations could constrain the halo mass of a stacked galaxy sample. In a future implementation, we could switch to the same bin-free method we use to GalSample Galacticus galaxies (a noisy nearest-neighbor search), doing that for the halo-halo correspondence rather than the bin-based method here."

I think that in the context of the HackUrDC2, knowing the details and limitations of the simulations and their potential effects is enough, and any modification (if any!) should be left for the future. In practice, you are in a better position than I am to judge whether one should use binning, binning+smoothing, or bin-free methods :-)
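For comparison with the binned scheme, a bin-free halo-halo correspondence of the "noisy nearest-neighbor search" kind mentioned above could look like the sketch below. It works in one dimension (log halo mass) and assumes Gaussian noise of a hypothetical width `noise_dex`; it is an illustration of the general technique, not the GalSample or cosmoDC2 implementation.

```python
import numpy as np

def noisy_nearest_match(log_mh_target, log_mh_source, noise_dex=0.05, rng=None):
    """Bin-free alternative to binned matching (illustrative sketch).

    Each target log-mass is perturbed by Gaussian noise of width noise_dex, then
    matched to the source halo whose log-mass is nearest to the perturbed value.
    Returns indices into log_mh_source.
    """
    rng = np.random.default_rng() if rng is None else rng
    src = np.asarray(log_mh_source, dtype=float)
    if src.size < 2:
        raise ValueError("need at least two source halos")
    order = np.argsort(src)
    src_sorted = src[order]
    query = np.asarray(log_mh_target, dtype=float)
    query = query + rng.normal(0.0, noise_dex, size=query.shape)
    # Insertion points in the sorted source array; compare left and right neighbours.
    right = np.clip(np.searchsorted(src_sorted, query), 1, src_sorted.size - 1)
    left = right - 1
    choose_left = np.abs(query - src_sorted[left]) <= np.abs(src_sorted[right] - query)
    return order[np.where(choose_left, left, right)]
```

With `noise_dex=0` each target maps deterministically to its nearest source halo, so nearby targets tend to share the same galaxies, much like the very narrow bin case. A nonzero noise width spreads the assignment over neighbouring source halos without introducing bin edges.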

Andrew Hearin · Answer 8 · Thu Jan 03 2019 00:28:48 GMT+0800 (China Standard Time)

I just created a new issue LSSTDESC/cosmodc2#84 in the cosmoDC2 repo to track the evolution of the fix for this. It seems to me that the primary science impact of this discreteness effect is a lower bound (~0.1 dex) on the precision with which halo mass can be recovered from synthetic observations made on this mock. Since such applications are not the primary objective of the image simulations, I have tagged this issue with the "Full-sky extragalactic catalog" milestone to indicate that this should be fixed prior to releasing the 5000 deg² catalog, but that this doesn't warrant regenerating a catalog for image simulation. Let's migrate discussion of this modeling improvement to that issue. Thanks again to everyone who helped track this down over the holiday.

Julien · Answer 9 · Sat Jan 05 2019 03:52:59 GMT+0800 (China Standard Time)

Thanks @aphearin for keeping track of this issue!

Julien · Answer 10 · Sat Jan 05 2019 06:02:04 GMT+0800 (China Standard Time)

FYI, when I replace the stellar-mass cut with a cut on the apparent magnitude of the central galaxy (lensed, i band: mag_i_lsst), the stripes are still present:

Andrew Hearin · Answer 11 · Sat Jan 05 2019 08:12:46 GMT+0800 (China Standard Time)

Thanks for following up, @JulienPeloton. This makes sense because in the cosmoDC2 model, rest-frame flux derives from stellar mass, which in turn derives from halo mass, so it is expected that this discreteness propagates through to these other variables.

I already have a working prototype for a bin-free method that resolves this issue. This improvement had actually been in my queue for a while, but I bumped up the priority when you pointed out the significance of the discreteness. It should be no problem to resolve all manifestations of this form of discreteness for the full-sky mocks.

Julien · Answer 12 · Sun Jan 06 2019 23:10:15 GMT+0800 (China Standard Time)

Thanks @aphearin for the explanation, that makes total sense!