scikit-hep / fastjet

Jet-finding in the Scikit-HEP ecosystem.

Home Page: https://fastjet.readthedocs.io



tests are broken

lgray opened this issue

See recent PRs.

After some investigation, the source of the problem is this PR: scikit-hep/awkward#2757
In that PR, the way ak.zip() calls ak.unwrap() changed: it now passes allow_record=False. I am not sure why this changed, since the PR is large and complicated; maybe @agoose77 can say more. The point is that we need to stop using things like:

softdrop_output = ak.zip(
        {
            "constituents": ak.Record(
                {
                    "px": [32.2, 32.45],
                    "py": [64.21, 63.21],
                    "pz": [543.34, 543.14],
                    "E": [600.12, 599.56],
                }
            ),
            "msoftdrop": 488.2395243115817,
            "ptsoftdrop": 142.88274528437645,
            "etasoftdrop": 2.726117171791057,
            "phisoftdrop": 1.1012644074821902,
            "Esoftdrop": 1199.6799999999998,
            "pzsoftdrop": 1086.48,
        },
)

scikit-hep/awkward#2757 introduced boilerplate that makes it easier for us internally to safely wrap and unwrap array-like and scalar objects. I would consider this a regression, in that we should be able to zip together records. It's generally a good idea to type your inputs where possible, but that's a separate concern.

scikit-hep/awkward#3023

The high-level attrs addition was a big PR: every ak.Array now has another attribute that it needs to propagate. (It could be good for fastjet to pass attributes from input particles to output jets, but nothing is breaking by not engaging in this new feature.)

Excluding records in ak.zip is tangential to that, closing a loophole that was discovered along the way. It does mean that the example you quoted above will now fail, but more broadly, it's not a good idiom. If you want to make an object like the one above, with mostly scalar values and very small lists, don't even use ak.zip:

softdrop_output = ak.from_iter(
    {
        "constituents": [
            {"px": 32.2, "py": 64.21, "pz": 543.34, "E": 600.12},
            {"px": 32.45, "py": 63.21, "pz": 543.14, "E": 599.56},
        ],
        "msoftdrop": 488.2395243115817,
        "ptsoftdrop": 142.88274528437645,
        "etasoftdrop": 2.726117171791057,
        "phisoftdrop": 1.1012644074821902,
        "Esoftdrop": 1199.6799999999998,
        "pzsoftdrop": 1086.48,
    }
)

If, on the other hand, this is not a representative example and your lists of constituents are much longer than 2, do the zip like this:

softdrop_output = ak.Record(
    {
        "constituents": ak.zip(
            {
                "px": [32.2, 32.45, ...],   # the ellipses indicate that these lists are much longer
                "py": [64.21, 63.21, ...],
                "pz": [543.34, 543.14, ...],
                "E": [600.12, 599.56, ...],
            }
        ),
        "msoftdrop": 488.2395243115817,
        "ptsoftdrop": 142.88274528437645,
        "etasoftdrop": 2.726117171791057,
        "phisoftdrop": 1.1012644074821902,
        "Esoftdrop": 1199.6799999999998,
        "pzsoftdrop": 1086.48,
    },
)

I would have recommended these constructions before any change to Awkward. What you were using before was working by accident, and it's the kind of loophole that allow_record=False was intended to close.

I just saw your note, @agoose77: you consider it a regression but I don't! I don't think zipping over a record makes sense, since zipping is something that applies to collections. The same applies to the numerical scalars (msoftdrop, etc.).

  • If all fields of the ak.zip are collections (of equal length), then that's the normal case, we know what to do.
  • If some fields of the ak.zip are scalars (either records or numbers) and some are collections, then ak.zip should broadcast the scalars to the collections in order to be consistent, but that's not likely the user's intention. In this case, I thought it was pretty clear that only one overall value of msoftdrop, etc. is desired, not one per constituent.
  • If all fields are scalars, ak.zip doesn't have any clear meaning. If you want a single record to be built from those scalars, use the ak.Record constructor (as in the second example above).

So ak.zip should at least refuse the all-scalars case and should probably exclude the some-scalars case. (Although I suppose that it's not excluding all-numbers right now.)
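
The three cases can be sketched in plain Python, without Awkward. Both helpers below (classify_zip_fields and zip_with_broadcast) are hypothetical and only illustrate the argument; they are not how ak.zip is implemented. Lists and tuples stand in for collections, and anything else (numbers, or dicts standing in for records) counts as a scalar:

```python
def is_collection(value):
    # Lists and tuples stand in for collections; everything else is a scalar here.
    return isinstance(value, (list, tuple))


def classify_zip_fields(fields):
    """Hypothetical helper: name which of the three cases a set of fields falls into."""
    kinds = [is_collection(v) for v in fields.values()]
    if kinds and all(kinds):
        return "all-collections"
    if any(kinds):
        return "some-scalars"
    return "all-scalars"


def zip_with_broadcast(fields):
    """Hypothetical stand-in for zipping while broadcasting scalars to the collections."""
    lengths = {len(v) for v in fields.values() if is_collection(v)}
    if len(lengths) != 1:
        raise ValueError("need at least one collection, all of equal length")
    (n,) = lengths
    # Collections are indexed per row; scalars are copied into every row.
    return [
        {name: (v[i] if is_collection(v) else v) for name, v in fields.items()}
        for i in range(n)
    ]


normal = {"px": [32.2, 32.45], "py": [64.21, 63.21]}
mixed = {"px": [32.2, 32.45], "msoftdrop": 488.2395243115817}
scalars = {"msoftdrop": 488.2395243115817, "ptsoftdrop": 142.88274528437645}

cases = [classify_zip_fields(f) for f in (normal, mixed, scalars)]

# Broadcasting the mixed case repeats msoftdrop once per constituent,
# which is rarely what was meant.
rows = zip_with_broadcast(mixed)
```

In the mixed case, every row carries its own copy of msoftdrop, which shows why broadcasting is consistent but probably not what the user wants. In the all-scalars case there is nothing to iterate over, so the sketch raises an error.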