tests are broken
lgray opened this issue · comments
See recent PRs.
After some investigation, the source of the problem is this PR: scikit-hep/awkward#2757
In that PR, the way ak.zip() calls ak.unwrap() changed, and the option allow_record=False is now included. I am not sure why this changed, since that PR is very large and complicated; maybe @agoose77 can say more. But the point is that we need to stop using things like:
softdrop_output = ak.zip(
    {
        "constituents": ak.Record(
            {
                "px": [32.2, 32.45],
                "py": [64.21, 63.21],
                "pz": [543.34, 543.14],
                "E": [600.12, 599.56],
            }
        ),
        "msoftdrop": 488.2395243115817,
        "ptsoftdrop": 142.88274528437645,
        "etasoftdrop": 2.726117171791057,
        "phisoftdrop": 1.1012644074821902,
        "Esoftdrop": 1199.6799999999998,
        "pzsoftdrop": 1086.48,
    },
)
scikit-hep/awkward#2757 introduced boilerplate that makes it easier for us to handle, internally, the safe wrapping and unwrapping of array-like and scalar objects. I would consider this a regression, in that we should be able to zip together records. It's generally a good idea to type your inputs where possible, but that's a separate concern.
The high-level attrs addition was a big PR: every ak.Array now has another attribute that it needs to propagate. (It could be good for fastjet to pass attributes from input particles to output jets, but nothing is breaking by not engaging in this new feature.)
Excluding records in ak.zip is tangential to that, closing a loophole that was discovered along the way. It does mean that the example you quoted above will now fail, but more broadly, it's not a good idiom. If you want to make an object like the one above, with mostly scalar values and very small lists, don't even use ak.zip:
softdrop_output = ak.from_iter(
    {
        "constituents": [
            {"px": 32.2, "py": 64.21, "pz": 543.34, "E": 600.12},
            {"px": 32.45, "py": 63.21, "pz": 543.14, "E": 599.56},
        ],
        "msoftdrop": 488.2395243115817,
        "ptsoftdrop": 142.88274528437645,
        "etasoftdrop": 2.726117171791057,
        "phisoftdrop": 1.1012644074821902,
        "Esoftdrop": 1199.6799999999998,
        "pzsoftdrop": 1086.48,
    }
)
If, on the other hand, this is not a representative example and your lists of constituents are much longer than 2, do the zip like this:
softdrop_output = ak.Record(
    {
        "constituents": ak.zip(
            {
                "px": [32.2, 32.45, ...],  # the ellipsis indicates that this list is much longer
                "py": [64.21, 63.21, ...],
                "pz": [543.34, 543.14, ...],
                "E": [600.12, 599.56, ...],
            }
        ),
        "msoftdrop": 488.2395243115817,
        "ptsoftdrop": 142.88274528437645,
        "etasoftdrop": 2.726117171791057,
        "phisoftdrop": 1.1012644074821902,
        "Esoftdrop": 1199.6799999999998,
        "pzsoftdrop": 1086.48,
    },
)
I would have recommended these constructions before any change to Awkward. What you were using before was working by accident, and it's the kind of loophole that allow_record=False was intended to prevent.
I just saw your note, @agoose77: you consider it a regression but I don't! I don't think zipping over a record makes sense, since zipping is something that applies to collections. The same applies to the numerical scalars (msoftdrop, etc.).
- If all fields of the ak.zip are collections (of equal length), then that's the normal case, we know what to do.
- If some fields of the ak.zip are scalars (either records or numbers) and some are collections, then ak.zip should broadcast the scalars to the collections in order to be consistent, but that's not likely the user's intention. In this case, I thought it was pretty clear that only one overall value of msoftdrop, etc. is desired, not one per constituent.
- If all fields are scalars, ak.zip doesn't have any clear meaning. If you want a single record to be built from those scalars, use the ak.Record constructor (as in the second example above).
So ak.zip should at least refuse the all-scalars case and should probably exclude the some-scalars case. (Although I suppose that it's not excluding all-numbers right now.)
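To make the three cases concrete, here is a rough pure-Python sketch of the classification. It is not how Awkward implements ak.zip: is_collection and classify_zip_fields are hypothetical helpers, plain lists stand in for arrays, and a dict stands in for a record.

```python
from collections.abc import Iterable


def is_collection(value):
    """Hypothetical stand-in: non-string, non-dict iterables count as collections.

    Numbers and dicts (used here in place of records) count as scalars.
    """
    return isinstance(value, Iterable) and not isinstance(value, (str, bytes, dict))


def classify_zip_fields(fields):
    """Return which of the three cases a dict of would-be zip fields falls into."""
    kinds = [is_collection(value) for value in fields.values()]
    if all(kinds):
        return "all-collections"  # the normal case for a zip
    if any(kinds):
        return "some-scalars"  # scalars would have to be broadcast
    return "all-scalars"  # no clear meaning for a zip; build a record instead


# The failing example from this issue: a record plus plain numbers.
print(classify_zip_fields({"constituents": {"px": [32.2, 32.45]}, "msoftdrop": 488.2}))
# The inner zip from the recommended construction: equal-length lists only.
print(classify_zip_fields({"px": [32.2, 32.45], "py": [64.21, 63.21]}))
```

Under this reading, the call that broke in the tests is an all-scalars zip, which is the case argued above to have no clear meaning.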