Sometimes inconsistent values in hist conversion from TProfile read via uproot
raymondEhlers opened this issue · comments
When I convert TProfile
objects to hist that I've read from a ROOT file with uproot, I sometimes observe inconsistent values. For some input files, they're consistent, but for others, they're inconsistent - It depends on the input file. Unfortunately, I haven't figured out how to reproduce this in a simple way, so I've just attached two files which are produced in the same way and exhibit this behavior: (failing.root.txt (11MB), passing.root.txt (< 1 MB)). Both appear to be valid output files beyond this issue. The code below illustrates the behavior:
def test_uproot_profile_consistency_with_hist() -> None:
import hist
import numpy as np
import uproot
input_file = Path("failing.root")
# Open and extract with uproot
with uproot.open(input_file) as f:
# Cast to a base class. Actual class is here: https://github.com/alisw/AliPhysics/blob/master/PWG/EMCAL/EMCALbase/AliEmcalList.h
output_list = f["AliAnalysisTaskTrackSkim_pythia"].bases[0]
x_sec_uproot = next(h for h in output_list if h.name == "fHistXsection")
x_sec_uproot_hist = x_sec_uproot.to_hist()
# Cross check with ROOT
ROOT = pytest.importorskip("ROOT")
with ROOT.TFile.Open(str(input_file), "READ") as f_ROOT:
output_list = f_ROOT.Get("AliAnalysisTaskTrackSkim_pythia")
# Cast to a base class. Actual class is here: https://github.com/alisw/AliPhysics/blob/master/PWG/EMCAL/EMCALbase/AliEmcalList.h
output_list = ROOT.bind_object(ROOT.addressof(output_list), "TList")
x_sec_temp = output_list.FindObject("fHistXsection")
x_sec_temp.SetDirectory(0)
x_sec_ROOT_values = np.array([x_sec_temp.GetBinContent(i) for i in range(1, x_sec_temp.GetNbinsX() + 1)], dtype=np.float64)
print(f"Uproot: {uproot.__version__}, hist: {hist.__version__}")
# The standard uproot values before conversion appear correct
np.testing.assert_allclose(x_sec_uproot.values(), x_sec_ROOT_values)
# Fails here for `failing.root`, but fine for `passing.root`
np.testing.assert_allclose(x_sec_uproot.values(), x_sec_uproot_hist.values())
It strikes as odd that it depends on the particular file when they were generated the same way (with different inputs). I wonder if it's due to some issue with accessing the base class/the cast that I do. It's required because the profile is stored in a TList derived class that's part of our experimental software stack (note that I don't care about any of the additional info stored in the derived class. It's available here if helpful). I'm concerned this is perhaps corrupting an expected memory layout or otherwise causing an issue. Since this class is unfortunately driven by an experiment software stack constraint, it's difficult for me to avoid. If there's a better way for me to workaround this, I'd be happy to use that instead.
Bottom line: I suspect there might be an uproot issue here, but I'm concerned that it's instead due to an experimental software stack quirk. Help here is greatly appreciated - thanks!
>>> import uproot, hist
... print(f"Uproot: {uproot.__version__}, hist: {hist.__version__}")
Uproot: 5.3.1, hist: 2.7.2
You were involved in #908, so you know about #1000 (and #1148 is someone else running into it): I'm not convinced that ROOT's TProfile
and boost-histogram's WeightedMean
Storage are the same thing.
It might be more helpful—you might see a difference between failing.root and passing.root—if you look at the raw quantities on that are stored in the file, the members
of the Model_TProfile_v*
.
The TH1D base class, which has values
(direct bin contents):
uproot5/src/uproot/models/TH.py
Lines 3881 to 3891 in 21735bf
fBinEntries
, which is another per-bin quantity:
uproot5/src/uproot/models/TH.py
Lines 3892 to 3894 in 21735bf
and fBinSumw2
:
uproot5/src/uproot/models/TH.py
Lines 3902 to 3904 in 21735bf
The way that the values
and errors
of the profile are computed from these raw quantities is not at all trivial:
uproot5/src/uproot/behaviors/TProfile.py
Lines 63 to 129 in 21735bf
Thanks for the pointers Jim! I had missed #1148 - apologies.
After #1000, I did some validation with multiple files and it seemed to be working on individual TProfiles, so I was a bit surprised that some files work while others don't. However, I now see your point that the calculation is quite involved, which may explain the differences. I'll plan to dig into this further when I have some free moments