scikit-hep / uproot5

ROOT I/O in pure Python and NumPy.

Home Page: https://uproot.readthedocs.io

Problem extracting TH1F and TH2F using uproot

gabrielmscampos opened this issue

I'm encountering some problems trying to read offline DQMIO files using uproot.

My goal is to read all the TH1[F,S,D] and TH2[F,S,D] FullName and Value branches.

An example file can be found here (accessible from LXPlus): /afs/cern.ch/user/g/gamoreir/public/A1C97269-5183-456E-83BC-D9A8310FA77D.root.

>>> import uproot
>>> import numpy
>>> import awkward
>>> uproot.__version__
'5.3.2'
>>> numpy.__version__
'1.26.4'
>>> awkward.__version__
'2.6.2'

Run numbers and lumisections, which are encoded in the Indices TTree, are successfully extracted:

>>> import uproot
>>>
>>> f = uproot.open("A1C97269-5183-456E-83BC-D9A8310FA77D.root")
>>>
>>> f["Indices/Run"].array()
<Array [367094, 367094, 367094, ..., 367094, 367094] type='278 * uint32'>
>>>
>>> f["Indices/Lumi"].array()
<Array [1, 1, 1, 1, 1, 1, 2, 2, ..., 0, 0, 0, 0, 0, 0, 0] type='278 * uint32'>
>>> 
>>> f["Indices/Type"].array()
<Array [0, 3, 5, 6, 10, 11, 0, ..., 1, 2, 3, 5, 6, 10, 11] type='278 * uint32'>
>>>
>>> f["Indices/FirstIndex"].array()
<Array [0, 0, 0, 0, 0, ..., 900, 5445, 48960, 10080] type='278 * uint64'>
>>>
>>> f["Indices/LastIndex"].array()
<Array [0, 3508, 19, 120, ..., 1022, 12260, 60447, 11393] type='278 * uint64'>
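
The Indices branches above look like a lookup table into the per-type trees. A minimal sketch of how one might turn them into entry ranges is shown here, under assumptions that this thread does not confirm: that FirstIndex and LastIndex are inclusive bounds into the tree selected by Type, and that a given type code identifies TH1Fs. The function name `entry_ranges`, the constant `TH1F_CODE`, and all sample values are hypothetical.

```python
def entry_ranges(runs, lumis, types, first, last, type_code):
    """Yield (run, lumi, start, stop) for every Indices row of one type.

    ASSUMPTION: FirstIndex/LastIndex are inclusive, so stop = last + 1
    converts them to Python's half-open convention.
    """
    for run, lumi, t, lo, hi in zip(runs, lumis, types, first, last):
        if t == type_code:
            yield run, lumi, lo, hi + 1


# Synthetic stand-ins for the Indices branches (not values from the real file).
runs = [367094, 367094, 367094]
lumis = [1, 1, 2]
types = [0, 3, 3]
first = [0, 0, 900]
last = [0, 899, 1022]

TH1F_CODE = 3  # hypothetical type code, chosen only for this illustration
ranges = list(entry_ranges(runs, lumis, types, first, last, TH1F_CODE))
print(ranges)
```

Each resulting (start, stop) pair could then be passed as entry_start and entry_stop when reading a branch, if the assumptions hold for the file at hand.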

Using TH1Fs TTree as an example, the FullName branch is successfully extracted:

>>> f["TH1Fs"]["FullName"].array()
<Array [...] type='220351 * string'>

But the Value branch raises the following exceptions with both awkward and numpy:

>>> f["TH1Fs"]["Value"].array()
Traceback (most recent call last):
  File "/home/gamoreir/.cache/pypoetry/virtualenvs/dials-pauNEvGJ-py3.10/lib/python3.10/site-packages/uproot/behaviors/TBranch.py", line 2437, in _awkward_check
    interpretation.awkward_form(self.file)
  File "/home/gamoreir/.cache/pypoetry/virtualenvs/dials-pauNEvGJ-py3.10/lib/python3.10/site-packages/uproot/interpretation/objects.py", line 111, in awkward_form
    return self._model.awkward_form(self._branch.file, context)
  File "/home/gamoreir/.cache/pypoetry/virtualenvs/dials-pauNEvGJ-py3.10/lib/python3.10/site-packages/uproot/model.py", line 1195, in awkward_form
    return versioned_cls.awkward_form(file, context)
  File "/home/gamoreir/.cache/pypoetry/virtualenvs/dials-pauNEvGJ-py3.10/lib/python3.10/site-packages/uproot/models/TH.py", line 1854, in awkward_form
    tmp_awkward_form = file.class_named("TH1", 8).awkward_form(file, context)
  File "/home/gamoreir/.cache/pypoetry/virtualenvs/dials-pauNEvGJ-py3.10/lib/python3.10/site-packages/uproot/models/TH.py", line 886, in awkward_form
    contents["fFunctions"] = file.class_named("TList", "max").awkward_form(
  File "/home/gamoreir/.cache/pypoetry/virtualenvs/dials-pauNEvGJ-py3.10/lib/python3.10/site-packages/uproot/model.py", line 686, in awkward_form
    raise uproot.interpretation.objects.CannotBeAwkward(
uproot.interpretation.objects.CannotBeAwkward: TList

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/gamoreir/.cache/pypoetry/virtualenvs/dials-pauNEvGJ-py3.10/lib/python3.10/site-packages/uproot/behaviors/TBranch.py", line 1799, in array
    _ranges_or_baskets_to_arrays(
  File "/home/gamoreir/.cache/pypoetry/virtualenvs/dials-pauNEvGJ-py3.10/lib/python3.10/site-packages/uproot/behaviors/TBranch.py", line 2993, in _ranges_or_baskets_to_arrays
    branchid_to_branch[cache_key]._awkward_check(interpretation)
  File "/home/gamoreir/.cache/pypoetry/virtualenvs/dials-pauNEvGJ-py3.10/lib/python3.10/site-packages/uproot/behaviors/TBranch.py", line 2439, in _awkward_check
    raise ValueError(
ValueError: cannot produce Awkward Arrays for interpretation AsObjects(Model_TH1F) because

    TList

instead, try library="np" rather than library="ak" or globally set uproot.default_library

in file A1C97269-5183-456E-83BC-D9A8310FA77D.root
in object /TH1Fs;1:Value
>>> f["TH1Fs"]["Value"].array(library="np")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/gamoreir/.cache/pypoetry/virtualenvs/dials-pauNEvGJ-py3.10/lib/python3.10/site-packages/uproot/behaviors/TBranch.py", line 1799, in array
    _ranges_or_baskets_to_arrays(
  File "/home/gamoreir/.cache/pypoetry/virtualenvs/dials-pauNEvGJ-py3.10/lib/python3.10/site-packages/uproot/behaviors/TBranch.py", line 3083, in _ranges_or_baskets_to_arrays
    uproot.source.futures.delayed_raise(*obj)
  File "/home/gamoreir/.cache/pypoetry/virtualenvs/dials-pauNEvGJ-py3.10/lib/python3.10/site-packages/uproot/source/futures.py", line 38, in delayed_raise
    raise exception_value.with_traceback(traceback)
  File "/home/gamoreir/.cache/pypoetry/virtualenvs/dials-pauNEvGJ-py3.10/lib/python3.10/site-packages/uproot/behaviors/TBranch.py", line 3032, in basket_to_array
    basket_arrays[basket.basket_num] = interpretation.basket_array(
  File "/home/gamoreir/.cache/pypoetry/virtualenvs/dials-pauNEvGJ-py3.10/lib/python3.10/site-packages/uproot/interpretation/objects.py", line 159, in basket_array
    ).to_numpy()
  File "/home/gamoreir/.cache/pypoetry/virtualenvs/dials-pauNEvGJ-py3.10/lib/python3.10/site-packages/uproot/interpretation/objects.py", line 976, in to_numpy
    output[i] = self[i]
  File "/home/gamoreir/.cache/pypoetry/virtualenvs/dials-pauNEvGJ-py3.10/lib/python3.10/site-packages/uproot/interpretation/objects.py", line 991, in __getitem__
    return self._model.read(
  File "/home/gamoreir/.cache/pypoetry/virtualenvs/dials-pauNEvGJ-py3.10/lib/python3.10/site-packages/uproot/model.py", line 1362, in read
    versioned_cls.read(
  File "/home/gamoreir/.cache/pypoetry/virtualenvs/dials-pauNEvGJ-py3.10/lib/python3.10/site-packages/uproot/model.py", line 854, in read
    self.read_members(chunk, cursor, context, file)
  File "<dynamic>", line 19, in read_members
  File "/home/gamoreir/.cache/pypoetry/virtualenvs/dials-pauNEvGJ-py3.10/lib/python3.10/site-packages/uproot/model.py", line 1362, in read
    versioned_cls.read(
  File "/home/gamoreir/.cache/pypoetry/virtualenvs/dials-pauNEvGJ-py3.10/lib/python3.10/site-packages/uproot/model.py", line 854, in read
    self.read_members(chunk, cursor, context, file)
  File "<dynamic>", line 54, in read_members
  File "/home/gamoreir/.cache/pypoetry/virtualenvs/dials-pauNEvGJ-py3.10/lib/python3.10/site-packages/uproot/model.py", line 1362, in read
    versioned_cls.read(
  File "/home/gamoreir/.cache/pypoetry/virtualenvs/dials-pauNEvGJ-py3.10/lib/python3.10/site-packages/uproot/model.py", line 854, in read
    self.read_members(chunk, cursor, context, file)
  File "<dynamic>", line 4, in read_members
NotImplementedError: memberwise serialization of Model_TAxis_v10
in file A1C97269-5183-456E-83BC-D9A8310FA77D.root

The same problem exists when reading TH2Fs with awkward; unlike TH1Fs, however, reading TH2Fs with numpy works fine.

>>> f["TH2Fs"]["Value"].array()
Traceback (most recent call last):
  File "/home/gamoreir/.cache/pypoetry/virtualenvs/dials-pauNEvGJ-py3.10/lib/python3.10/site-packages/uproot/behaviors/TBranch.py", line 2437, in _awkward_check
    interpretation.awkward_form(self.file)
  File "/home/gamoreir/.cache/pypoetry/virtualenvs/dials-pauNEvGJ-py3.10/lib/python3.10/site-packages/uproot/interpretation/objects.py", line 111, in awkward_form
    return self._model.awkward_form(self._branch.file, context)
  File "/home/gamoreir/.cache/pypoetry/virtualenvs/dials-pauNEvGJ-py3.10/lib/python3.10/site-packages/uproot/containers.py", line 333, in awkward_form
    return uproot._util.awkward_form(self._model, file, context)
  File "/home/gamoreir/.cache/pypoetry/virtualenvs/dials-pauNEvGJ-py3.10/lib/python3.10/site-packages/uproot/_util.py", line 558, in awkward_form
    return model.awkward_form(file, context)
  File "/home/gamoreir/.cache/pypoetry/virtualenvs/dials-pauNEvGJ-py3.10/lib/python3.10/site-packages/uproot/model.py", line 1195, in awkward_form
    return versioned_cls.awkward_form(file, context)
  File "/home/gamoreir/.cache/pypoetry/virtualenvs/dials-pauNEvGJ-py3.10/lib/python3.10/site-packages/uproot/models/TH.py", line 2661, in awkward_form
    tmp_awkward_form = file.class_named("TH2", 5).awkward_form(file, context)
  File "/home/gamoreir/.cache/pypoetry/virtualenvs/dials-pauNEvGJ-py3.10/lib/python3.10/site-packages/uproot/models/TH.py", line 1133, in awkward_form
    tmp_awkward_form = file.class_named("TH1", 8).awkward_form(file, context)
  File "<dynamic>", line 261, in awkward_form
  File "/home/gamoreir/.cache/pypoetry/virtualenvs/dials-pauNEvGJ-py3.10/lib/python3.10/site-packages/uproot/model.py", line 686, in awkward_form
    raise uproot.interpretation.objects.CannotBeAwkward(
uproot.interpretation.objects.CannotBeAwkward: TList

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/gamoreir/.cache/pypoetry/virtualenvs/dials-pauNEvGJ-py3.10/lib/python3.10/site-packages/uproot/behaviors/TBranch.py", line 1799, in array
    _ranges_or_baskets_to_arrays(
  File "/home/gamoreir/.cache/pypoetry/virtualenvs/dials-pauNEvGJ-py3.10/lib/python3.10/site-packages/uproot/behaviors/TBranch.py", line 2993, in _ranges_or_baskets_to_arrays
    branchid_to_branch[cache_key]._awkward_check(interpretation)
  File "/home/gamoreir/.cache/pypoetry/virtualenvs/dials-pauNEvGJ-py3.10/lib/python3.10/site-packages/uproot/behaviors/TBranch.py", line 2439, in _awkward_check
    raise ValueError(
ValueError: cannot produce Awkward Arrays for interpretation AsObjects(AsDynamic(model=Model_TH2F)) because

    TList

instead, try library="np" rather than library="ak" or globally set uproot.default_library

in file A1C97269-5183-456E-83BC-D9A8310FA77D.root
in object /TH2Fs;1:Value
>>> u = f["TH2Fs"]["Value"].array(library="np")
>>>
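
Following the hint in the error message, one possible pattern is to try library="ak" first and fall back to library="np" when the ValueError shown above is raised. This is only a sketch: `read_with_fallback` and `StubBranch` are hypothetical names, and the stub stands in for a real branch so that the snippet runs without the file. A ValueError raised for an unrelated reason would also trigger the fallback.

```python
def read_with_fallback(branch, **kwargs):
    """Read a branch as an Awkward Array, falling back to NumPy objects.

    `branch` is anything with an uproot-style .array(library=...) method.
    """
    try:
        return branch.array(library="ak", **kwargs)
    except ValueError:
        # Raised in the tracebacks above when the interpretation
        # cannot be expressed as an Awkward Array.
        return branch.array(library="np", **kwargs)


class StubBranch:
    """Stand-in for a TBranch whose objects cannot become Awkward Arrays."""

    def array(self, library="ak", **kwargs):
        if library == "ak":
            raise ValueError("cannot produce Awkward Arrays")
        return ["histogram-object"]


result = read_with_fallback(StubBranch())
print(result)
```

With a real file, the call would presumably look like read_with_fallback(f["TH2Fs"]["Value"]). It would not help for the TH1Fs case, which fails with NotImplementedError even under library="np".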

If a solution is not possible with awkward, I have no problem using numpy only.

It's correct that TH* histograms can't be Awkward Arrays. Awkward Arrays are trees of lists and structs, but TH* histograms are non-tree-like data structures with pointers that can point anywhere and can even form self-referential cycles.
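
To make the tree-versus-cycle distinction concrete, here is a toy illustration using plain Python dicts and lists. It does not reflect Uproot's actual histogram models; `has_cycle`, `record`, and `hist` are hypothetical names invented for this sketch.

```python
def has_cycle(obj, _active=None):
    """Return True if nested dicts/lists reference one of their own ancestors."""
    _active = set() if _active is None else _active
    if not isinstance(obj, (dict, list)):
        return False
    if id(obj) in _active:
        return True  # we came back to a container we are still inside
    _active.add(id(obj))
    children = obj.values() if isinstance(obj, dict) else obj
    found = any(has_cycle(child, _active) for child in children)
    _active.discard(id(obj))
    return found


# A tree of lists and structs: every path ends in a leaf.
record = {"name": "h", "bins": [1.0, 2.0]}

# A structure holding a pointer back to itself: no finite tree describes it.
hist = {"name": "h", "functions": []}
hist["functions"].append(hist)

print(has_cycle(record), has_cycle(hist))
```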

>>> import uproot
>>> file = uproot.open("uproot-issue-1190.root")
>>> tree = file["TH2Fs"]
>>> tree.show()
name                 | typename                 | interpretation                
---------------------+--------------------------+-------------------------------
FullName             | std::string              | AsStrings()
Flags                | uint32_t                 | AsDtype('>u4')
Value                | TH2F                     | AsObjects(AsDynamic(model=M...
>>> tree["Value"].interpretation
AsObjects(AsDynamic(model=Model_TH2F))

It can be read with library="np" because this falls back to making dtype=object arrays of arbitrary Python objects, which is what you need for an object this complicated.

>>> tree["Value"].array(library="np", entry_stop=1)
array([<TH2F (version 4) at 0x7b4fa1bd6c80>], dtype=object)
>>> histogram, = tree["Value"].array(library="np", entry_stop=1)
>>> histogram
<TH2F (version 4) at 0x7b4fa1bd6c80>
>>> type(histogram)
<class 'uproot.models.TH.Model_TH2F_v4'>
>>> histogram.values()
array([[262., 258., 277., ..., 252., 333., 327.],
       [336., 313., 275., ..., 317., 319., 380.],
       [303., 361., 296., ..., 268., 387., 387.],
       ...,
       [302., 304., 226., ..., 283., 295., 285.],
       [281., 320., 245., ..., 307., 330., 283.],
       [299., 324., 232., ..., 292., 313.,   0.]], dtype=float32)
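
Since each entry becomes a full Python object under library="np", reading a large branch in batches may keep memory use manageable. The sketch below assumes a branch exposing `num_entries` and `array(library=..., entry_start=..., entry_stop=...)` as uproot branches do; `batched_values`, `FakeHist`, and `FakeBranch` are hypothetical names, and the fakes exist only so the snippet runs without the file.

```python
def batched_values(branch, step=1000):
    """Yield (start, stop, values) where values holds one array per histogram."""
    n = branch.num_entries
    for start in range(0, n, step):
        stop = min(start + step, n)
        objs = branch.array(library="np", entry_start=start, entry_stop=stop)
        # Histograms may differ in shape, so keep a list instead of stacking.
        yield start, stop, [h.values() for h in objs]


class FakeHist:
    """Stand-in for a histogram model exposing .values()."""

    def __init__(self, values):
        self._values = values

    def values(self):
        return self._values


class FakeBranch:
    """Stand-in for a TBranch with five entries."""

    num_entries = 5

    def array(self, library, entry_start, entry_stop):
        return [FakeHist([i]) for i in range(entry_start, entry_stop)]


batches = list(batched_values(FakeBranch(), step=2))
print(batches)
```

With the real file, the equivalent call would presumably be batched_values(tree["Value"], step=1000), with the step size chosen to suit the available memory.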

As for the "memberwise serialization" error, unfortunately this is an area where Uproot is incomplete and it probably won't be implemented. Issue #38 tracks all of the requests to handle memberwise splitting, but that will be a major project because it's a different format. I don't know why the file-writing process chose to write the TH2F's TAxis in the normal way but the TH1F's TAxis with memberwise splitting; they're both TAxis version 10:

>>> histogram.member("fXaxis")
<TAxis (version 10) at 0x7b4fb5df50f0>
>>> histogram.member("fYaxis")
<TAxis (version 10) at 0x7b4fb5df4be0>

and they're in the same location in the class structure. (TH2* inherits from TH1, and TH1 has fXaxis, fYaxis, and fZaxis, even though only the x axis is needed for a 1-dimensional histogram. The TH2F writes its axes with normal serialization, while the TH1F writes the same data members, at the same location in the class, with memberwise splitting; since Uproot hasn't implemented readers for memberwise splitting, it's stuck.)

Therefore, I'm going to close this issue but mention it on #38 as another instance in which this came up. I'll also attach your file to this issue so that it will be here forever, in case anyone has time to implement deserialization of the memberwise-split format in Uproot.

This is the ROOT file in two parts; just concatenate them:

cat uproot-issue-1190-part-1.txt uproot-issue-1190-part-2.txt > uproot-issue-1190.root

uproot-issue-1190-part-1.txt
uproot-issue-1190-part-2.txt
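
Where `cat` is not available, the same byte-for-byte concatenation can be sketched in Python with the standard library. The helper name `concatenate` is hypothetical, and the demonstration uses throwaway files instead of the real attachments.

```python
import shutil
import tempfile
from pathlib import Path


def concatenate(parts, destination):
    """Append each part, in order, to destination as raw bytes."""
    with open(destination, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)


# Self-contained demonstration with throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    first_part = Path(tmp) / "part-1.txt"
    second_part = Path(tmp) / "part-2.txt"
    joined = Path(tmp) / "joined.root"
    first_part.write_bytes(b"root")
    second_part.write_bytes(b"\x00\x01data")
    concatenate([first_part, second_part], joined)
    joined_bytes = joined.read_bytes()

print(joined_bytes)
```

For the attachments above, the call would be concatenate(["uproot-issue-1190-part-1.txt", "uproot-issue-1190-part-2.txt"], "uproot-issue-1190.root"). Binary mode matters here, because text mode could alter bytes on some platforms.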