scikit-hep / uproot5

ROOT I/O in pure Python and NumPy.

Home Page: https://uproot.readthedocs.io



uproot.dask fails to open files with names like `myfile.root.1`

gordonwatts opened this issue

Reproducing:

You need a small ROOT file, such as myfile.root. I'll assume the tree you want to open in myfile.root is called mytree:

  1. Rename myfile.root to myfile.root.1.
  2. In Python, run:
    import uproot
    uproot.dask('myfile.root.1:mytree')
  3. You'll get a file-not-found exception.

Why would you name a file like this?

The ATLAS production system often names files with a .1 suffix, for whatever reason. As a result, I often find myself accessing files with names like that.

Workaround

Use the dictionary specification method: uproot.dask({'myfile.root.1': 'mytree'})
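When there are many ATLAS-style files, the dictionary can be built in one step. The sketch below is a hypothetical helper (`files_dict` is not part of uproot); it only pairs every file path with the same in-file object path, so no colon parsing or extension check is involved. The file names are illustrative.

```python
def files_dict(paths, object_path):
    # Hypothetical helper: map each file path to the same in-file object path.
    # Keys may have any extension (e.g. ".root.1"); nothing is split on ":".
    return {path: object_path for path in paths}


paths = ["myfile.root.1", "myfile.root.2"]  # illustrative ATLAS-style names
files = files_dict(paths, "mytree")
print(files)  # {'myfile.root.1': 'mytree', 'myfile.root.2': 'mytree'}
# With uproot installed, this dict can then be passed as uproot.dask(files).
```

The list of paths could equally come from something like `glob.glob("*.root.1")`; the point is that the dictionary form sidesteps the file:object string parsing entirely.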

@gordonwatts thanks for reporting this! I suspect this will follow from recent changes to our file name handling. @lobis any clues? :)

Yes, this is expected behaviour, introduced in the 5.2.0 release. It should be mentioned in the release notes, though I haven't checked (there was at least a PR for it).

We chose to support the file:object syntax only for files ending in .root. We did this because, if we also had to support the file:object syntax for arbitrary names, we could not support the same kind of complex URL-chain patterns that fsspec supports (it may be possible, but it would be very complex and error-prone). In conclusion: the file:object syntax won't work if the file name does not end in .root, and this is intentional.
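To make the rule concrete, here is a simplified model of it as described in this thread. It is a sketch under that assumption only: `split_file_object` is hypothetical and is not uproot's actual parser, which handles more cases (URL schemes, fsspec chains, and so on). It treats a colon as the file/object separator only when it directly follows `.root`.

```python
def split_file_object(spec):
    """Simplified model of the ".root:" rule (hypothetical, not uproot's parser).

    Returns (file_path, object_path); object_path is None when no split happens.
    """
    # Only a colon that directly follows ".root" counts as the separator.
    marker = ".root:"
    idx = spec.rfind(marker)
    if idx == -1:
        # No ".root:" boundary, so the whole string is taken as the file path.
        return spec, None
    cut = idx + len(".root")  # index just past the end of the file path
    return spec[:cut], spec[cut + 1:]  # skip the colon itself


print(split_file_object("myfile.root:mytree"))    # ('myfile.root', 'mytree')
print(split_file_object("myfile.root.1:mytree"))  # ('myfile.root.1:mytree', None)
```

Under this model, `myfile.root.1:mytree` is not split at all, so the whole string (colon and tree name included) would be looked up as a file path, which is consistent with the file-not-found error in the report.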

You can always use the dict syntax ({"file.root.1": "object"}) to achieve the same effect (I actually prefer this) and this will work regardless of the file extension.

That's right: the colon syntax has been hard to maintain, so Uproot 5.2.x simplified it. I used to keep a list of Issues and Discussions about it, but it's more than a dozen now. Page 19 of this talk shows a screenshot of all those issues and an analysis of user code, which demonstrates that people do use it and we can't get rid of it.

So now we only support path/in/filesystem.root:path/inside/file if the filesystem name ends in .root. If it doesn't, that's what the {"path/in/filesystem.root": "path/inside/file"} syntax is for—it's not a workaround, it's the intended use.

I'm going to make this a Discussion because it's not a work-item but it would be useful for others to (hopefully) find if they run into it.