Uproot.dask fails to open files with names like `myfile.root.1`
gordonwatts opened this issue · comments
Reproducing:
You need a small ROOT file, such as `myfile.root`. I'm going to assume the tree you want to open in `myfile.root` is called `mytree`:
- Rename `myfile.root` to `myfile.root.1`.
- In Python, run `import uproot` and then `uproot.dask('myfile.root.1:mytree')`.
- You'll get a file-not-found exception.
Why would you name a file like this?
The ATLAS production system often appends `.1` to file names, for whatever reason. As a result, I often find myself accessing files with names like that.
Workaround
Use the dictionary specification method: `uproot.dask({'myfile.root.1': 'mytree'})`
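When several production files share the same tree name, the dictionary can be built programmatically. This is a minimal sketch, assuming a hypothetical helper `dict_spec` (not part of Uproot) and illustrative file names:

```python
def dict_spec(paths, tree_name):
    """Map every file path to the same tree name.

    Hypothetical helper for illustration; it is not an Uproot API.
    """
    return {path: tree_name for path in paths}


# Illustrative file names carrying the numeric suffix discussed above.
files = ["myfile.root.1", "otherfile.root.2"]
spec = dict_spec(files, "mytree")
print(spec)

# With Uproot installed and the files present, the dictionary is passed directly:
#     import uproot
#     arrays = uproot.dask(spec)
```

Because the file name and the tree name are separate dictionary entries, no colon parsing is involved and the file extension does not matter.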
@gordonwatts thanks for reporting this! I suspect this will follow from recent changes to our file name handling. @lobis any clues? :)
Yes, this is expected behaviour that was added at some point in the 5.2.0 release. There should be a mention in the release notes but I haven't checked (at least there was a PR with this).
We chose to only support files ending in `.root` when the `file:object` syntax is used. We chose to do this because it was not possible to support the same kind of complex URL-chain patterns that fsspec supports if we also had to support the `file:object` syntax (it may be possible, but very complex and prone to error). In conclusion: the `file:object` syntax won't work if the file name does not end in `.root`, and this is intended.
You can always use the dict syntax (`{"file.root.1": "object"}`) to achieve the same effect (I actually prefer this), and this will work regardless of the file extension.
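The rule can be illustrated with a small sketch. This is not Uproot's actual implementation; the function name `split_file_object` and the regular expression are assumptions chosen only to show the behaviour described here, where the string is split only when `.root` comes immediately before the colon:

```python
import re

# Assumption for illustration: the file part must end in ".root"
# immediately before the colon that introduces the object path.
_FILE_OBJECT = re.compile(r"^(.+\.root):(.*)$")


def split_file_object(spec):
    """Return (file_path, object_path).

    object_path is None when the string does not match the
    "<something>.root:<object>" pattern, in which case the whole
    string is left untouched as the file path.
    """
    match = _FILE_OBJECT.match(spec)
    if match is None:
        return spec, None
    return match.group(1), match.group(2)


print(split_file_object("myfile.root:mytree"))
print(split_file_object("myfile.root.1:mytree"))
```

In this sketch, `myfile.root.1:mytree` is not split at all, so the entire string, colon included, is treated as a file name.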
That's right: the colon syntax has been hard to maintain, so Uproot 5.2.x simplified it. I used to keep a list of Issues and Discussions about it, but it's more than a dozen now. Page 19 of this talk shows a screenshot of all those issues and an analysis of user code, which demonstrates that people do use it and we can't get rid of it.
So now we only support `path/in/filesystem.root:path/inside/file` if the filesystem name ends in `.root`. If it doesn't, that's what the `{"path/in/filesystem.root": "path/inside/file"}` syntax is for: it's not a workaround, it's the intended use.
I'm going to make this a Discussion because it's not a work-item but it would be useful for others to (hopefully) find if they run into it.