Incorrect handling of Unicode keys when creating npz files
cerisola opened this issue
Hi, I am running into a problem when using NPZ to create an npz file that uses Unicode strings as keys.
To be clear, everything works when the file is created with NumPy and read with NPZ. That is, this works in Python:
>>> import numpy as np
>>> np.savez("file.npz", α=1)
>>> D = np.load("file.npz")
>>> print(D["α"])
1
and reading the file in Julia using NPZ also works as expected:
julia> using NPZ
julia> D = npzread("file.npz")
Dict{String, Int64} with 1 entry:
"α" => 1
julia> D["α"]
1
However, if I create the file with NPZ instead, NPZ can still read it as expected, but NumPy cannot read it properly.
Indeed, from the NPZ side:
julia> npzwrite("file.npz", Dict("α" => 1))
julia> D = npzread("file.npz")
Dict{String, Int64} with 1 entry:
"α" => 1
julia> D["α"]
1
everything works fine. However, when I open the file with NumPy, it loads, but the keys are not what I would expect:
>>> D = np.load("file.npz")
>>> print(D["α"])
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-17-7d756a0b03cf> in <module>
----> 1 print(D["α"])

/usr/lib/python3.9/site-packages/numpy/lib/npyio.py in __getitem__(self, key)
    258             return self.zip.read(key)
    259         else:
--> 260             raise KeyError("%s is not a file in the archive" % key)
    261
    262
KeyError: 'α is not a file in the archive'
Indeed, if I print the keys of the loaded file, I get a different Unicode string:
>>> list(D.keys())
['╬▒']
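The mangled key is consistent with the UTF-8 bytes of α being decoded as CP437, the legacy encoding for ZIP entry names. The following is only a minimal sketch in plain Python to illustrate that observation. It does not use NPZ.jl or ZipFile.jl, and the variable names are made up for the example:

```python
# The key as written from Julia, encoded as UTF-8 (two bytes: 0xCE 0xB1).
key = "α"
utf8_bytes = key.encode("utf-8")

# Decoding the same bytes as CP437 reproduces the key that NumPy reported.
mangled = utf8_bytes.decode("cp437")
print(mangled)  # ╬▒

# Reversing the mis-decoding recovers the intended key. This could serve as
# a stopgap when reading an affected file, assuming every key was mangled
# in the same way.
recovered = mangled.encode("cp437").decode("utf-8")
print(recovered)  # α
```

If this is indeed what happens, the key bytes stored in the file are correct and only the reader's choice of decoding differs.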
After digging into the library source to find the cause, I am now fairly sure the problem lies in the ZipFile.jl library that NPZ.jl uses to create the zip file. I have opened an issue for the ZipFile.jl project (see fhs/ZipFile.jl#84) to address it.
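For context, the ZIP format marks UTF-8 entry names with bit 11 (0x800) of the general purpose flags, and Python's zipfile falls back to CP437 when that bit is unset. Whether ZipFile.jl leaves the bit unset is my assumption and is not confirmed here. The sketch below only shows how the flag can be inspected from Python, using an in-memory archive with a placeholder entry written by Python itself:

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: entry name is stored as UTF-8

# Write an archive in memory with a non-ASCII entry name.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("α.npy", b"")  # empty placeholder, not a valid .npy payload

# Read it back and check whether the writer set the UTF-8 flag.
with zipfile.ZipFile(buf) as zf:
    info = zf.infolist()[0]
    name_is_utf8 = bool(info.flag_bits & UTF8_FLAG)
    print(info.filename, name_is_utf8)
```

Running the same check with `zipfile.ZipFile("file.npz")` on the file written by NPZ would show whether the flag is missing there.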