Don't use np.tofile/np.fromfile when interacting with fs
hynky1999 opened this issue
Hynek Kydlíček commented
Problem
NumPy requires the filesystem's file objects to implement fileno when using np.tofile/np.fromfile; however, s3fs doesn't implement fileno in its implementation of AbstractFileSystem.
Since we use np.tofile in sentence deduplication, when used with s3 for signatures, an error is raised:
io.UnsupportedOperation: fileno
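For illustration, the same failure can be reproduced without S3. This is only a sketch: io.BytesIO is used here as a stand-in (my assumption, not the actual s3fs file class) for any file-like object that has no OS-level file descriptor.

```python
import io

import numpy as np

arr = np.arange(4, dtype=np.uint64)

# Stand-in for a remote file object: BytesIO supports write(),
# but its fileno() raises io.UnsupportedOperation.
buf = io.BytesIO()

raised = False
try:
    # tofile() needs a real file descriptor, so it asks the object for fileno().
    arr.tofile(buf)
except io.UnsupportedOperation:
    raised = True

print("tofile failed on a descriptor-less file object:", raised)
```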
Fix
Use struct.pack instead of the NumPy implementation.
Guilherme Penedo commented
struct is at least one order of magnitude slower. A simpler alternative is to use np.frombuffer while reading the file data directly.
Hynek Kydlíček commented
Ahhh, I wasn't aware of the speed implications.
Let's go with np.tobytes / np.frombuffer then.
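A minimal sketch of what that round trip could look like, assuming the signatures are a flat array of a known dtype. The helper names write_array and read_array, the "<u8" dtype, and the use of io.BytesIO in place of an s3fs file are all hypothetical choices for the example, not the project's actual code.

```python
import io

import numpy as np


def write_array(f, arr):
    # tobytes() copies the raw array data into a bytes object,
    # so only f.write() is needed; no fileno() is involved.
    f.write(arr.tobytes())


def read_array(f, dtype):
    # Read the raw bytes, then reinterpret them as an array in one step.
    # The result is read-only because it shares memory with the bytes
    # object; call .copy() if a writable array is required.
    return np.frombuffer(f.read(), dtype=dtype)


# Usage with an in-memory buffer standing in for a remote file.
sigs = np.array([1, 2, 3, 2**40], dtype="<u8")
buf = io.BytesIO()
write_array(buf, sigs)
buf.seek(0)
restored = read_array(buf, "<u8")
print(restored)
```

Fixing the byte order in the dtype (here little-endian, "<u8") keeps the files portable across machines, since the raw bytes carry no dtype or shape information.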