file descriptor leak on `index.build_index`
drbh opened this issue
Calling `index.build_index` successively seems to cause a file descriptor leak. When calling `insert` and `build_index` many times (~120), some file is not closed, and this causes ngt to crash. Below is the output of a small test that inserts and builds.
...
insert: 113
num_files_open: 292
insert: 114
num_files_open: 294
insert: 115
num_files_open: 296
Traceback (most recent call last):
  File "/Users/drbh/Projects/ngt-rs/pytmp/main.py", line 31, in <module>
    num_files_open = get_number_of_open_files_by_pid(pid)
  File "/Users/drbh/Projects/ngt-rs/pytmp/main.py", line 8, in get_number_of_open_files_by_pid
    stream = os.popen(cmd)
  File "/usr/local/Cellar/python@3.9/3.9.17/Frameworks/Python.framework/Versions/3.9/lib/python3.9/os.py", line 983, in popen
    proc = subprocess.Popen(cmd,
  File "/usr/local/Cellar/python@3.9/3.9.17/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/local/Cellar/python@3.9/3.9.17/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 1736, in _execute_child
    errpipe_read, errpipe_write = os.pipe()
OSError: [Errno 24] Too many open files
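For context, Errno 24 (`EMFILE`) is raised when a process reaches its soft limit on open file descriptors, which is why the failure surfaces in `os.pipe()` rather than in ngt itself. The following is a minimal sketch, assuming a Unix-like system where Python's `resource` module is available. It prints that limit so it can be compared against the counts in the output above. The helper name `fd_limits` is made up for illustration.

```python
import errno
import os
import resource


def fd_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft, hard = fd_limits()
# EMFILE is the error code behind "Too many open files".
print("EMFILE:", errno.EMFILE, os.strerror(errno.EMFILE))
print("soft limit:", soft)
print("hard limit:", hard)
```

Note that `lsof -p <pid> | wc -l` also counts the header row and entries that are not numbered descriptors (such as `cwd` and `txt`), so the reported totals can exceed the soft limit slightly before the crash occurs.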
This can be reproduced with the following script
import ngtpy
import random
import os
import sys
def get_number_of_open_files_by_pid(pid):
    cmd = "lsof -p " + str(pid) + " | wc -l"
    stream = os.popen(cmd)
    output = stream.read()
    return int(output)
dim = 10
nb = 1_000
vectors = [[random.random() for _ in range(dim)] for _ in range(nb)]
ngtpy.create(b"tmp", dim)
index = ngtpy.Index(b"tmp")
# print current process pid
pid = os.getpid()
print("pid: " + str(pid))
for i in range(0, nb):
    # do insert build_index save
    index.insert(vectors[i])
    index.build_index()
    index.save()
    # print number of open files
    num_files_open = get_number_of_open_files_by_pid(pid)
    print("\ninsert:\t\t" + str(i))
    print("num_files_open:\t" + str(num_files_open))
This can also be reproduced in ngt-rs; see lerouxrgd/ngt-rs#12.
I believe this is a file descriptor issue because running `lsof` after each `build_index` shows a growing number of open files. Upon further inspection, these files are all `/dev/null`; however, I am not sure where this file is opened or why it is not closed.
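The `/dev/null` observation can also be checked from inside the process, without spawning `lsof` (which itself needs free descriptors and is what failed in the traceback). This is only a sketch under assumptions: it relies on `/proc/self/fd` (Linux) or `/dev/fd` (macOS and most other Unix systems) being present, and the helper name `count_fds_for_path` is hypothetical.

```python
import os


def count_fds_for_path(path=os.devnull):
    """Count this process's open descriptors that refer to the same file as `path`."""
    target = os.stat(path)
    # /proc/self/fd exists on Linux; /dev/fd covers macOS and most other Unix systems.
    fd_dir = "/proc/self/fd" if os.path.isdir("/proc/self/fd") else "/dev/fd"
    count = 0
    for name in os.listdir(fd_dir):
        try:
            # samestat compares device and inode numbers of the two stat results.
            if os.path.samestat(os.fstat(int(name)), target):
                count += 1
        except (OSError, ValueError):
            # The descriptor used for the listing is already closed,
            # or the entry is not a descriptor number.
            continue
    return count


if __name__ == "__main__":
    before = count_fds_for_path()
    leaked = [os.open(os.devnull, os.O_RDONLY) for _ in range(3)]  # simulate a leak
    print("new /dev/null descriptors:", count_fds_for_path() - before)
    for fd in leaked:
        os.close(fd)
```

Calling such a helper before and after each `index.build_index()` would show whether the growth comes from `/dev/null` handles specifically; the difference between two calls matters more than the absolute value, since standard streams may already point at `/dev/null`.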
Please let me know if I can provide any more information! Thank you for the awesome project!
I have released v2.0.14 which fixes this issue. I really appreciate your sample source code, which made it easy for me to find the cause of this issue.
BTW, I know this sample code exists to reproduce the issue, but just to be sure: there is no need to call `build_index` and `save` after every insertion. Both can be called once, after all insertions are done, as follows.
for i in range(0, nb):
    index.insert(vectors[i])
index.build_index()
index.save()