opendistro-for-elasticsearch / k-NN

🆕 A machine learning plugin which supports an approximate k-NN search algorithm for Open Distro.

Home Page: https://opendistro.github.io/

Re-write adding footer logic to prevent unnecessary file copying

jmazanec15 opened this issue

For Lucene to handle the ".hnsw" files correctly, we need to append a footer to them after the graphs are created. Currently, we copy the data from a temporary file containing the graph to a Lucene IndexOutput and then write the footer through that output.

This copy may not be necessary if we add the footer to the file ourselves:

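// Manually write the footer. Layout, matching CodecUtil.writeFooter (big-endian):
// 4-byte magic, 4-byte algorithm ID (0), 8-byte CRC-32.
// OutputStream.write(int) emits a single byte, so the values are packed with ByteBuffer.
try (OutputStream os = Files.newOutputStream(Paths.get(indexPath), StandardOpenOption.APPEND)) {
    os.write(ByteBuffer.allocate(8).putInt(CodecUtil.FOOTER_MAGIC).putInt(0).array());
}

// The checksum covers every byte before it, including the magic and algorithm ID,
// so it is computed only after those 8 bytes are on disk.
long value;
try (ChecksumIndexInput in = state.directory.openChecksumInput(hnswFileName, state.context)) {
    in.seek(in.length()); // reads forward to the end so the whole file is digested
    value = in.getChecksum();
}
if ((value & 0xFFFFFFFF00000000L) != 0) {
    throw new IllegalStateException("Illegal CRC-32 checksum: " + value + " (resource=" + indexPath + ")");
}
try (OutputStream os = Files.newOutputStream(Paths.get(indexPath), StandardOpenOption.APPEND)) {
    os.write(ByteBuffer.allocate(8).putLong(value).array());
}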

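This could save time during index and merge operations, since the graph file would no longer be copied in full. How much time it saves needs to be measured with benchmarks. We would also need to test that the approach is fault tolerant (for example, if the process dies midway through the append) and that it never produces a corrupted file.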
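As a starting point for that corruption testing, here is a minimal, JDK-only sketch of the in-place append and a matching check. It is not the plugin's code. The class name `FooterAppender` and both method names are hypothetical. The magic constant and the 16-byte big-endian layout are assumed to match Lucene's `CodecUtil` footer, and `java.util.zip.CRC32` is assumed to produce the same checksum Lucene does.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.zip.CRC32;

public class FooterAppender {
    // Assumption: same value as Lucene's CodecUtil.FOOTER_MAGIC (~0x3fd76c17).
    static final int FOOTER_MAGIC = ~0x3fd76c17;
    // 4-byte magic + 4-byte algorithm ID + 8-byte checksum.
    static final int FOOTER_LENGTH = 16;

    /** Appends the footer to the file in place and returns the checksum written. */
    static long appendFooter(Path file) throws IOException {
        // ByteBuffer is big-endian by default.
        byte[] prefix = ByteBuffer.allocate(8).putInt(FOOTER_MAGIC).putInt(0).array();

        // Digest the existing contents by streaming, then the footer prefix,
        // because the checksum covers every byte that precedes it.
        CRC32 crc = new CRC32();
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                crc.update(buf, 0, n);
            }
        }
        crc.update(prefix, 0, prefix.length);
        long checksum = crc.getValue();

        try (OutputStream os = Files.newOutputStream(file, StandardOpenOption.APPEND)) {
            os.write(prefix);
            os.write(ByteBuffer.allocate(8).putLong(checksum).array());
        }
        return checksum;
    }

    /** Returns true if the last 16 bytes are a footer consistent with the rest of the file. */
    static boolean verifyFooter(Path file) throws IOException {
        // Reading the whole file is fine for a test sketch, not for large graphs.
        byte[] all = Files.readAllBytes(file);
        if (all.length < FOOTER_LENGTH) {
            return false;
        }
        ByteBuffer footer = ByteBuffer.wrap(all, all.length - FOOTER_LENGTH, FOOTER_LENGTH);
        int magic = footer.getInt();
        int algorithmId = footer.getInt();
        long stored = footer.getLong();

        // Recompute over everything except the 8 checksum bytes themselves.
        CRC32 crc = new CRC32();
        crc.update(all, 0, all.length - 8);
        return magic == FOOTER_MAGIC && algorithmId == 0 && stored == crc.getValue();
    }
}
```

A test could call `appendFooter` on a generated graph file, truncate the file or flip bytes in it, and assert that `verifyFooter` rejects the result. In the plugin itself the authoritative check would be Lucene's own checksum verification rather than this helper.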