[Question] fast search


vince62s opened this issue 2 years ago · comments

Hello,
I am just getting started with this API. My use case is as follows:

In a large file of 100 million lines, I would like to remove every line whose Jaccard similarity with another line is above 0.7 (for instance).
I looped once with MinHash.bulk to compute and store the hashes.
Then I used a double loop to compare the lines pair by pair, which is very slow.
I have the same question for comparing File1 against File2.

Is there a faster way to accomplish this?

Thanks

Vincent Nguyen · Answer 1 · Fri Jan 13 2023 23:08:51 GMT+0800 (China Standard Time)

Sorry, this seems to be the same question as #188.
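
For anyone landing here with the same problem: the usual way to avoid the double loop is a locality-sensitive hashing (LSH) index over the MinHash signatures, so each line is only compared against lines that share a bucket with it. If the library in question is datasketch (an assumption based on `MinHash.bulk`), its `MinHashLSH` class serves this purpose. The snippet below is only a standard-library sketch of the banding idea, not the library's implementation. The names `shingles`, `signature`, and `dedupe`, the word-level tokens, and the 32 x 4 band layout are all illustrative assumptions.

```python
import hashlib
import random
from collections import defaultdict

NUM_PERM = 128            # signature length (assumed value)
BANDS, ROWS = 32, 4       # BANDS * ROWS == NUM_PERM; assumed tuning for ~0.7
PRIME = (1 << 61) - 1     # modulus for the universal hash family

# Fixed seed so the hash functions are the same on every run.
_rng = random.Random(42)
_COEFFS = [(_rng.randrange(1, PRIME), _rng.randrange(0, PRIME))
           for _ in range(NUM_PERM)]


def shingles(line):
    """Token set of a line. Word-level tokens are an assumption."""
    return set(line.lower().split())


def signature(tokens):
    """MinHash signature: the minimum of each hash function over the tokens."""
    base = [int.from_bytes(hashlib.blake2b(t.encode(), digest_size=8).digest(), "big")
            for t in tokens]
    return tuple(min(((a * x + b) % PRIME for x in base), default=PRIME)
                 for a, b in _COEFFS)


def jaccard(a, b):
    """Exact Jaccard similarity of two token sets."""
    return len(a & b) / len(a | b) if (a or b) else 1.0


def dedupe(lines, threshold=0.7):
    """Keep the first line of each near-duplicate group."""
    buckets = defaultdict(list)   # (band index, band values) -> kept line indices
    kept, kept_tokens = [], {}
    for i, line in enumerate(lines):
        toks = shingles(line)
        sig = signature(toks)
        keys = [(b, sig[b * ROWS:(b + 1) * ROWS]) for b in range(BANDS)]
        # Only lines sharing at least one band are compared exactly.
        candidates = {j for k in keys for j in buckets.get(k, ())}
        if any(jaccard(toks, kept_tokens[j]) > threshold for j in candidates):
            continue              # near-duplicate of a kept line: drop it
        kept.append(line)
        kept_tokens[i] = toks
        for k in keys:
            buckets[k].append(i)
    return kept
```

For the File1 versus File2 case, the same idea would apply by indexing the lines of File1 and then only querying with the lines of File2. Note that this sketch keeps every retained token set in memory for the exact check; at 100 million lines one would more likely compare the stored signatures instead, and LSH can still miss a small fraction of pairs near the threshold.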