eugeneyan / eugeneyan-comments

https://eugeneyan.com/writing/how-to-install-scann-on-mac/

utterances-bot opened this issue · comments

How to Install Google Scalable Nearest Neighbors (ScaNN) on Mac

Step-by-step walkthrough on the environment, compilers, and installation for ScaNN.

Thanks for the guide!

If anyone runs into the following error:
Error in fail: Python Configuration Error: Problem getting numpy include path.

then adding --action_env PYTHON_BIN_PATH=/usr/local/bin/python3 to your compilation command should do the trick.
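A minimal sketch of how that flag could be assembled, assuming the python3 found first on your PATH is the interpreter that has numpy installed (the path /usr/local/bin/python3 in the comment above is just one possible location). The variable name BAZEL_FLAG is made up for illustration; the snippet only prints the flag and does not run the build itself.

```shell
# Sketch: build the flag from whichever python3 is first on PATH
# (assumption: that interpreter is the one with numpy installed).
PYTHON_BIN_PATH="$(command -v python3 || true)"

# Optional sanity check: print numpy's include directory, which is what the
# failing configuration step is trying to find.
if [ -n "$PYTHON_BIN_PATH" ]; then
  "$PYTHON_BIN_PATH" -c 'import numpy; print(numpy.get_include())' \
    || echo "numpy is not importable from $PYTHON_BIN_PATH" >&2
fi

# Append the printed flag to your existing bazel build command.
BAZEL_FLAG="--action_env PYTHON_BIN_PATH=${PYTHON_BIN_PATH}"
echo "$BAZEL_FLAG"
```

If the sanity check reports that numpy is not importable, pointing PYTHON_BIN_PATH at a different interpreter (or installing numpy into this one) is probably the first thing to try.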

Great guide, thanks. I did this on macOS 12.5.1 (Intel). A few minor changes made the procedure work:

  1. With Homebrew, install bazelisk, not bazel. This is now the preferred method of installing Bazel.
  2. In the scann directory, create a hidden file named .bazelversion
  3. In that file, type just: 3.7.2. This instructs bazelisk to run Bazel version 3.7.2 instead of the latest one (currently 6.0.0), which does not work with the scann BUILD.bazel file.
  4. I did not have to switch the #include <hash_set> to #include <ext/hash_set>.
  5. Successfully installed scann 1.2.9

PS: installed in Python 3.10.9
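Steps 2 and 3 above can be sketched as a couple of shell lines. This assumes the current directory is the scann directory and that bazelisk is already installed (for example with brew install bazelisk); the snippet only writes the version file and does not invoke Bazel.

```shell
# Run from inside the scann directory (assumption: that is the current directory).
# Pin the Bazel version that bazelisk should download and run.
printf '3.7.2\n' > .bazelversion

# Confirm the file contents.
cat .bazelversion

# With bazelisk installed, running `bazel --version` from this directory
# should then report 3.7.2 rather than the latest release.
```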

Thanks for this! Before I go through the effort of doing this, I need to figure out the memory needs of ScaNN. I am looking to query a dataset of shape (~50M, 100), and I'm trying to get this to work on just my laptop. Does ScaNN load the entire index into memory (which would exceed my 16 GB of RAM), or does it work out of core? I can't seem to find this answer anywhere obvious. Thanks!

Don't know! I've been using it with databases of about 5M and it's very fast. You'd just have to give it a go and see.
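For a rough sense of scale for the (~50M, 100) dataset asked about above, here is a back-of-the-envelope estimate of the raw vectors alone. It assumes 4 bytes per value (float32), which the thread does not state, and it says nothing about how ScaNN itself stores or compresses its index.

```shell
# Assumptions: 50M vectors, 100 dimensions, 4 bytes per value (float32).
NUM_VECTORS=50000000
NUM_DIMS=100
BYTES_PER_VALUE=4

# Size of the uncompressed vectors only; index overhead is not included.
RAW_BYTES=$((NUM_VECTORS * NUM_DIMS * BYTES_PER_VALUE))
RAW_GB=$((RAW_BYTES / 1000000000))
echo "raw vectors: ${RAW_BYTES} bytes (~${RAW_GB} GB)"
```

Under those assumptions the raw data is about 20 GB, already more than 16 GB of RAM, so whether this fits on the laptop would depend on how much smaller the built index is than the raw data, which is the part you'd have to test.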

PS: I was also able to install ScaNN on M1, combining the info in this guide and using the patch discussed here: google-research/google-research#1082