TPU Index
TPU Index is a package for fast similarity search over large collections of high-dimensional vectors on TPUs. We built it to support a project we developed for https://tfworld.devpost.com/.
Uses:
- Working with collections of vectors that are too large to fit in CPU memory. TPU v2 has 8 x 8 = 64 GB of memory. TPU v3 has 16 x 8 = 128 GB.
- Speeding up similarity searches. On a Colab TPU v2, a single cosine similarity search over 19.5 million vectors of dimension 512 takes ~1.017 seconds.
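To get a feel for whether a collection fits in the memory figures above, a rough size estimate can help. The sketch below assumes the vectors are stored as float32 (4 bytes per value); this README does not state how TPU Index stores them, so treat the result as a ballpark only. The helper name `index_size_gb` is hypothetical and not part of the package.

```python
def index_size_gb(num_vectors, dim, bytes_per_value=4):
    """Approximate raw size of the vectors in GB (1 GB = 1e9 bytes).

    bytes_per_value=4 assumes float32 storage, which is an assumption here.
    """
    return num_vectors * dim * bytes_per_value / 1e9

# The benchmark collection mentioned above: 19.5 million vectors of dimension 512.
size = index_size_gb(19_500_000, 512)
print(f"{size:.1f} GB")  # prints "39.9 GB"
```

Under that assumption the benchmark collection is well within the 64 GB of a TPU v2, before counting any working memory the search itself needs.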
Link to our project: https://devpost.com/software/naturallanguagerecommendations
Installation
!pip install tpu-index
Basic usage
from tpu_index import TPUIndex
index = TPUIndex(num_tpu_cores=8)
index.create_index(vectors) # vectors: 2-D numpy array, shape [num_vectors, dim]
...
D, I = index.search(xq, distance_metric='cosine', top_k=5)
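For checking results on a small sample, a brute-force NumPy reference for a cosine top-k search might look like the sketch below. It assumes that `D` holds similarity scores and `I` holds the row indices of the matching vectors, each with shape [num_queries, top_k]; this README does not spell out that convention, so verify it against the package. The function `cosine_top_k` is hypothetical and runs on the CPU only.

```python
import numpy as np

def cosine_top_k(xb, xq, top_k=5):
    """Brute-force cosine search. Returns (scores, indices), each [num_queries, top_k].

    Assumes no row of xb or xq is the zero vector.
    """
    xb = np.asarray(xb, dtype=np.float32)
    xq = np.asarray(xq, dtype=np.float32)
    # Normalize rows to unit length so a dot product equals cosine similarity.
    xb_n = xb / np.linalg.norm(xb, axis=1, keepdims=True)
    xq_n = xq / np.linalg.norm(xq, axis=1, keepdims=True)
    sims = xq_n @ xb_n.T                         # [num_queries, num_vectors]
    idx = np.argsort(-sims, axis=1)[:, :top_k]   # most similar first
    scores = np.take_along_axis(sims, idx, axis=1)
    return scores, idx

xb = np.array([[1, 0], [0, 1], [1, 1], [-1, 0]], dtype=np.float32)
xq = np.array([[2, 0]], dtype=np.float32)
D_ref, I_ref = cosine_top_k(xb, xq, top_k=2)
print(I_ref)  # the two rows of xb closest in direction to the query
```

This sorts every score, so it is only practical for small collections; it is meant as a correctness check, not a replacement for the TPU search.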
For collections of vectors that are too large to fit in CPU memory, add them in chunks:
import numpy as np
index.create_index(vectorsChunk1) # first chunk: 2-D numpy array, shape [num_vectors, dim]
for file in files: # remaining chunks, each saved as a .npy file
    vectorChunk = np.load(file)
    index.append_index(vectorChunk)
# Now perform search
D, I = index.search(xq, distance_metric='cosine', top_k=5)
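The chunked pattern above expects a list of `.npy` files. One way to produce such files is sketched below; the helper `save_in_chunks`, the file naming scheme, and the chunk size are all illustrative assumptions, not part of TPU Index. In practice each chunk would be written as it is computed, so the full array never has to be held in memory at once.

```python
import os
import tempfile
import numpy as np

def save_in_chunks(vectors, out_dir, chunk_size):
    """Write a 2-D array to numbered .npy files of at most chunk_size rows each.

    Returns the file paths in row order.
    """
    files = []
    for i, start in enumerate(range(0, len(vectors), chunk_size)):
        path = os.path.join(out_dir, f"chunk_{i:05d}.npy")
        np.save(path, vectors[start:start + chunk_size])
        files.append(path)
    return files

# Small demo: 10 vectors of dimension 4, split into chunks of 4 rows.
vectors = np.arange(40, dtype=np.float32).reshape(10, 4)
out_dir = tempfile.mkdtemp()
files = save_in_chunks(vectors, out_dir, chunk_size=4)
print(len(files))  # prints 3
```

With files like these, the first one could be loaded and passed to `create_index` and the rest to `append_index`, following the loop shown above. Keeping the files in row order presumably matters if the returned indices are to map back to the original rows.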
ToDo:
- Add more distance metrics
- Optional GPU support