This step is optional. It is only needed if you want to download your own model and convert it to TorchScript
format. If you just want to use the bge-m3 model I already converted, skip ahead to the prebuilt image (docker pull) step below.
Install Miniconda or Miniforge (mamba).
mamba create -n embedder python=3.11
mamba activate embedder
mamba install torchserve torch-model-archiver torch-workflow-archiver -c pytorch
pip3 install torch torchvision torchaudio
You can replace bge-m3 with any Hugging Face model you want to use. If you do, also modify src/inference/embed_handler.py
to adapt it to the particular model you are hosting.
mkdir models
cd models
# Example: download BAAI/bge-m3
git lfs install
git clone https://huggingface.co/BAAI/bge-m3
# model will be saved to models/bge-m3
This script converts the PyTorch model you downloaded to models/bge-m3 into TorchScript format for serving.
scripts/create_torch_serve_archive.sh
This packs the TorchScript model into a Docker image.
docker build -t ghcr.io/xingyaoww/torchserve-embedder:0.10.0-gpu-bge-m3 .
If you just want to use bge-m3 and don't want to build it manually, you can download the image I've built:
docker pull ghcr.io/xingyaoww/torchserve-embedder:0.10.0-gpu-bge-m3
# To expose only GPU 0
docker run -it --rm --gpus '"device=0"' -p 8080:8080 ghcr.io/xingyaoww/torchserve-embedder:0.10.0-gpu-bge-m3
You can get the embedding of any sentence by sending a REST request.
curl --location 'http://127.0.0.1:8080/predictions/model' \
--header 'Content-Type: application/json' \
--data '{
"input": ["how are you?", "who are you?"]
}'
If you see something like the following, your embedding deployment is successful!
[
[
-0.004832866135984659,
0.027840038761496544,
-0.04115190729498863,
-0.038011349737644196,
0.00468627642840147,
-0.07148568332195282,
0.024210147559642792,
-0.0008565349853597581,
0.027191104367375374,
-0.002336308127269149,
-0.01824376732110977,
0.029580133035779,
...
],
[
...
]
]
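The same request can be sent from Python. The sketch below is a minimal, standard-library-only client and is not part of this repo: it assumes the server is reachable at the URL from the curl example and that the response is a JSON list with one vector per input sentence, as in the sample output above. The function names (build_request, embed, cosine_similarity) are made up for illustration.

```python
import json
import math
import urllib.request

# Endpoint taken from the curl example above.
EMBED_URL = "http://127.0.0.1:8080/predictions/model"


def build_request(sentences, url=EMBED_URL):
    """Build the same POST request that the curl example sends."""
    body = json.dumps({"input": list(sentences)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(sentences, url=EMBED_URL, timeout=30.0):
    """Send sentences to the server and return the parsed JSON response.

    Assumption: the response is a list with one vector per sentence.
    """
    request = build_request(sentences, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

With the container running, `first, second = embed(["how are you?", "who are you?"])` should give two vectors, and `cosine_similarity(first, second)` gives a quick sanity check that similar sentences score close to each other.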
Start a Qdrant instance.
docker pull qdrant/qdrant
mkdir -p data/qdrant/data
mkdir -p data/qdrant/snapshots
docker run -p 6333:6333 \
-v $(pwd)/data/qdrant/data:/qdrant/storage \
-v $(pwd)/data/qdrant/snapshots:/qdrant/snapshots \
qdrant/qdrant
Access the Qdrant instance in Python:
pip install qdrant-client
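Once qdrant-client is installed, a minimal session might look like the sketch below. It is an illustration, not code from this repo: the collection name "sentences", the "text" payload field, and both function names are hypothetical, and the vector size is read from the first vector rather than hard-coded. It assumes a recent qdrant-client release that provides collection_exists and query_points; older releases expose a search method instead.

```python
def to_point_dicts(sentences, vectors, start_id=0):
    """Pair each sentence with its vector as plain dicts (id, vector, payload)."""
    if len(sentences) != len(vectors):
        raise ValueError("need exactly one vector per sentence")
    return [
        {"id": start_id + i, "vector": list(vec), "payload": {"text": text}}
        for i, (text, vec) in enumerate(zip(sentences, vectors))
    ]


def index_and_search(sentences, vectors, query_vector,
                     collection="sentences",
                     url="http://localhost:6333", limit=3):
    """Create the collection if needed, upsert points, return nearest matches."""
    # Imported here so the helper above works without qdrant-client installed.
    from qdrant_client import QdrantClient
    from qdrant_client.models import Distance, PointStruct, VectorParams

    client = QdrantClient(url=url)  # port 6333 as published by docker run
    if not client.collection_exists(collection):
        client.create_collection(
            collection_name=collection,
            vectors_config=VectorParams(
                size=len(vectors[0]),  # dimension taken from the data itself
                distance=Distance.COSINE,
            ),
        )
    points = [PointStruct(**p) for p in to_point_dicts(sentences, vectors)]
    client.upsert(collection_name=collection, points=points)
    result = client.query_points(
        collection_name=collection, query=query_vector, limit=limit
    )
    return [(hit.payload["text"], hit.score) for hit in result.points]
```

Cosine distance is chosen here as a common default for sentence embeddings; whether it is the right metric depends on how the handler normalizes its output.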
Check scripts/notebooks/encode_large_scale_data.ipynb
for an example of how to encode a large-scale dataset into Qdrant for querying and evaluation.
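For orientation, the general shape of large-scale encoding is to stream the texts in fixed-size batches, embed each batch, and store it before moving on. The sketch below shows only that pattern and is not taken from the notebook: batched, encode_in_batches, embed_fn, and store_fn are hypothetical names, and the batch size of 64 is an arbitrary assumption.

```python
from itertools import islice


def batched(items, batch_size):
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def encode_in_batches(texts, embed_fn, store_fn, batch_size=64):
    """Embed texts batch by batch and hand each batch to store_fn.

    embed_fn(batch) must return one vector per text in the batch.
    store_fn(offset, batch, vectors) receives the index of the batch's
    first text, which can serve as a running point id.
    Returns the total number of texts processed.
    """
    total = 0
    for batch in batched(texts, batch_size):
        vectors = embed_fn(batch)
        if len(vectors) != len(batch):
            raise RuntimeError("embedder returned a different number of vectors")
        store_fn(total, batch, vectors)
        total += len(batch)
    return total
```

Passing the embedder and the store as callables keeps the loop independent of any particular client, so embed_fn can wrap a call to the embedding server and store_fn can wrap a Qdrant upsert.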