Add support for combined image and text embeddings using CLIP
prmaxim opened this issue
Description
We use CLIP for product recommendations in e-commerce. By generating two vectors (image + name), concatenating them, and storing the result in the Typesense (TS) embedding field, we get more accurate recommendations than with the image embedding alone.
The CLIP API accepts requests for multiple fields and returns an array of embeddings:
[[image embedding], [text embedding]]
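The client-side workflow described above can be sketched roughly as follows. This is only an illustration, not Typesense or CLIP API code: `combine_embeddings` is a hypothetical helper, and the short toy vectors stand in for a real `[[image embedding], [text embedding]]` response.

```python
from typing import List, Sequence


def combine_embeddings(clip_response: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate the image and text embeddings into one flat vector.

    Hypothetical helper: expects the [[image embedding], [text embedding]]
    shape described in this issue.
    """
    if len(clip_response) != 2:
        raise ValueError("expected [[image embedding], [text embedding]]")
    image_embedding, text_embedding = clip_response
    return list(image_embedding) + list(text_embedding)


# Toy 3-dimensional vectors standing in for real CLIP output.
clip_response = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]

# Document as it would be sent to the collection, with the combined vector
# stored in the "embedding" field.
document = {
    "name": "Example product",
    "embedding": combine_embeddings(clip_response),
}
```

The combined vector is twice as long as a single CLIP embedding, so the target field has to be sized for the concatenated length.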
TS now natively supports CLIP for image embeddings, but it does not allow creating embeddings from multiple fields.
Note: issue #1291 looks broader and also covers this specific case of combining CLIP embeddings.
Steps to reproduce
Create a collection with an image field and a text field:
{
  "name": "Images",
  "fields": [
    {
      "name": "name",
      "type": "string"
    },
    {
      "name": "image",
      "type": "image",
      "store": false
    },
    {
      "name": "embedding",
      "type": "float[]",
      "embed": {
        "from": [
          "image",
          "name"
        ],
        "model_config": {
          "model_name": "ts/clip-vit-b-p32"
        }
      }
    }
  ]
}
Actual Behavior
Error: Only one field can be used in the embed.from property of an embed field when embedding from an image field.
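For comparison, the manual approach from the description could use a schema where `embedding` is a plain `float[]` field without `embed`, filled in by the client. The sketch below rests on assumptions not stated in this issue: that each CLIP ViT-B/32 vector has 512 dimensions, so the concatenated vector has 1024, and that the field is sized with `num_dim`.

```python
import json

# Assumption: each CLIP ViT-B/32 embedding has 512 dimensions.
CLIP_DIM = 512

# Schema without auto-embedding: the client computes and concatenates the
# image and text vectors itself and sends the result in "embedding".
schema = {
    "name": "Images",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "embedding", "type": "float[]", "num_dim": 2 * CLIP_DIM},
    ],
}

print(json.dumps(schema, indent=2))
```

This keeps the embedding logic outside Typesense, which is the extra work this feature request aims to remove.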
Metadata
Typesense Version: 0.26.0.rc58