typesense / typesense

Open Source alternative to Algolia + Pinecone and an Easier-to-Use alternative to ElasticSearch ⚡ 🔍 ✨ Fast, typo tolerant, in-memory fuzzy Search Engine for building delightful search experiences

Home Page: https://typesense.org

Add support for combined image and text embeddings using CLIP

prmaxim opened this issue · comments

Description

We use CLIP for product recommendations in e-commerce. We generate two vectors per product (one from the image, one from the name), concatenate them, and store the result in the Typesense (TS) embedding field. This gives more accurate recommendations than the image embedding alone.

The CLIP API accepts a request covering multiple fields and returns an array of embeddings:
[[image embedding], [text embedding]]
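
For illustration, the concatenation step could look roughly like the sketch below. The helper name `combine_embeddings` and the three-element vectors are hypothetical placeholders, not real CLIP output or part of any Typesense or CLIP API; the only thing taken from the description is the `[[image embedding], [text embedding]]` response shape.

```python
from typing import List, Sequence


def combine_embeddings(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate per-field embeddings into one flat vector (hypothetical helper)."""
    combined: List[float] = []
    for vector in embeddings:
        combined.extend(float(x) for x in vector)
    return combined


# Placeholder response shaped like [[image embedding], [text embedding]].
# Real CLIP vectors are much longer; these values are made up.
clip_response = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]

combined = combine_embeddings(clip_response)
print(len(combined))
```

The combined vector is simply the image vector followed by the text vector, so its length is the sum of the two input lengths.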

TS now natively supports CLIP for image embeddings, but it does not allow creating an embedding from multiple fields.

Note: issue #1291 looks broader and also covers this specific case of combining CLIP embeddings.

Steps to reproduce

Create a collection with an image field and a text field, and an embedding field that embeds from both:

{
  "name": "Images",
  "fields": [
    {
      "name": "name",
      "type": "string"
    },
    {
      "name": "image",
      "type": "image",
      "store": false
    },
    {
      "name": "embedding",
      "type": "float[]",
      "embed": {
        "from": [
          "image",
          "name"
        ],
        "model_config": {
          "model_name": "ts/clip-vit-b-p32"
        }
      }
    }
  ]
}

Actual Behavior

Error: Only one field can be used in the embed.from property of an embed field when embedding from an image field.
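
For comparison, the manual approach from the description (computing the vectors outside Typesense and storing the concatenated result) might be sketched as follows. This is an assumption-laden illustration, not an official workaround: it assumes `ts/clip-vit-b-p32` produces 512-dimensional vectors, so the plain `float[]` field is declared with `num_dim` 1024, and `build_schema` and `build_document` are hypothetical helpers that only build the JSON payloads without calling any server.

```python
import json
from typing import Dict, List, Sequence

# Assumption: the CLIP ViT-B/32 model yields 512-dimensional vectors,
# so image + text concatenated gives 1024 dimensions.
CLIP_DIM = 512


def build_schema(collection_name: str = "Images") -> Dict:
    """Schema with a plain vector field instead of an auto-embedding field."""
    return {
        "name": collection_name,
        "fields": [
            {"name": "name", "type": "string"},
            {"name": "embedding", "type": "float[]", "num_dim": 2 * CLIP_DIM},
        ],
    }


def build_document(name: str,
                   image_vec: Sequence[float],
                   text_vec: Sequence[float]) -> Dict:
    """Document whose embedding is the image vector followed by the text vector."""
    if len(image_vec) != CLIP_DIM or len(text_vec) != CLIP_DIM:
        raise ValueError("each vector must have CLIP_DIM elements")
    embedding: List[float] = list(image_vec) + list(text_vec)
    return {"name": name, "embedding": embedding}


schema = build_schema()
# Placeholder vectors; in practice these would come from a CLIP model.
doc = build_document("Red sneaker", [0.0] * CLIP_DIM, [1.0] * CLIP_DIM)
print(json.dumps(schema, indent=2))
```

The downside of this approach, and the motivation for this request, is that the embeddings have to be generated and kept in sync outside Typesense.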

Metadata

Typesense Version: 0.26.0.rc58