Difference in output when running via Transformers.js and when hosting on Hugging Face
jtmuller5 opened this issue · comments
I created an application that uses the UAE-Large-V1 model inside Transformers.js and was able to embed sentences in a browser without issues. The model would return a single vector for a single input:
import { pipeline } from "@xenova/transformers";

const extractor = await pipeline("feature-extraction", "WhereIsAI/UAE-Large-V1", {
  quantized: true,
});
let result = await extractor(text, { pooling: "mean", normalize: true });
When I hosted the model on Hugging Face using their Inference Endpoints solution, it no longer works as expected. Instead of returning a single vector, it returns a variable-length list of 1024-dimensional vectors.
Sample input:
{
  "inputs": "Where are you"
}
This returns a list of lists of lists of numbers.
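A nested output with one 1024-dimensional vector per token suggests the endpoint is returning raw token-level embeddings without applying pooling. Assuming that is the case, one workaround is to mean-pool the token axis client side. This is a minimal sketch; the `meanPool` helper is illustrative, not part of any official Hugging Face API, and assumes the response shape is `[tokens][dims]` for a single input:

```javascript
// Hedged sketch: average token-level embeddings into one sentence vector.
// Assumes `tokenEmbeddings` has shape [tokens][dims]; names are illustrative.
function meanPool(tokenEmbeddings) {
  const dims = tokenEmbeddings[0].length;
  const pooled = new Array(dims).fill(0);
  for (const token of tokenEmbeddings) {
    for (let i = 0; i < dims; i++) pooled[i] += token[i];
  }
  return pooled.map((v) => v / tokenEmbeddings.length);
}

// Example: two 3-dimensional token vectors collapse to one 3-dimensional vector.
const sentenceVector = meanPool([
  [1, 2, 3],
  [3, 4, 5],
]);
// sentenceVector is [2, 3, 4]
```

If you also need the `normalize: true` behavior from the Transformers.js call, divide the pooled vector by its L2 norm afterwards.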
Is there a way to make the hosted model return a single vector? And why does the model act differently depending on where it's hosted?
That is strange. It should return a single vector because you have specified mean pooling.
You could ask for help in the Transformers.js project, as I am unfamiliar with it. Sorry about that.