Difference in output when running via Transformers.js and when hosting on Hugging Face
jtmuller5 opened this issue · comments
I created an application that uses the UAE-Large-V1 model inside Transformers.js and was able to embed sentences in a browser without issues. The model would return a single vector for a single input:
import { pipeline } from "@xenova/transformers";

const extractor = await pipeline("feature-extraction", "WhereIsAI/UAE-Large-V1", {
  quantized: true,
});
let result = await extractor(text, { pooling: "mean", normalize: true });
When I hosted the model on Hugging Face using their Inference Endpoints solution, it no longer works as expected. Instead of returning a single vector, it returns a variable-length list of 1024-dimensional vectors.
Sample input:
{
  "inputs": "Where are you"
}
This returns a list of lists of lists of numbers.
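A nested output with one 1024-dimensional vector per token suggests the endpoint is returning raw token-level embeddings without applying pooling. Assuming that is the case, one workaround is to mean-pool the token axis client side. This is a minimal sketch; the `meanPool` helper is illustrative, not part of any official Hugging Face API, and assumes the response shape is `[tokens][dims]` for a single input:

```javascript
// Hedged sketch: average token-level embeddings into one sentence vector.
// Assumes `tokenEmbeddings` has shape [tokens][dims]; names are illustrative.
function meanPool(tokenEmbeddings) {
  const dims = tokenEmbeddings[0].length;
  const pooled = new Array(dims).fill(0);
  for (const token of tokenEmbeddings) {
    for (let i = 0; i < dims; i++) pooled[i] += token[i];
  }
  return pooled.map((v) => v / tokenEmbeddings.length);
}

// Example: two 3-dimensional token vectors collapse to one 3-dimensional vector.
const sentenceVector = meanPool([
  [1, 2, 3],
  [3, 4, 5],
]);
// sentenceVector is [2, 3, 4]
```

If you also need the `normalize: true` behavior from the Transformers.js call, divide the pooled vector by its L2 norm afterwards.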
Is there a way to make the hosted model return a single vector? And why does the model act differently depending on where it's hosted?
That is strange. It should return a single vector because you have specified mean pooling.
You could ask for help in the Transformers.js project, as I am unfamiliar with it. Sorry about that.