ValueError: cannot compute similarity with no input


TechyNilesh opened this issue 3 years ago

Nilesh Verma commented 3 years ago

Hi Team,

I am getting the following error while fitting the model:

2022-04-08 14:19:04,344 - Lbl2Vec - INFO - Train document and word embeddings
2022-04-08 14:19:09,992 - Lbl2Vec - INFO - Train label embeddings

ValueError Traceback (most recent call last)
in

~/SageMaker/lbl2vec/lbl2vec.py in fit(self)
248 # get doc keys and similarity scores of documents that are similar to
249 # the description keywords
--> 250 self.labels[['doc_keys', 'doc_similarity_scores']] = self.labels['description_keywords'].apply(lambda row: self._get_similar_documents(
251 self.doc2vec_model, row, num_docs=self.num_docs, similarity_threshold=self.similarity_threshold, min_num_docs=self.min_num_docs))
252

~/anaconda3/envs/python3/lib/python3.6/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
4211 else:
4212 values = self.astype(object)._values
-> 4213 mapped = lib.map_infer(values, f, convert=convert_dtype)
4214
4215 if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()

~/SageMaker/lbl2vec/lbl2vec.py in <lambda>(row)
249 # the description keywords
250 self.labels[['doc_keys', 'doc_similarity_scores']] = self.labels['description_keywords'].apply(lambda row: self._get_similar_documents(
--> 251 self.doc2vec_model, row, num_docs=self.num_docs, similarity_threshold=self.similarity_threshold, min_num_docs=self.min_num_docs))
252
253 # validate that documents to calculate label embeddings from are found

~/SageMaker/lbl2vec/lbl2vec.py in _get_similar_documents(self, doc2vec_model, keywords, num_docs, similarity_threshold, min_num_docs)
625 for word in cleaned_keywords_list]
626 similar_docs = doc2vec_model.dv.most_similar(
--> 627 positive=keywordword_vectors, topn=num_docs)
628 except KeyError as error:
629 error.args = (

~/anaconda3/envs/python3/lib/python3.6/site-packages/gensim/models/keyedvectors.py in most_similar(self, positive, negative, topn, clip_start, clip_end, restrict_vocab, indexer)
775 all_keys.add(self.get_index(key))
776 if not mean:
--> 777 raise ValueError("cannot compute similarity with no input")
778 mean = matutils.unitvec(array(mean).mean(axis=0)).astype(REAL)
779

ValueError: cannot compute similarity with no input

Tim Schopf · Answer 1 · Sat Apr 09 2022 01:06:48 GMT+0800 (China Standard Time)

The keywords 'Crack', 'Broken' and 'Breakage' were not learned by the model and are therefore unknown to it. These were probably all the keywords for your class, but they can't be used to compute a label vector because they are unknown. This results in the error.

There are several possible reasons for this. The simplest explanation is that you used capitalized keywords, but the model only knows lowercase words. In this case, just convert your keywords to lowercase.
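A minimal sketch of the lowercase fix, assuming the keywords are passed as one list per label (as with Lbl2Vec's keywords_list). The vocabulary below is a hypothetical lowercase-only stand-in for a trained model's vocabulary; with gensim 4 you could build the real one from doc2vec_model.wv.key_to_index. lowercase_keywords is a hypothetical helper, not part of Lbl2Vec.

```python
def lowercase_keywords(keywords_list):
    """Lowercase every keyword of every label (one inner list per label)."""
    return [[kw.lower() for kw in label_keywords] for label_keywords in keywords_list]


# Hypothetical keywords for a single label, as in the issue above.
keywords_list = [["Crack", "Broken", "Breakage"]]

# Hypothetical lowercase-only vocabulary; with gensim 4 you could use
# set(doc2vec_model.wv.key_to_index) instead.
vocab = {"crack", "broken", "breakage", "surface"}

original = keywords_list[0]
fixed = lowercase_keywords(keywords_list)[0]

print([kw for kw in original if kw in vocab])  # []
print([kw for kw in fixed if kw in vocab])     # ['crack', 'broken', 'breakage']
```

If the first list is empty but the second is not, capitalization was the problem.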

Another explanation could be that the keywords don't appear in your training corpus or occur too rarely. In this case, I suggest you try different keywords or add more training data so that the model can learn those keywords.
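To check whether the keywords occur often enough, you can count them in the tokenized training corpus before fitting. This sketch assumes the documents are already tokenized and lowercased, and that words occurring fewer than min_count times are dropped from the vocabulary, as gensim's Doc2Vec does with its min_count parameter. keyword_frequencies and the toy corpus are hypothetical.

```python
from collections import Counter


def keyword_frequencies(tokenized_docs, keywords, min_count):
    """Map each keyword to (count, survives), where survives means count >= min_count."""
    counts = Counter(token for doc in tokenized_docs for token in doc)
    return {kw: (counts[kw], counts[kw] >= min_count) for kw in keywords}


# Hypothetical tiny corpus, already tokenized and lowercased.
docs = [
    ["surface", "crack", "near", "weld"],
    ["no", "damage", "found"],
    ["crack", "and", "breakage", "reported"],
]

report = keyword_frequencies(docs, ["crack", "broken", "breakage"], min_count=2)
for kw, (count, survives) in report.items():
    print(kw, count, "kept" if survives else "dropped")
# crack 2 kept
# broken 0 dropped
# breakage 1 dropped
```

Keywords reported as dropped are candidates for replacement, or a sign that more training data is needed.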

Nilesh Verma · Answer 2 · Sat Apr 09 2022 09:08:27 GMT+0800 (China Standard Time)

Is it possible to skip those terms that aren't in the document?

Tim Schopf · Answer 3 · Sat Apr 09 2022 18:54:51 GMT+0800 (China Standard Time)

Unknown keywords are already skipped by default when computing the label vector. However, when all of a label's keywords are unknown to the model, none are left to compute the label vector from. This is most likely what caused the error.
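A simplified sketch of the behavior described above: unknown keywords are skipped, and if none remain there is nothing to average. This is not Lbl2Vec's actual code (which passes the keyword vectors to gensim's most_similar); label_vector and the toy word_vectors dict are hypothetical stand-ins.

```python
def label_vector(keywords, word_vectors):
    """Average the vectors of known keywords, skipping unknown ones.

    word_vectors is a hypothetical dict mapping word -> list of floats.
    """
    known = [word_vectors[kw] for kw in keywords if kw in word_vectors]
    if not known:
        # Same situation as gensim's "cannot compute similarity with no input".
        raise ValueError(f"none of the keywords {keywords} are in the vocabulary")
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


word_vectors = {"crack": [1.0, 0.0], "breakage": [0.0, 1.0]}

# 'Broken' is unknown and simply skipped.
print(label_vector(["crack", "Broken", "breakage"], word_vectors))  # [0.5, 0.5]

# All keywords unknown: nothing left to average.
try:
    label_vector(["Crack", "Broken", "Breakage"], word_vectors)
except ValueError as err:
    print("error:", err)
```

Running this check per label before training shows which label's keyword list would trigger the error.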