sebischair / Lbl2Vec

Lbl2Vec learns jointly embedded label, document and word vectors to retrieve documents with predefined topics from an unlabeled document corpus.

Home Page: https://wwwmatthes.in.tum.de/pages/naimi84squl1/Lbl2Vec-An-Embedding-based-Approach-for-Unsupervised-Document-Retrieval-on-Predefined-Topics

Localization possible?

dionmes opened this issue

commented

Does Lbl2Vec work with languages other than English? That is, does it train the Doc2Vec model correctly when used on text in other languages?

If you train a new Lbl2Vec model from scratch, it also trains a Doc2Vec model from scratch internally. This approach is language-agnostic, so you can apply Lbl2Vec to data in any language. However, I cannot predict how well Lbl2Vec actually works in other languages, since we have only evaluated it on English data so far. The performance most likely depends on the complexity of the language as well as on the preprocessing of the data.
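Since preprocessing is named above as one of the factors, here is a minimal sketch of language-aware tokenization for a non-English corpus, using only the Python standard library. The stopword list, the helper names `preprocess` and `build_corpus`, and the two Dutch sample sentences are assumptions made up for illustration; they are not part of Lbl2Vec. The resulting `(tag, tokens)` pairs have the shape that a Doc2Vec-style trainer consumes; with gensim they could presumably be wrapped as `TaggedDocument(words=tokens, tags=[tag])` before being handed to Lbl2Vec.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny Dutch stopword list (illustration only).
DUTCH_STOPWORDS = {"de", "het", "een", "en", "van", "is", "in", "op"}

# Runs of letters in any script: word characters minus digits and underscore.
TOKEN_RE = re.compile(r"[^\W\d_]+")


def preprocess(text, stopwords=DUTCH_STOPWORDS, min_len=2):
    """Normalize Unicode, lowercase, and split text into letter-only tokens."""
    text = unicodedata.normalize("NFC", text).lower()
    return [
        token
        for token in TOKEN_RE.findall(text)
        if len(token) >= min_len and token not in stopwords
    ]


def build_corpus(docs):
    """Return (tag, tokens) pairs, one per document, tagged by position."""
    return [(str(i), preprocess(doc)) for i, doc in enumerate(docs)]


# Made-up sample documents, not real evaluation data.
docs = [
    "De voetbalclub won de wedstrijd in Amsterdam.",
    "Het parlement stemt over een nieuwe belastingwet.",
]
corpus = build_corpus(docs)
```

The regular expression keeps accented letters such as "é" intact, which matters for languages where an ASCII-only tokenizer would split or drop words. Whether stopword removal helps at all for a given language is an open question here and would need to be checked on real data.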

commented

Thanks Tim, makes sense. Gonna look for some Dutch docs 👍