MaartenGr / KeyBERT

Minimal keyword extraction with BERT

Home Page: https://MaartenGr.github.io/KeyBERT/

the candidates parameter

Hossein-1991 opened this issue

Hi,
I have two questions regarding the candidates parameter:

  1. Where can I find an explanation of this exciting parameter? I have no idea how it affects the results!
  2. I have a long list of words. Can I pass them to the parameter as a dictionary?

Thanks

  1. The docstrings themselves should give you a start. Other than that, there is a section in the documentation where you can find more information.
  2. You should pass the candidate words as a list, not as a dictionary. Other than that, it should work.
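As a minimal sketch of the second point, assuming the word list is currently stored as dictionary keys: the helper `to_candidate_list` below is hypothetical (it is not part of KeyBERT) and flattens the dictionary into a plain list, which is then passed to the `candidates` argument of `extract_keywords`. KeyBERT is imported inside the second function only so that the helper can be used on its own.

```python
def to_candidate_list(words):
    """Turn a dict (its keys), set, or other iterable of strings into a
    de-duplicated list that can be passed as `candidates`."""
    seen = {}
    for word in words:  # iterating over a dict yields its keys
        cleaned = word.strip()
        if cleaned:
            seen.setdefault(cleaned, None)  # dicts keep insertion order
    return list(seen)


def extract_with_candidates(doc, words, top_n=5):
    """Extract keywords from `doc`, choosing only among `words`."""
    # Imported lazily so to_candidate_list works without KeyBERT installed.
    from keybert import KeyBERT

    kw_model = KeyBERT()
    return kw_model.extract_keywords(
        doc, candidates=to_candidate_list(words), top_n=top_n
    )
```

Calling `extract_with_candidates(doc, my_dict)` downloads the default embedding model on first use, so the first call may take a while.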

Thank you for your answer.

Actually, I compared the output of KeyBERT with and without candidates. They were exactly the same! Why is that?
On the other hand, the seed_keywords parameter generates different outputs. What is the difference between candidates and seed_keywords?

> Actually, I compared the output of KeyBERT with and without candidates. They were exactly the same! Why is that?

That is difficult to say without seeing the code, but it might be because the candidates are the same as the individual words in the text. Could you share your example?
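One rough way to check this explanation is to compare the candidate list with the single words that occur in the text. The sketch below uses a hypothetical helper, `candidate_coverage`, with a simple lowercase regex tokenizer; that tokenizer is an assumption made for illustration and is not KeyBERT's own tokenization.

```python
import re


def candidate_coverage(doc, candidates):
    """Compare a candidate list with the single words found in `doc`."""
    # Illustrative tokenizer: lowercase runs of letters and digits.
    doc_words = set(re.findall(r"[a-z0-9]+", doc.lower()))
    cand = {c.strip().lower() for c in candidates}
    return {
        "in_text": sorted(cand & doc_words),        # candidates present in the text
        "not_in_text": sorted(cand - doc_words),    # candidates absent from the text
        "text_words_excluded": sorted(doc_words - cand),  # text words ruled out
    }
```

If `text_words_excluded` contains only words that would not have ranked highly anyway, restricting the extraction to the candidates is unlikely to change the top results.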

> What is the difference between candidates and seed_keywords?

Candidates are the set of words from which the keywords are chosen. In contrast, seed keywords only steer the extraction toward the given seeds; the seeds themselves do not have to be chosen as keywords. You can find more about that in the documentation.
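To see the difference on a concrete document, the same text can be run three ways and the results compared side by side. This is a minimal sketch: `compare_runs` is a hypothetical helper, and `kw_model` is assumed to be a `keybert.KeyBERT` instance (or any object with a compatible `extract_keywords` method).

```python
def compare_runs(kw_model, doc, candidates, seed_keywords, top_n=5):
    """Run one document with no hints, with candidates, and with seeds."""
    return {
        # Keywords chosen from the words of the document itself.
        "baseline": kw_model.extract_keywords(doc, top_n=top_n),
        # Keywords may only come from `candidates`.
        "candidates": kw_model.extract_keywords(
            doc, candidates=candidates, top_n=top_n
        ),
        # Extraction is guided toward the seeds, which need not be returned.
        "seed_keywords": kw_model.extract_keywords(
            doc, seed_keywords=seed_keywords, top_n=top_n
        ),
    }
```

With KeyBERT installed, `compare_runs(KeyBERT(), doc, candidates, seed_keywords)` after `from keybert import KeyBERT` returns one result list per setting, which makes it easy to spot where the outputs actually diverge.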

> That is difficult to say without seeing the code, but it might be because the candidates are the same as the individual words in the text. Could you share your example?

Yes! I chose the keywords from the text, so they are the same as the individual words in the text. Even in the documentation, you used Rake to extract keywords from the text itself (so they are the same too).
Apart from that, I think setting seed_keywords or candidates works well when I want to extract single words (I mean keyphrase_ngram_range=(1, 1)). It doesn't work well when the n-gram range is (5, 5). Am I right?

> It doesn't work well when the n-gram range is (5, 5). Am I right?

That depends on whether the seed_keywords or candidates are themselves 5-word phrases. If you are looking for individual words, then an n-gram range of (1, 1) works best.
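A quick way to check this before calling `extract_keywords` is to count the words in each candidate or seed phrase and compare the count with the intended `keyphrase_ngram_range`. The helper `split_by_ngram_range` below is hypothetical, and counting words by splitting on whitespace is a simplifying assumption.

```python
def split_by_ngram_range(phrases, ngram_range):
    """Separate phrases whose word count fits `ngram_range` from the rest."""
    low, high = ngram_range
    fits, misfits = [], []
    for phrase in phrases:
        n_words = len(phrase.split())  # whitespace word count (assumption)
        if low <= n_words <= high:
            fits.append(phrase)
        else:
            misfits.append(phrase)
    return fits, misfits
```

For example, with a range of `(5, 5)` a list of single-word candidates ends up entirely in `misfits`, which matches the mismatch described in the answer above.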