the candidates parameter

Hossein-1991 opened this issue a year ago

Hossein Salahshoor Gavalan commented a year ago

Hi,
I have two questions regarding the candidates parameter:

1. Where can I find an explanation of this exciting parameter? I have no idea how it affects the results!
2. I have a long list of words. Can I feed them to the parameter as a dictionary?

Thanks

Maarten Grootendorst · Answer 1 · Wed Jun 21 2023 17:07:45 GMT+0800 (China Standard Time)

The docstrings themselves should give you a start. Other than that, there is a section in the documentation where you can find more information.
You should pass the candidate words as a list of words, not as a dictionary, but other than that it should work.
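For readers who keep their words in a dictionary, here is a minimal sketch of turning it into the list that `candidates` expects. The helper names `to_candidate_list` and `extract_with_candidates` are hypothetical, and the lowercasing step is an assumption on my part (the candidates are presumably matched against lowercased text by default), so adjust it if your setup differs.

```python
def to_candidate_list(words):
    """Turn a dict (its keys) or any iterable of words into a clean list.

    Hypothetical helper: strips whitespace, lowercases, drops empty
    entries and duplicates while keeping the original order.
    """
    seen = {}
    for word in words:  # iterating over a dict yields its keys
        cleaned = str(word).strip().lower()
        if cleaned:
            seen.setdefault(cleaned, None)
    return list(seen)


def extract_with_candidates(doc, words):
    """Sketch of the KeyBERT call; needs the third-party keybert package."""
    # Imported lazily so the helper above can be used without keybert installed.
    from keybert import KeyBERT

    kw_model = KeyBERT()
    return kw_model.extract_keywords(doc, candidates=to_candidate_list(words))
```

Calling `to_candidate_list({"Learning": 1, "data": 2})` gives a plain list of lowercase strings, which can then be passed as `candidates`.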

Hossein Salahshoor Gavalan · Answer 2 · Thu Jun 22 2023 00:34:32 GMT+0800 (China Standard Time)

Thank you for your answer.

Actually, I compared the output of KeyBERT with and without candidates. They were exactly the same! Why is that?
On the other hand, the seed_keywords parameter generates different outputs. What is the difference between candidates and seed_keywords?

Maarten Grootendorst · Answer 3 · Thu Jun 22 2023 22:09:28 GMT+0800 (China Standard Time)

> Actually, I compared the output of KeyBERT with and without candidates. They were exactly the same! Why is that?

That is difficult to say without seeing the code, but it might be because the candidates are the same as the individual words in the text. Could you share your example?

> What is the difference between candidates and seed_keywords?

Candidates are the set of words from which the keywords are chosen. In contrast, seed keywords only steer the extraction toward the seed words you provide, and the seed words themselves do not necessarily have to be chosen. You can find more about that in the documentation.
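To make the distinction concrete, here is a toy sketch that does not use KeyBERT at all. The two-dimensional vectors, the `rank` function, and the 3:1 blending weights are all made up for illustration and are not claimed to match KeyBERT's internals. The point is only that candidates restrict the pool of words that can be returned, while a seed shifts what the words are compared against.

```python
import math

# Made-up 2-D "embeddings", purely for illustration.
WORDS = {
    "learning": (1.0, 0.0),
    "supervised": (0.8, 0.6),
    "data": (0.0, 1.0),
}
DOC = (1.0, 0.2)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def rank(doc_vec, word_vecs, candidates=None, seed_vec=None):
    """Rank words by similarity to the document vector.

    candidates: if given, only these words can be returned.
    seed_vec:   if given, it is blended into the document vector
                (3:1 weighting, an assumption for this toy example).
    """
    if seed_vec is not None:
        doc_vec = tuple((3 * d + s) / 4 for d, s in zip(doc_vec, seed_vec))
    if candidates is None:
        pool = word_vecs
    else:
        pool = {w: v for w, v in word_vecs.items() if w in candidates}
    return sorted(pool, key=lambda w: cosine(doc_vec, pool[w]), reverse=True)
```

With these toy values, `rank(DOC, WORDS, candidates=["supervised", "data"])` can never return "learning", whereas `rank(DOC, WORDS, seed_vec=(0.0, 1.0))` still returns all three words but in a different order than without the seed. In KeyBERT itself, the corresponding arguments to `extract_keywords` are `candidates` and `seed_keywords`.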

Hossein Salahshoor Gavalan · Answer 4 · Mon Jun 26 2023 15:53:00 GMT+0800 (China Standard Time)

> That is difficult to say without seeing the code, but it might be because the candidates are the same as the individual words in the text. Could you share your example?

Yes! I chose the keywords from the text, so they are the same as the text's individual words. Even in the documentation, you have used Rake to extract keywords from the text itself (so they are the same too).
Apart from that, I think setting seed_keywords or candidates works well when I want to extract single words (I mean ngram = (1,1)). It doesn't work well when ngram = (5,5). Am I right?

Maarten Grootendorst · Answer 5 · Mon Jun 26 2023 16:50:35 GMT+0800 (China Standard Time)

> It doesn't work well when ngram = (5,5). Am I right?

That depends on whether the seed_keywords or candidates also have an n-gram of 5. If you are looking for individual words, then n-grams of 1 work best.
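As a rough sketch of what "having an n-gram of 5" means in practice: the helpers below count the words in each candidate and derive a matching range for `keyphrase_ngram_range`. The helper names are hypothetical, and splitting on whitespace is a simplification, since the tokenizer used during extraction may count words differently.

```python
def ngram_length(phrase):
    """Number of whitespace-separated words in a phrase."""
    return len(phrase.split())


def ngram_range_for(candidates):
    """Smallest (min, max) n-gram range that covers every candidate."""
    lengths = [ngram_length(c) for c in candidates if c.strip()]
    return (min(lengths), max(lengths))


def candidates_in_range(candidates, ngram_range):
    """Candidates whose word count falls inside the given n-gram range."""
    low, high = ngram_range
    return [c for c in candidates if low <= ngram_length(c) <= high]


CANDIDATES = [
    "supervised",
    "machine learning",
    "supervised machine learning training data",
]


def extract_matching(doc, candidates):
    """Sketch of the KeyBERT call; needs the third-party keybert package."""
    from keybert import KeyBERT  # imported lazily, not needed by the helpers

    return KeyBERT().extract_keywords(
        doc,
        candidates=candidates,
        keyphrase_ngram_range=ngram_range_for(candidates),
    )
```

With the list above, `candidates_in_range(CANDIDATES, (5, 5))` keeps only the five-word phrase and `candidates_in_range(CANDIDATES, (1, 1))` keeps only "supervised", which is one way to check up front whether a given n-gram setting can match your candidates at all.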