handicap common words?: not just one but two links to "interpretation"
holtzermann17 opened this issue
http://planetmath.org/node/87431#comment-19465
... This makes me think that we should give words a negative relevance score in NNexus based on how common they are in non-mathematical texts. Here is a list of the 5000 most common words. Words on the list include:
- group
- interpretation
- instance
- structure
- field
- contain
- concept
- collection
- ...
I'm not saying that we should never link these words, but maybe we could give them a "handicap" that could be overturned by other evidence (e.g. MSC data).
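To make the "handicap" idea concrete, here is a minimal sketch in Python. It is not NNexus code: the names `COMMON_WORD_RANK`, `handicap`, `link_score`, and `should_link`, the log-scaled penalty, the threshold, and the rank values are all assumptions made up for illustration. The only idea taken from the proposal is that a common word starts with a penalty that other evidence (such as an MSC match) can outweigh.

```python
import math

# Hypothetical frequency ranks (1 = most common) standing in for the
# 5000-word list; these numbers are invented for illustration only.
COMMON_WORD_RANK = {"group": 300, "field": 600, "interpretation": 3000}
LIST_SIZE = 5000


def handicap(word, max_penalty=1.0):
    """Penalty in [0, max_penalty]; larger for more common words."""
    rank = COMMON_WORD_RANK.get(word.lower())
    if rank is None:
        return 0.0  # not on the common-word list: no handicap
    # Log scaling (an assumption): rank 1 gets the full penalty,
    # rank LIST_SIZE gets none.
    return max_penalty * (1.0 - math.log(rank) / math.log(LIST_SIZE))


def link_score(word, base_score=1.0, msc_bonus=0.0):
    """Base relevance minus the handicap, plus any other evidence."""
    return base_score - handicap(word) + msc_bonus


def should_link(word, msc_bonus=0.0, threshold=0.75):
    """Link only if the combined score clears the (assumed) threshold."""
    return link_score(word, msc_bonus=msc_bonus) >= threshold
```

With these made-up numbers, `should_link("group")` is false on its own but becomes true with `msc_bonus=0.5`, while a word that is not on the list, such as "homomorphism", is linked without any extra evidence.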
I will move this issue to the 3.0 milestone. For the June release I will only look into handicapping MSC categories that are too specific.
True, commonality in natural language is another good indication that a word is unlikely to be a term. I will think about it a bit later in the summer.