Create a golden set of "trivial/novel" pairs
kudkudak opened this issue
To make our considerations concrete, and to see how much our MaxSim is lying, let's do the following:
Create a set of K pairs of triplets, each labeled with whether, in your opinion, it is a novel fact or not
e.g.:
(frog, isan, animal), (cat, isan, animal) -> trivial/novel
Then we can ask Jackie, @Mnuke and @arianhosseini during the next meeting whether we all agree about the triviality or novelty of a given pair of triplets
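One possible way to store the golden set described above and check how often annotators agree is sketched below. The names `GoldenPair` and `agreement_rate`, the annotator keys, the second triplet pair, and all labels are hypothetical placeholders for illustration, not real annotations or existing project code.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Triplet = Tuple[str, str, str]  # (head, relation, tail)


@dataclass
class GoldenPair:
    first: Triplet
    second: Triplet
    # Annotator name -> "trivial" or "novel"; one entry per person who labeled the pair.
    labels: Dict[str, str]

    def unanimous(self) -> bool:
        """True if at least one label exists and all annotators gave the same one."""
        return len(set(self.labels.values())) == 1


def agreement_rate(pairs: List[GoldenPair]) -> float:
    """Fraction of pairs on which all annotators agree."""
    if not pairs:
        raise ValueError("need at least one pair")
    return sum(p.unanimous() for p in pairs) / len(pairs)


# Placeholder data: the labels (and the second pair) are made up for illustration.
golden = [
    GoldenPair(("frog", "isan", "animal"), ("cat", "isan", "animal"),
               {"annotator_a": "trivial", "annotator_b": "trivial"}),
    GoldenPair(("frog", "isan", "animal"), ("frog", "isan", "amphibian"),
               {"annotator_a": "trivial", "annotator_b": "novel"}),
]
print(agreement_rate(golden))
```

Pairs where `unanimous()` is false are the ones worth discussing at the meeting.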
Done on my part. I think I have converged on the idea that syntagmatic similarity is the way to go
We agreed that syntagmatic similarity is a good first-order approximation of what we mean by similarity. Two words are syntagmatically similar if one can be swapped for the other and the sentence remains syntactically similar. The more syntagmatically similar two words are, the more likely it is that the semantic meaning will be preserved.
This is my intuitive understanding (see e.g. http://visual-memory.co.uk/daniel/Documents/S4B/sem03.html), but I am not sure if this is fully correct.
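The swap test above could be tried by hand with a small helper like the one below. It is only a sketch: `swap_word` is a hypothetical name, the example sentence is made up, and the code only produces the swapped sentence. Judging whether the result is still syntactically similar is left to the reader.

```python
import re


def swap_word(sentence: str, old: str, new: str) -> str:
    """Replace whole-word occurrences of `old` with `new` in `sentence`."""
    pattern = r"\b" + re.escape(old) + r"\b"
    # A function replacement keeps `new` literal (no backslash interpretation).
    return re.sub(pattern, lambda _match: new, sentence)


# Made-up sentence for illustration.
print(swap_word("the frog is an animal", "frog", "cat"))
```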