meghdadFar / SDMA

Python implementation of Substitution-driven Measures of Association

Home page: https://archive-ouverte.unige.ch/unige:96989


Substitution-driven Measures of Association (SDMAs) for extracting collocations

SDMAs can be used as an alternative to measures such as pointwise mutual information (PMI) and chi-squared to identify collocations in a text corpus. However, unlike PMI and other purely statistical measures, which are blind to the meaning of words, SDMAs measure statistical association while taking into account the degree of semantic non-substitutability of word sequences. Non-substitutability is a linguistic test of the fixedness of a phrase: a fixed expression such as "red tape" loses its meaning when one of its words is replaced by a near-synonym ("crimson tape"), whereas a compositional phrase such as "red car" does not. SDMAs can be used to identify collocations, and they have been shown to considerably outperform association measures such as PMI. You can read more about the theory behind these measures in this Jupyter notebook.
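The sketch below illustrates only the intuition described above; it is not the SDMA formulation from the thesis or the code in this repository. It assumes a hand-written dictionary of substitutes (in practice these might come from a thesaurus or from nearest neighbours in a word-embedding space) and combines PMI with a simple non-substitutability ratio. The functions `count_ngrams`, `non_substitutability` and `substitution_weighted_pmi` are hypothetical names chosen for this example.

```python
import math
from collections import Counter


def count_ngrams(tokens):
    """Return unigram and adjacent-bigram counts for a list of tokens."""
    return Counter(tokens), Counter(zip(tokens, tokens[1:]))


def pmi(w1, w2, unigrams, bigrams):
    """Pointwise mutual information (log base 2) with maximum-likelihood estimates."""
    joint = bigrams[(w1, w2)]
    if joint == 0:
        return float("-inf")  # unseen pair: no evidence of association
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    p_xy = joint / n_bi
    p_x, p_y = unigrams[w1] / n_uni, unigrams[w2] / n_uni
    return math.log2(p_xy / (p_x * p_y))


def non_substitutability(w1, w2, bigrams, substitutes):
    """Share of the phrase's count among itself and its one-word substitution variants.

    1.0 means no variant was observed (the phrase looks fixed); values near 0
    mean variants are common (the phrase looks compositional).
    `substitutes` is an ASSUMED mapping: word -> list of similar words.
    """
    observed = bigrams[(w1, w2)]
    variants = [(s, w2) for s in substitutes.get(w1, [])]
    variants += [(w1, s) for s in substitutes.get(w2, [])]
    total = observed + sum(bigrams[v] for v in variants)
    return observed / total if total else 0.0


def substitution_weighted_pmi(w1, w2, unigrams, bigrams, substitutes):
    """Toy score (illustrative only): PMI + log2(non-substitutability).

    Fully fixed phrases keep their PMI; phrases whose variants are
    attested in the corpus are penalised.
    """
    if bigrams[(w1, w2)] == 0:
        return float("-inf")
    ns = non_substitutability(w1, w2, bigrams, substitutes)
    return pmi(w1, w2, unigrams, bigrams) + math.log2(ns)


if __name__ == "__main__":
    tokens = "red tape red tape red car blue car crimson car".split()
    unigrams, bigrams = count_ngrams(tokens)
    substitutes = {"red": ["blue", "crimson"], "tape": ["ribbon"], "car": ["vehicle"]}
    for pair in [("red", "tape"), ("red", "car")]:
        print(pair, substitution_weighted_pmi(*pair, unigrams, bigrams, substitutes))
```

In this toy corpus, "blue car" and "crimson car" are attested but no variant of "red tape" is, so the substitution-aware score separates the two pairs more sharply than PMI alone.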

Applications

Similar to PMI, SDMAs can be used to identify collocations or multiword expressions.
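As a hedged illustration of the collocation-extraction workflow, the following sketch ranks adjacent word pairs by an association score. It uses plain PMI as the default scorer, since that is the baseline SDMAs are compared against; the scorer is a pluggable function so that a substitution-aware measure could be used instead. `rank_collocations`, `pmi_score`, `min_count` and `top_n` are hypothetical names, not this repository's API.

```python
import math
from collections import Counter


def pmi_score(pair, unigrams, bigrams):
    """Default scorer: PMI (log base 2) with maximum-likelihood estimates."""
    w1, w2 = pair
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    p_xy = bigrams[pair] / n_bi
    return math.log2(p_xy / ((unigrams[w1] / n_uni) * (unigrams[w2] / n_uni)))


def rank_collocations(tokens, score=pmi_score, min_count=2, top_n=10):
    """Rank adjacent word pairs seen at least `min_count` times by `score`.

    `score` takes (pair, unigrams, bigrams) and returns a float, so any
    association measure with that signature can be swapped in.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    # Frequency filter: PMI is unreliable for very rare pairs.
    candidates = [pair for pair, count in bigrams.items() if count >= min_count]
    ranked = sorted(candidates, key=lambda p: score(p, unigrams, bigrams), reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    text = "the red tape slowed the project and the red tape stayed while the project grew"
    print(rank_collocations(text.split()))
```

The minimum-count filter is a common practical safeguard, because PMI tends to overrate pairs that occur only once.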

Usage


Languages

Python 52.8%, Jupyter Notebook 47.2%