lightsamurai/aphasia

word2vec semantics nlp python clustering-aphasic-patients aphasia

The aim of this work is to automatically rank the severity of anomic aphasia production by applying Word2Vec tasks to our corpus, which was manually collected from diagnostic protocols in the AphasiaBank online database (https://aphasia.talkbank.org/).

We propose two ways to accomplish this task: one very simple and one more elaborate.

The simple approach is implemented in aphasia_automatic_diagnosing.ipynb. The notebook takes a three-column input_EN.csv file as input: the first column is the patient ID, and the second and third columns hold target/response word pairs. The target is the word that exactly describes the scene (e.g. "ball") and the response is the word the patient produces (e.g. "sphere"). For each patient, it returns the mean score over all of that patient's word pairs as a final score for his/her performance. The algorithm works for any language, provided a suitable language model is loaded. The cosine similarity is computed by Word2Vec's built-in wv.similarity function, which takes a word pair as input and returns its cosine similarity. More details on the project: https://drive.google.com/file/d/1IQ8PDOVlTTNE6CscvI70yfJL8G-CYuM5/view?usp=sharing
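The per-patient averaging described above can be sketched as follows. This is a minimal illustration, not the notebook's actual code. It assumes the CSV has no header row, it skips pairs containing a word missing from the vocabulary, and it uses hypothetical toy vectors (TOY_VECTORS) in place of a trained model. With a loaded gensim model, the hand-written cosine helper would be replaced by model.wv.similarity(target, response).

```python
import csv
import io
import math
from collections import defaultdict

# Hypothetical 2-dimensional vectors standing in for a trained Word2Vec model.
TOY_VECTORS = {
    "ball": [1.0, 0.0],
    "sphere": [0.6, 0.8],
    "dog": [0.0, 1.0],
    "cat": [0.8, 0.6],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_scores(csv_file, vectors):
    """Return {patient_id: mean target/response similarity}.

    Assumes rows of the form patient_id,target,response with no header.
    Pairs with an out-of-vocabulary word are skipped (an assumption).
    """
    per_patient = defaultdict(list)
    for patient_id, target, response in csv.reader(csv_file):
        if target in vectors and response in vectors:
            per_patient[patient_id].append(
                cosine(vectors[target], vectors[response])
            )
    return {pid: sum(s) / len(s) for pid, s in per_patient.items()}


# Made-up sample rows in the three-column layout described above.
SAMPLE = "P01,ball,sphere\nP01,dog,dog\nP02,ball,cat\nP02,dog,cat\n"
scores = mean_scores(io.StringIO(SAMPLE), TOY_VECTORS)
for patient_id, score in sorted(scores.items()):
    print(patient_id, round(score, 3))
```

A higher mean indicates responses that are semantically closer to their targets. To run this on a real file, pass open("input_EN.csv", newline="") instead of the in-memory sample.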

The materials for the other approach are in the test folder, which contains the main.py script that runs the code. It computes the scores of single patients, giving as output the list of all patients' scores and the final score for that patient. The scoring both compares the w2v similarity scores with the MEN similarity scores dataset and takes into account how difficult a word is to produce (i.e. a more common word should be easier to recall than an infrequent one).
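The exact formula used by main.py is not described here, so the following is only a hypothetical sketch of how word frequency could enter the score. It assumes a corpus frequency count per target word (the counts dictionary is made up) and weights each pair's similarity by log(1 + count), so that a poor response to a common, easy-to-recall target lowers the score more than a poor response to a rare one. The comparison with the MEN dataset is not covered by this sketch.

```python
import math


def frequency_weight(count):
    """Hypothetical weight: grows with the log of the target's frequency."""
    return math.log1p(count)


def weighted_score(pairs, counts):
    """Frequency-weighted mean similarity for one patient.

    pairs  -- list of (target_word, similarity) tuples
    counts -- dict mapping target_word to a corpus frequency count
    Targets with no recorded count get weight 0 and are effectively
    ignored (an assumption of this sketch).
    """
    weights = [frequency_weight(counts.get(target, 0)) for target, _ in pairs]
    total = sum(weights)
    if total == 0:
        return None  # no usable pairs
    return sum(w * sim for w, (_, sim) in zip(weights, pairs)) / total


# Made-up similarities and frequency counts for illustration only.
patient_pairs = [("ball", 0.6), ("dog", 1.0)]
equal_counts = {"ball": 100, "dog": 100}
skewed_counts = {"ball": 10, "dog": 1000}

print(weighted_score(patient_pairs, equal_counts))
print(weighted_score(patient_pairs, skewed_counts))
```

With equal counts the result reduces to the plain mean used by the simple approach. With skewed counts the more frequent target dominates the score.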

About

Clustering aphasic patients' linguistic production by computational means (Word2vec)

Languages

Jupyter Notebook 80.3%, Python 19.7%