matteobrv / repro-homonymy-acl21

Code, data and results of a reproducibility experiment for the paper "Exploring the Representation of Word Meanings in Context" by Marcos Garcia.

[Re] Word Meanings Representation in Context

This repository hosts the code, data and results of our reproducibility experiment for the ACL 2021 paper "Exploring the Representation of Word Meanings in Context: A Case Study on Homonymy and Synonymy" by Marcos Garcia. The paper examines both static and contextualized word embeddings to assess how well they represent different lexical-semantic relations, such as homonymy and synonymy. Our goal is to reproduce the results summarised in Table 4 of the original paper and to test the author's hypothesis on a newly compiled Italian dataset. The original repository can be reached at https://github.com/marcospln/homonymy_acl21.

Datasets

For our experiment we work with .tsv datasets of triples in five languages: English, Spanish, Portuguese, Galician and Italian. A triple is a set of three sentences, each containing a target word marked by <b></b> tags. Two of the target words share the same meaning, while the third is an outlier.

Target	POS	Context	Overlap	Sent1	Sent2	Sent3
coach	same|same|same	same|same|same	false|false|false	We're going to the airport by <b>coach</b>.	We're going to the airport by <b>bus</b>.	We're going to the airport by <b>bicycle</b>.

For each .tsv dataset we also need its corresponding .conllu version. These resources are provided in the datasets folder.
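The triple format described above can be read with a few lines of Python. This is a minimal sketch and not the code shipped in this repository: the function name read_triples and the returned dictionary layout are hypothetical, and the column names are taken from the example header shown above.

```python
import csv
import io
import re

# Matches the target word wrapped in <b></b> tags, as in the example row.
TARGET_RE = re.compile(r"<b>(.*?)</b>")


def read_triples(handle):
    """Yield one dict per row of a tab-separated triples file (hypothetical helper)."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        sentences = [row["Sent1"], row["Sent2"], row["Sent3"]]
        # Extract the marked target word from each of the three sentences.
        targets = [TARGET_RE.search(s).group(1) for s in sentences]
        yield {"target": row["Target"], "sentences": sentences, "targets": targets}


# In-memory copy of the example header and row, so the sketch runs on its own.
sample = (
    "Target\tPOS\tContext\tOverlap\tSent1\tSent2\tSent3\n"
    "coach\tsame|same|same\tsame|same|same\tfalse|false|false\t"
    "We're going to the airport by <b>coach</b>.\t"
    "We're going to the airport by <b>bus</b>.\t"
    "We're going to the airport by <b>bicycle</b>.\n"
)

triples = list(read_triples(io.StringIO(sample)))
print(triples[0]["targets"])
```

To read a real file, an open file handle can be passed in place of the io.StringIO object.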

Run the Experiment

  1. execute get_fasttext_models.sh to get the fastText models required to successfully run the experiment;
  2. execute generate_comparisons.sh to generate an embedding for each sentence in each triple and compare them: (emb_sent_1 vs emb_sent_2), (emb_sent_1 vs emb_sent_3) and (emb_sent_2 vs emb_sent_3);
  3. execute evaluate_comparisons.sh to compute the accuracy scores for each language variety.
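The comparison and evaluation steps above can be illustrated with a small self-contained sketch. It is not the logic of generate_comparisons.sh or evaluate_comparisons.sh: the use of cosine similarity, the rule that a triple counts as correct when the pair (emb_sent_1, emb_sent_2) is the most similar one, and all function names are assumptions made for illustration, with the third sentence taken to be the outlier as in the example row.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def compare_triple(emb1, emb2, emb3):
    """Return the three pairwise similarities listed in step 2."""
    return {
        "1-2": cosine(emb1, emb2),
        "1-3": cosine(emb1, emb3),
        "2-3": cosine(emb2, emb3),
    }


def is_correct(sims):
    # Assumption: sentences 1 and 2 share a meaning and sentence 3 is the
    # outlier, so the 1-2 pair should be the most similar of the three.
    return sims["1-2"] > sims["1-3"] and sims["1-2"] > sims["2-3"]


def accuracy(triples):
    """Fraction of triples whose same-meaning pair is the closest pair."""
    hits = sum(is_correct(compare_triple(*t)) for t in triples)
    return hits / len(triples)


# Toy two-dimensional embeddings, invented purely for this example.
toy = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),  # 1 and 2 are closest: counted correct
    ([1.0, 0.0], [0.0, 1.0], [0.9, 0.1]),  # 1 and 3 are closest: counted wrong
]
print(accuracy(toy))
```

With the toy data, one of the two triples satisfies the rule, so the sketch reports an accuracy of 0.5.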

Results

The outputs of the two scripts, generate_comparisons.sh and evaluate_comparisons.sh, are stored in triples_comparisons and results, respectively. We provide our results in the repro_results folder.

About

License: GNU General Public License v3.0


Languages

Python 87.9%, Shell 12.1%