rlleshi / towards_explainable_creativity

Code related to the master's thesis

Towards Explainable Creativity: Tackling The Remote Association Test With Knowledge Graphs

TimeFrame: 15.02.2021 - 15.09.2021

Results

| Model | Accuracy_1 | Accuracy_2 | Accuracy_3 |
| --- | --- | --- | --- |
| conceptnet-search-local | frat_3_depth_1: 27.08%; frat_3_depth_2: 83.33% | frat_2,3_depth_1: 62.5%; frat_2,3_depth_2: 86.98% | frat_2_depth_1: 62.5%; frat_2_depth_2: 88.19% |
| embedding-search-local | frat_3_depth_1: 25.0%; frat_top_3_depth_2: 54.2% | frat_2,3_depth_1: 62.5%; frat_top_5_depth_2: 64.6% | frat_2_depth_1: 37.5%; frat_top_10_depth_2: 70.8% |
| embedding-search-general | frat_top_3: 50.0% | frat_top_5: 54.2% | frat_top_10: 58.3% |
| gensim-search | frat_top_3: 16.7% | frat_top_5: 18.8% | frat_top_10: 31.2% |

The code for this thesis focuses on the following:

ConceptNet search local

Given a triple of concepts, e.g. (question, reply, solution), and a ground-truth solution, e.g. statement, calculate:

  • solutions: list of solutions found in ConceptNet using the conceptnet lite Python library. Each member of the triple is queried separately; a solution is a node that appears in the results for every member of the triple.
  • has_solution: boolean, whether the triple has a solution or not
  • relation: bidirectional relation from each member of a triple to its solution
  • relation_to_solution: unidirectional relation from each member of a triple to its solution
  • relation_from_solution: unidirectional relation from the solution to each member of a triple
  • accuracy: accuracy calculated against ground_solution
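
The search described above can be sketched on a small in-memory graph. This is only an illustration under stated assumptions: `TOY_GRAPH`, `neighbours`, `find_solutions`, and `accuracy` are hypothetical names, the edges are made up rather than taken from ConceptNet, and the real code queries ConceptNet through the conceptnet lite library instead of a dictionary. The `depth` parameter mirrors the depth_1 and depth_2 settings reported in the results.

```python
from typing import Dict, List, Sequence, Set

# Hypothetical toy graph standing in for ConceptNet (node -> related nodes).
# The edges are illustrative only and are NOT real ConceptNet data.
TOY_GRAPH: Dict[str, Set[str]] = {
    "question": {"statement", "answer"},
    "reply": {"statement", "answer", "letter"},
    "solution": {"statement", "problem"},
    "problem": {"answer"},
}


def neighbours(graph: Dict[str, Set[str]], node: str, depth: int = 1) -> Set[str]:
    """Return every node reachable from `node` in at most `depth` hops."""
    seen = {node}
    frontier = {node}
    for _ in range(depth):
        frontier = {n for f in frontier for n in graph.get(f, set())} - seen
        seen |= frontier
    return seen - {node}


def find_solutions(graph: Dict[str, Set[str]], triple: Sequence[str], depth: int = 1) -> List[str]:
    """A solution is a node reached from every member of the triple."""
    reached = [neighbours(graph, concept, depth) for concept in triple]
    return sorted(set.intersection(*reached) - set(triple))


def accuracy(predictions: Sequence[Sequence[str]], ground_solutions: Sequence[str]) -> float:
    """Share of triples whose ground solution is among the predicted solutions."""
    hits = sum(1 for sols, truth in zip(predictions, ground_solutions) if truth in sols)
    return hits / len(ground_solutions)


triple = ("question", "reply", "solution")
solutions = find_solutions(TOY_GRAPH, triple, depth=1)
has_solution = bool(solutions)
```

On this toy graph, raising `depth` from 1 to 2 adds a second shared node, which is the same effect that separates the depth_1 and depth_2 figures in the results: a deeper search reaches more candidate nodes.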

For frat: look at non-compound concepts (focus of the thesis)

For rat: look at compound concepts

Moreover, model the explanations (relation) according to templates.txt

Embedding search

The cosine similarity between two word vectors provides an effective method for measuring the linguistic or semantic similarity of the corresponding words. Sometimes, the nearest neighbors according to this metric reveal rare but relevant words that lie outside an average human's vocabulary.

Here, all the nodes are checked against the solutions and the cosine distance is recorded for each. Afterwards, the solutions are filtered according to a threshold.
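
A minimal sketch of the distance computation and the threshold filter, written in plain Python so it runs without extra dependencies. The function names, the two-dimensional toy vectors, and the threshold value are assumptions made for illustration; the thesis code works with pretrained embeddings and its own threshold.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine distance is defined as 1 - cosine similarity."""
    return 1.0 - cosine_similarity(u, v)


def filter_by_threshold(
    candidates: Dict[str, Sequence[float]],
    reference: Sequence[float],
    max_distance: float,
) -> List[Tuple[str, float]]:
    """Keep candidates whose distance to `reference` is at most `max_distance`,
    ordered from closest to farthest."""
    scored = [(name, cosine_distance(vec, reference)) for name, vec in candidates.items()]
    return sorted((p for p in scored if p[1] <= max_distance), key=lambda p: p[1])


# Toy vectors (hypothetical, two-dimensional for readability).
candidates = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
kept = filter_by_threshold(candidates, reference=[1.0, 0.0], max_distance=0.5)
```

A lower `max_distance` keeps fewer but closer candidates, so the threshold trades recall for precision.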

Embedding general search

For every noun in the English language (65k taken from WordNet), get its embedding and compute its cosine similarity with each member of the triple. Finally, intersect the candidate solutions found for the individual triple members.
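
The general search can be sketched as follows, assuming that the candidates of each triple member are its `k` most similar vocabulary words (the README does not state how candidates are selected, so this is an assumption). All names and the toy vectors are hypothetical. Vectors are normalised once so that cosine similarity reduces to a dot product, which matters when the vocabulary has tens of thousands of entries.

```python
import math
from typing import Dict, List, Sequence, Set


def normalise(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def top_k_similar(word: str, unit: Dict[str, List[float]], vocabulary: Sequence[str], k: int) -> List[str]:
    """The k vocabulary words with the highest cosine similarity to `word`."""
    target = unit[word]
    scored = [
        (sum(a * b for a, b in zip(target, unit[w])), w)
        for w in vocabulary
        if w != word and w in unit
    ]
    scored.sort(key=lambda p: (-p[0], p[1]))  # best score first, ties alphabetical
    return [w for _, w in scored[:k]]


def general_search(
    triple: Sequence[str],
    embeddings: Dict[str, Sequence[float]],
    vocabulary: Sequence[str],
    k: int,
) -> Set[str]:
    """Intersect the top-k neighbour sets of all triple members."""
    unit = {w: normalise(v) for w, v in embeddings.items()}
    candidates = [w for w in vocabulary if w not in triple]
    per_member = [set(top_k_similar(c, unit, candidates, k)) for c in triple]
    return set.intersection(*per_member)


# Hypothetical two-dimensional embeddings; real embeddings have hundreds of dimensions.
EMBEDDINGS = {
    "question": [1.0, 0.0], "reply": [0.0, 1.0], "solution": [1.0, 1.0],
    "statement": [1.0, 1.0], "query": [1.0, 0.1],
    "banana": [-1.0, 0.0], "river": [0.0, -1.0],
}
NOUNS = ["statement", "banana", "river", "query"]
triple = ("question", "reply", "solution")
shared = general_search(triple, EMBEDDINGS, NOUNS, k=2)
```

With `k=1` the intersection is empty on this toy data, because the triple members disagree on their single nearest neighbour; larger `k` makes a shared candidate more likely, in line with the top_3, top_5, and top_10 accuracies rising in the results.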

Gensim search

Find the intersection of the queries based on ConceptNet Numberbatch. Find the top-3, top-5, and top-10 solutions.

In order to use the script you must first download the conceptnet-numberbatch model (link in resources/), unzip the model and convert it using the misc/number_batch_converter.py script.
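
A hedged sketch of how such a search could look with gensim's `KeyedVectors`. Several things here are assumptions rather than facts about this repository: that the converter script produces a word2vec-style text file, that the keys are plain words, that a pool of the 100 nearest neighbours per concept is queried, and that shared words are ranked by their summed similarity. `load_numberbatch` and `ranked_solutions` are hypothetical helper names.

```python
from typing import List, Sequence


def load_numberbatch(path: str):
    """Load converted Numberbatch vectors.

    Assumes a word2vec-style text file; the actual output format of the
    converter script may differ.
    """
    # Imported lazily so the ranking helper below can be used without gensim.
    from gensim.models import KeyedVectors

    return KeyedVectors.load_word2vec_format(path, binary=False)


def ranked_solutions(vectors, triple: Sequence[str], pool_size: int = 100, k: int = 10) -> List[str]:
    """Words that are among the `pool_size` nearest neighbours of every triple
    member, ranked by summed similarity, truncated to the best `k`."""
    scores = [
        dict(vectors.most_similar(positive=[concept], topn=pool_size))
        for concept in triple
    ]
    shared = set.intersection(*(set(s) for s in scores)) - set(triple)
    ranked = sorted(shared, key=lambda w: (-sum(s[w] for s in scores), w))
    return ranked[:k]
```

Usage would then be along the lines of `vectors = load_numberbatch("path/to/converted_model.txt")` (a placeholder path), followed by `ranked_solutions(vectors, ("question", "reply", "solution"), k=3)`, and likewise with `k=5` and `k=10`.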

Libraries & Tools

  1. conceptnet lite: Python library for working with ConceptNet offline
       • ConceptNet is a semantic network designed to help computers understand the meaning of words that people use
  2. ConceptNet Numberbatch: set of word embeddings for ConceptNet
  3. embeddings: Python package that provides pretrained word embeddings for natural language processing and machine learning
  4. gensim: Python library for topic modelling, document indexing, and similarity retrieval with large corpora. Used in natural language processing and information retrieval
