memgonzales / semantle-word-embeddings

Recreation of Semantle (a word guessing game that gives the semantic similarity to the secret word) using three pretrained word embeddings: (1) word2vec, (2) GloVe, and (3) fastText


Recreating Semantle with Word Embeddings


This project recreates a version of the game Semantle, a variant of the five-letter word guessing game Wordle. Instead of letter hints, Semantle reports the semantic similarity of the player's guess to the secret word of the day. Our version of Semantle allows the player to choose from the following pretrained word embeddings:

  • word2vec
  • GloVe
  • fastText

All the scripts are placed inside a Jupyter notebook, which also includes a detailed write-up covering the following:

  • Design decisions in the implementation of the program
  • Walkthrough of the implementation of the program
  • Comparative analysis of selected word embeddings in the context of the program
  • Insights on vector semantics, including:
    • Possible applications outside natural language processing (e.g., vectorizing protein sequences)
    • Ethical issues and latent biases in word embeddings
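
As a rough illustration of the scoring idea behind the game, the sketch below computes the cosine similarity between a guess and a secret word and scales it to a 0-100 range. It is not taken from the notebook: the function names `cosine_similarity` and `score_guess`, the 0-100 scaling, and the three-dimensional `TOY_VECTORS` are all assumptions made for illustration. Real pretrained embeddings have hundreds of dimensions and would normally be queried through a library such as gensim.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical 3-dimensional vectors, invented for this example only.
TOY_VECTORS = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}


def score_guess(guess, secret, vectors=TOY_VECTORS):
    """Score a guess against the secret word, or return None if the guess is out of vocabulary."""
    if guess not in vectors:
        return None
    return round(100 * cosine_similarity(vectors[guess], vectors[secret]), 2)


if __name__ == "__main__":
    for word in ("dog", "car", "tree"):
        print(word, score_guess(word, "cat"))
```

With these toy vectors, "dog" scores much closer to "cat" than "car" does, and an out-of-vocabulary guess such as "tree" returns `None` so that a game loop could ask the player to try again.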

This notebook was created using Google Colab and invokes commands such as gdown and wget. Loading the pretrained word embeddings may also require more memory than some local machines have available. Therefore, we recommend running the notebook on Colab.
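
To give a sense of the memory involved, the sketch below estimates the size of a dense embedding matrix as vocabulary size × dimensions × bytes per value. The helper name `embedding_memory_gib` is hypothetical, and the figures used (3,000,000 entries of 300 dimensions stored as 32-bit floats, the size of the widely distributed Google News word2vec vectors) are an assumption about which model is loaded, not something stated in the notebook. The estimate also ignores the vocabulary index and any temporary copies made while loading.

```python
def embedding_memory_gib(vocab_size, dimensions, bytes_per_value=4):
    """Estimate the size in GiB of a dense embedding matrix (default: 32-bit floats)."""
    return vocab_size * dimensions * bytes_per_value / 2**30


# Assumed model size: 3,000,000 entries x 300 dimensions.
estimate = embedding_memory_gib(3_000_000, 300)
print(f"{estimate:.2f} GiB")
```

Under these assumptions the vectors alone take roughly 3.35 GiB, which already strains a machine with 4 GB of RAM before any other embedding is loaded.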

This is a major course output in an introduction to natural language processing class under Mr. Edward P. Tighe of the Department of Software Technology, De La Salle University.

Built Using

This project is a Jupyter notebook, with the following Python libraries and modules used:

| Library/Module | Description | License |
| --- | --- | --- |
| gensim | Provides functions for training vector embeddings, topic modelling, document indexing, and similarity retrieval with large corpora | GNU Lesser General Public License v2.1 |
| regex | Provides additional functionality over the standard re module while maintaining backwards-compatibility | Apache License 2.0 |
| numpy | Provides a multidimensional array object, various derived objects, and an assortment of routines for fast operations on arrays | BSD 3-Clause "New" or "Revised" License |
| io | Provides Python's main facilities for dealing with various types of I/O | Python Software Foundation License |
| random | Provides functions for generating pseudo-random numbers with various common distributions | Python Software Foundation License |

The descriptions are taken from their respective websites.

Authors
