Tron404 / InterviewNLP



InterviewNLP

This Jupyter notebook ("main.py") is designed to be plug-and-play: the only requirement is that "data.zip" is present in the current working directory. Everything in this repository was developed with Python 3.11.5, though any Python 3.11.x release should work. A requirements.txt file is also provided for installing all required modules into a Python virtual environment or a conda environment. The implementation can be summarized as follows:
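Since the notebook only assumes a Python 3.11.x interpreter and a "data.zip" in the working directory, a quick pre-flight check can save a failed run. The snippet below is a minimal sketch of such a check; the function name `check_environment` is hypothetical and not part of the repository.

```python
import sys
from pathlib import Path


def check_environment(data_path: str = "data.zip") -> list[str]:
    """Return a list of problems; an empty list means the prerequisites look satisfied."""
    problems: list[str] = []
    # The README targets Python 3.11.x (developed on 3.11.5)
    if sys.version_info[:2] != (3, 11):
        problems.append(f"Expected Python 3.11.x, found {sys.version.split()[0]}")
    # A relative path resolves against the current working directory,
    # which is where the README expects data.zip to be
    if not Path(data_path).is_file():
        problems.append(f"Missing {data_path} in {Path.cwd()}")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("Ready to run." if not issues else "\n".join(issues))
```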

Four models + baseline

  • Jaccard similarity for lexical overlap (baseline)
  • TF-IDF
  • BERT embeddings with pooling
  • Sentence-trained DistilRoBERTa (from sentence_transformers)
  • Doc2vec
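For the two lexical approaches in the list above, a dependency-free sketch may help show what is being compared. It assumes a simple lowercase regex tokenizer and a smoothed IDF of the form log((1 + N) / (1 + df)) + 1 (the same form scikit-learn's `TfidfVectorizer` uses by default); the repository's actual preprocessing and vectorizer settings are not documented here, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters/digits (a deliberately simple tokenizer)
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard(a: str, b: str) -> float:
    # |A ∩ B| / |A ∪ B| over the sets of unique tokens in each text
    sa, sb = set(tokenize(a)), set(tokenize(b))
    if not sa and not sb:
        return 1.0  # two empty texts are treated as identical
    return len(sa & sb) / len(sa | sb)


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    # Sparse TF-IDF vectors: raw term count * smoothed IDF
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    # Cosine similarity between two sparse vectors stored as dicts
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


docs = ["The cat sat on the mat.", "A cat sat on a mat.", "Stock prices fell sharply."]
vecs = tfidf_vectors(docs)
print(f"Jaccard(0, 1) = {jaccard(docs[0], docs[1]):.3f}")
print(f"TF-IDF cosine(0, 1) = {cosine(vecs[0], vecs[1]):.3f}")
print(f"TF-IDF cosine(0, 2) = {cosine(vecs[0], vecs[2]):.3f}")
```

Jaccard only counts shared unique tokens, so a function word like "the" weighs as much as a content word; TF-IDF down-weights terms that appear in many documents, which tends to let content words dominate the similarity score.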

Other models were also considered but discarded due to low performance compared to the chosen approaches:

  • Word2vec with pooling
  • BERT embeddings with pooling, weighted by TF-IDF scores
  • One-hot encoded bag-of-words (BoW)
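The difference between the retained "BERT embeddings with pooling" model and the discarded TF-IDF-weighted variant comes down to how per-token vectors are averaged into one sentence vector. The toy sketch below assumes mean pooling for the plain variant (the README does not say which pooling was used) and uses made-up 2-D vectors and weights in place of real BERT outputs and TF-IDF scores; the function names are hypothetical.

```python
Vector = list[float]


def mean_pool(token_vectors: list[Vector]) -> Vector:
    # Unweighted average of each dimension across all tokens
    if not token_vectors:
        raise ValueError("need at least one token vector")
    n = len(token_vectors)
    return [sum(dim) / n for dim in zip(*token_vectors)]


def weighted_pool(token_vectors: list[Vector], weights: list[float]) -> Vector:
    # Weighted average: tokens with larger weights (e.g. TF-IDF scores) pull the result toward them
    if len(token_vectors) != len(weights):
        raise ValueError("one weight per token vector is required")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return [sum(w * x for w, x in zip(weights, dim)) / total for dim in zip(*token_vectors)]


# Toy "token embeddings" for ["the", "cat"]; real ones would come from a BERT layer
tokens = [[1.0, 0.0], [0.0, 1.0]]
tfidf_weights = [0.1, 0.9]  # hypothetical scores: "the" is common, "cat" is informative
print(mean_pool(tokens))                     # [0.5, 0.5]
print(weighted_pool(tokens, tfidf_weights))  # [0.1, 0.9]
```

With uniform weights the two functions agree; the weighted version only differs when some tokens are scored as more informative than others. The README lists this weighting among the variants discarded for lower performance, so the sketch illustrates the idea rather than a recommended setup.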

About

License: MIT License


Languages

Jupyter Notebook 98.8%, Python 1.1%, Shell 0.1%