Pawan300 / NLP

This repository contains programs related to NLP.

  • It contains implementations of some research papers and some extensions of Hugging Face transformers for text similarity.

  • I tried several approaches on a simple dataset to classify text as spam or ham.
    I tried multiple strategies to build the embeddings:

    • TF_IDF
    • Word2Vec
    • Doc2Vec
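
To make the first option concrete, here is a minimal pure-Python sketch of TF-IDF weighting. It is not the code used in this repository: the whitespace tokenizer, the smoothed IDF formula, the function name `tfidf_vectors`, and the example messages are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) using a simple TF-IDF weighting."""
    # Assumption: lowercase + whitespace split stands in for real tokenization.
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))

    # Smoothed IDF, so a term present in every document keeps a nonzero weight.
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocabulary}

    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty documents
        vectors.append([counts[t] / total * idf[t] for t in vocabulary])
    return vocabulary, vectors


# Hypothetical messages, not taken from the dataset used here.
messages = [
    "win a free prize now",
    "are we meeting for lunch",
    "free entry win cash now",
]
vocabulary, vectors = tfidf_vectors(messages)
```

Each message becomes one fixed-length vector, which is the shape a classifier such as a random forest expects. Terms that occur in fewer messages (for example "prize") receive a larger weight than terms shared across messages (for example "free").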

    Then I tried a random forest (RF) and an RNN architecture with LSTM.

    The scores I got are:

    Model                    Precision  Recall  Accuracy
    TF_IDF + RF              0.99       0.78    0.97
    Word2Vec + RF            0.46       0.24    0.87
    Doc2Vec + RF             0.81       0.35    0.91
    RNN + text_to_sequence   0.99       0.96    0.99
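
For readers comparing the three columns above, here is a small sketch of how precision, recall, and accuracy can be computed from predictions. It assumes spam is the positive class and uses made-up labels with far more ham than spam. The class balance of the real dataset is not stated here, so treat the numbers as illustration only.

```python
def classification_scores(y_true, y_pred, positive="spam"):
    """Return (precision, recall, accuracy) for a binary labelling task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(pairs)
    return precision, recall, accuracy


# Hypothetical labels: 10 spam and 90 ham messages.
y_true = ["spam"] * 10 + ["ham"] * 90
# Hypothetical classifier: finds 3 of the 10 spam and wrongly flags 1 ham.
y_pred = ["spam"] * 3 + ["ham"] * 7 + ["spam"] * 1 + ["ham"] * 89

precision, recall, accuracy = classification_scores(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} accuracy={accuracy:.2f}")
# prints "precision=0.75 recall=0.30 accuracy=0.92"
```

When one class dominates, accuracy can stay high even though most spam is missed, so recall is worth reading alongside accuracy.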

    I also tried tuning the hyperparameters using different methods and libraries:

    Method               Time (in min)  Accuracy
    Random forest (RF)   2.4            0.97
    Grid Search CV       25.6           0.97
    Pipeline             10.9           0.95
    Skopt                19.3           0.97
    Hyperopt             28:12          0.95
    Optuna               40             0.97

    Optuna took the most time (40 min). Its accuracy (0.97) matches the plain random forest, Grid Search CV and Skopt, and is better than Pipeline and Hyperopt (0.95).
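
As a rough illustration of what a hyperparameter search does and why it costs more time than a single fit, here is a minimal pure-Python sketch of exhaustive grid search with timing. It is not the code used in this repository: `grid_search`, `toy_score`, and the parameter grid are hypothetical. In practice the scoring function would train and validate a model instead of evaluating a formula.

```python
import itertools
import time


def grid_search(evaluate, param_grid):
    """Try every combination in param_grid; return (best_params, best_score, seconds)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    start = time.perf_counter()
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(**params)
        if score > best_score:
            best_params, best_score = params, score
    elapsed = time.perf_counter() - start
    return best_params, best_score, elapsed


# Hypothetical stand-in for "train a random forest and return a validation score".
def toy_score(n_estimators, max_depth):
    return 1.0 - 0.001 * abs(n_estimators - 200) - 0.01 * abs(max_depth - 10)


# Hypothetical grid: 4 x 3 = 12 evaluations.
param_grid = {"n_estimators": [50, 100, 200, 400], "max_depth": [5, 10, 20]}
best_params, best_score, seconds = grid_search(toy_score, param_grid)
print(best_params, round(seconds, 4))
```

The number of evaluations is the product of the grid sizes, so every added parameter multiplies the run time. Libraries such as Skopt, Hyperopt, and Optuna sample the space adaptively instead of enumerating it, so their cost depends mainly on the number of trials configured.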

Languages: Jupyter Notebook 97.3%, Python 2.7%