JessDataNLP / NLP-Exercises

NLP-Exercises

1. Movie Recommendation System using NLP and Cosine Similarity Algorithm

🎥 Overview: Build a movie recommendation system that suggests movies to users based on their search query, using NLP techniques and the cosine similarity algorithm.

📈 Data Collection:
    Source a dataset containing movie titles, plot summaries, and related metadata.

🔧 Data Preprocessing:
    Clean and prepare data by handling missing values, converting text to lowercase, removing stop words, and applying stemming.
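
The preprocessing steps above could look roughly like the sketch below. It is an illustration, not the notebook's actual code: the function name `preprocess` is hypothetical, and it assumes scikit-learn's built-in English stop-word list (instead of the NLTK stop-word corpus, which needs a separate download) together with NLTK's `PorterStemmer`.

```python
import re

from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

stemmer = PorterStemmer()


def preprocess(text):
    """Lowercase, tokenize, drop stop words, and stem one plot summary."""
    if not isinstance(text, str):
        # Assumption: missing values (None, NaN) are treated as empty text.
        return ""
    # Keep alphabetic tokens only, after lowercasing.
    tokens = re.findall(r"[a-z]+", text.lower())
    # Remove stop words first, then reduce each remaining word to its stem.
    kept = [stemmer.stem(tok) for tok in tokens if tok not in ENGLISH_STOP_WORDS]
    return " ".join(kept)


print(preprocess("The detectives are chasing a dangerous killer"))
print(repr(preprocess(None)))
```

With pandas, a function like this would typically be applied column-wise, for example via `Series.apply` on the plot-summary column.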

📊 Feature Extraction:
    Use CountVectorizer and TfidfTransformer from scikit-learn to turn the cleaned text into TF-IDF feature vectors.
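
As a minimal sketch of this step, the two scikit-learn classes can be chained as follows. The three plot strings are made-up, already-stemmed examples, and default parameters are assumed throughout.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Hypothetical, already-preprocessed plot summaries.
plots = [
    "detect chase killer citi",
    "robot learn love human",
    "detect hunt robot killer",
]

# Step 1: raw term counts, one row per plot and one column per vocabulary word.
count_vectorizer = CountVectorizer()
counts = count_vectorizer.fit_transform(plots)

# Step 2: reweight counts by inverse document frequency.
# With default settings each row is also L2-normalized.
tfidf_transformer = TfidfTransformer()
tfidf = tfidf_transformer.fit_transform(counts)

print(sorted(count_vectorizer.vocabulary_))
print(tfidf.shape)
```

scikit-learn's `TfidfVectorizer` combines both steps in one class; the two-step form is shown here because it matches the classes named above.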

🧮 Similarity Calculation:
    Employ the cosine similarity algorithm to measure the similarity between movie plot vectors.
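
Cosine similarity is the dot product of two vectors divided by the product of their lengths, so it depends on direction rather than magnitude. The sketch below computes it by hand with NumPy and, for comparison, with scikit-learn's `cosine_similarity`. The toy vectors and the helper name `cosine` are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Toy "plot vectors": rows 0 and 1 point the same way, row 2 is orthogonal to both.
vectors = np.array([
    [1.0, 1.0, 0.0],
    [2.0, 2.0, 0.0],
    [0.0, 0.0, 3.0],
])


def cosine(a, b):
    """cos(a, b) = (a . b) / (||a|| * ||b||)"""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Pairwise similarities between all rows at once.
similarity_matrix = cosine_similarity(vectors)

print(cosine(vectors[0], vectors[1]))
print(cosine(vectors[0], vectors[2]))
print(similarity_matrix.shape)
```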

🎬 Recommendation:
    Rank movies by the similarity score between each plot and the user's query, and return the top matches.
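
One way to put the pieces together is sketched below. The titles, plots, `recommend` function, and `top_n` parameter are all hypothetical, and stop-word removal is delegated to `CountVectorizer` to keep the example short.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.pipeline import make_pipeline

# Made-up catalogue for illustration only.
titles = ["Night Chase", "Metal Heart", "Iron Hunter"]
plots = [
    "a detective chases a killer through the city",
    "a robot learns to love a human",
    "a detective hunts a killer robot",
]

# Fit the vocabulary and IDF weights on the plots once.
vectorizer = make_pipeline(CountVectorizer(stop_words="english"), TfidfTransformer())
plot_vectors = vectorizer.fit_transform(plots)


def recommend(query, top_n=2):
    """Return up to top_n (title, score) pairs, best match first."""
    # Project the query into the same TF-IDF space as the plots.
    query_vector = vectorizer.transform([query])
    scores = cosine_similarity(query_vector, plot_vectors).ravel()
    ranked = np.argsort(scores)[::-1][:top_n]
    # Skip movies that share no vocabulary with the query.
    return [(titles[i], float(scores[i])) for i in ranked if scores[i] > 0]


print(recommend("robot love story"))
```

Query words that never appear in any plot (here, "story") are simply ignored, because the vocabulary is fixed when the vectorizer is fitted.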

🛠 Tools & Libraries: Python, pandas, numpy, scikit-learn, nltk.

🎯 Outcome: A movie recommendation system that suggests the movies whose plots best match a user's search query.

2. Hate Speech Detection in Tweets using Embeddings and Logistic Regression

🔍 Overview: Classify tweets into 'normal' and 'hate speech' categories using embeddings and logistic regression.

📈 Data Preparation:
    Combine the CSV files of normal and hate speech tweets into a single dataset and process it for modeling.

🔧 Text Preprocessing:
    Clean tweets by removing special characters, tokenizing text, removing stopwords, and applying lemmatization.
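
A rough sketch of tweet cleaning follows. The function `clean_tweet` is hypothetical. Stripping URLs and @mentions is an added assumption that the text above does not state, and the stop-word list comes from scikit-learn rather than NLTK. Lemmatization is passed in as an optional callable, because NLTK's `WordNetLemmatizer` needs the WordNet corpus to be downloaded first.

```python
import re

from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS


def clean_tweet(text, lemmatize=None):
    """Return a list of cleaned tokens for one tweet."""
    text = text.lower()
    # Assumption: URLs and @mentions carry no signal and are dropped.
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    # Remove special characters, digits, and punctuation (including '#').
    text = re.sub(r"[^a-z\s]", " ", text)
    # Whitespace tokenization followed by stop-word removal.
    tokens = [tok for tok in text.split() if tok not in ENGLISH_STOP_WORDS]
    if lemmatize is not None:
        # Optional lemmatization step, supplied by the caller.
        tokens = [lemmatize(tok) for tok in tokens]
    return tokens


print(clean_tweet("@user Check this out!!! https://t.co/abc #Dogs are running"))
```

To add lemmatization with NLTK, one could pass `WordNetLemmatizer().lemmatize` as the `lemmatize` argument after running `nltk.download("wordnet")`.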

📊 Feature Extraction:
    Use GloVe word vectors to convert tweet text into numerical features.
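
A common way to turn word vectors into one fixed-length feature vector per tweet is to average the vectors of the words it contains. Whether the notebook uses averaging is not stated, so treat the sketch below as one plausible approach. The names `load_glove` and `tweet_vector` are hypothetical, and a tiny in-memory stand-in replaces a real GloVe file, which uses the same text layout: a word followed by its space-separated vector components.

```python
import io

import numpy as np


def load_glove(lines):
    """Parse GloVe text format into a {word: vector} dictionary."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors


def tweet_vector(tokens, vectors, dim):
    """Average the vectors of known tokens; return zeros if none are known."""
    known = [vectors[tok] for tok in tokens if tok in vectors]
    if not known:
        return np.zeros(dim, dtype=np.float32)
    return np.mean(known, axis=0)


# Hypothetical 3-dimensional embeddings standing in for a real GloVe file.
fake_file = io.StringIO("good 1.0 0.0 2.0\nbad -1.0 0.0 0.0\n")
glove = load_glove(fake_file)

# "unknown" has no embedding and is skipped.
print(tweet_vector(["good", "bad", "unknown"], glove, dim=3))
```

Averaging discards word order, which is a known limitation of this representation.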

🤖 Model Training:
    Split data into training and testing sets.
    Train a Logistic Regression model, tuning its hyperparameters with grid search.
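
The training steps might be sketched as below with scikit-learn's `train_test_split` and `GridSearchCV`. The data is synthetic (generated with `make_classification`) as a stand-in for tweet embeddings, and the parameter grid, split ratio, fold count, and scoring metric are assumptions rather than the notebook's actual settings.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for tweet embeddings (X) and labels (y: 0 = normal, 1 = hate speech).
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Hold out 20% for testing, keeping the class balance in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hypothetical grid over the inverse regularization strength C.
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    cv=5,           # 5-fold cross-validation on the training set only
    scoring="f1",
)
search.fit(X_train, y_train)

print(search.best_params_)
test_f1 = search.score(X_test, y_test)  # F1 of the best model on held-out data
print(test_f1)
```

Running the search only on the training split keeps the test set untouched until the final evaluation.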

📝 Evaluation:
    Assess model performance using accuracy, precision, recall, F1-score, and confusion matrix.
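
The listed metrics are all available in `sklearn.metrics`. The labels and predictions below are made up for illustration (1 = hate speech, 0 = normal) and are not results from the notebook.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Hypothetical ground truth and model predictions for eight tweets.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

accuracy = accuracy_score(y_true, y_pred)    # share of correct predictions
precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
matrix = confusion_matrix(y_true, y_pred)

print(accuracy, precision, recall, f1)
print(matrix)
```

On imbalanced tweet data, precision and recall for the hate speech class are usually more informative than accuracy alone.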

🛠 Tools & Libraries: Python, pandas, scikit-learn, nltk, GloVe.

🎯 Outcome: A classifier that labels tweets as 'normal' or 'hate speech', assessed with the evaluation metrics listed above.
