Overview: Build a movie recommendation system that suggests movies matching a user's search query, using NLP techniques and cosine similarity.
Data Collection:
Source a dataset containing movie titles, plot summaries, and related fields.
Data Preprocessing:
Clean and prepare data by handling missing values, converting text to lowercase, removing stop words, and applying stemming.
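The preprocessing steps above could be sketched roughly as follows. This is an illustration under assumptions, not the project's actual code: the column names `title` and `plot` are hypothetical, and scikit-learn's built-in `ENGLISH_STOP_WORDS` list stands in for the nltk stop-word corpus so that no corpus download is needed.

```python
import re

import pandas as pd
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

stemmer = PorterStemmer()


def preprocess(text: str) -> str:
    """Lowercase, keep alphabetic tokens, drop stop words, and stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(stemmer.stem(t) for t in tokens if t not in ENGLISH_STOP_WORDS)


# Hypothetical two-row dataset; the real column names are not given in the source.
movies = pd.DataFrame({
    "title": ["Robot Run", "Untitled"],
    "plot": ["The Robots are RUNNING!", None],
})

movies["plot"] = movies["plot"].fillna("")            # handle missing values
movies["clean_plot"] = movies["plot"].apply(preprocess)
print(movies[["title", "clean_plot"]])
```

Filling missing plots with an empty string keeps every row in the table; dropping those rows instead would be an equally reasonable choice.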
Feature Extraction:
Use CountVectorizer and TfidfTransformer from scikit-learn to turn the plot text into TF-IDF feature vectors.
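A minimal sketch of the two-stage feature extraction, using three made-up plot summaries as stand-ins for the real dataset. Default parameters are assumed throughout; the source does not say which settings were used.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Hypothetical plot summaries, already cleaned.
plots = [
    "robot fights alien invasion",
    "detective solves murder mystery",
    "alien robot falls in love",
]

count_vectorizer = CountVectorizer()
counts = count_vectorizer.fit_transform(plots)     # sparse matrix of raw term counts

tfidf_transformer = TfidfTransformer()
tfidf = tfidf_transformer.fit_transform(counts)    # counts reweighted by TF-IDF, rows L2-normalized

print(tfidf.shape)  # (number of plots, vocabulary size)
```

scikit-learn's TfidfVectorizer combines both stages in one class; keeping them separate, as here, leaves the raw counts available for inspection.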
Similarity Calculation:
Use cosine similarity to measure how close two movie plot vectors are.
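For two vectors a and b, cosine similarity is a·b / (||a|| ||b||): 1 when they point in the same direction and 0 when they share no terms. The sketch below computes the full plot-by-plot similarity matrix for three made-up summaries; the pipeline setup is an assumption about how the pieces fit together.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.pipeline import make_pipeline

# Hypothetical plot summaries, already cleaned.
plots = [
    "robot fights alien invasion",
    "detective solves murder mystery",
    "alien robot falls in love",
]

vectorizer = make_pipeline(CountVectorizer(), TfidfTransformer())
tfidf = vectorizer.fit_transform(plots)

# similarity[i, j] is the cosine similarity between plot i and plot j.
similarity = cosine_similarity(tfidf)
print(similarity.round(2))
```

Plots 0 and 2 share the words "robot" and "alien", so their score is positive, while plots 0 and 1 share no vocabulary and score exactly 0.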
Recommendation:
Rank movies by the similarity between their plot vectors and the user's query, and return the top matches.
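One way the recommendation step might look: the query is vectorized with the same fitted pipeline as the plots, scored against every plot, and the best matches are returned. The titles, plots, `recommend` function, and `top_n` parameter are all hypothetical names introduced for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.pipeline import make_pipeline

# Hypothetical catalogue.
titles = ["Steel Dawn", "The Quiet Clue", "Circuits of the Heart"]
plots = [
    "robot fights alien invasion",
    "detective solves murder mystery",
    "alien robot falls in love",
]

vectorizer = make_pipeline(CountVectorizer(), TfidfTransformer())
plot_vectors = vectorizer.fit_transform(plots)


def recommend(query: str, top_n: int = 2) -> list[str]:
    """Return up to top_n titles whose plots are most similar to the query."""
    query_vector = vectorizer.transform([query])
    scores = cosine_similarity(query_vector, plot_vectors).ravel()
    ranked = np.argsort(scores)[::-1][:top_n]          # highest score first
    return [titles[i] for i in ranked if scores[i] > 0]  # skip movies with no overlap


print(recommend("murder mystery"))
```

In a full pipeline the query would normally go through the same cleaning (lowercasing, stop-word removal, stemming) as the plots before being vectorized.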
Tools & Libraries: Python, pandas, numpy, scikit-learn, nltk.
Outcome: A movie recommendation system that suggests movies based on the user's search query.
Overview: Classify tweets as 'normal' or 'hate speech' using GloVe word embeddings and logistic regression.
Data Preparation:
Aggregate and process data from CSV files containing normal and hate speech tweets.
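A rough sketch of the aggregation step. The in-memory CSVs, the `text` column, and the 0/1 label encoding are assumptions made for illustration; the real files and their schema are not described in the source.

```python
import io

import pandas as pd

# In-memory stand-ins for the two CSV files (hypothetical content and column name).
normal_csv = io.StringIO("text\nlovely weather today\nsee you at lunch\n")
hate_csv = io.StringIO("text\nexample abusive tweet\n")

normal = pd.read_csv(normal_csv).assign(label=0)   # 0 = normal
hate = pd.read_csv(hate_csv).assign(label=1)       # 1 = hate speech

tweets = (
    pd.concat([normal, hate], ignore_index=True)
    .dropna(subset=["text"])                 # drop rows with no tweet text
    .drop_duplicates(subset="text")          # drop repeated tweets
    .sample(frac=1, random_state=42)         # shuffle so the classes are interleaved
    .reset_index(drop=True)
)
print(tweets)
```

With real files, the `io.StringIO` objects would be replaced by the paths to the two CSVs.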
Text Preprocessing:
Clean tweets by removing special characters, tokenizing text, removing stopwords, and applying lemmatization.
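The cleaning steps above might be sketched like this. Several choices here are assumptions: URLs and @mentions are stripped along with special characters, scikit-learn's `ENGLISH_STOP_WORDS` stands in for the nltk stop-word list, and the lemmatizer is passed in as a function (defaulting to a no-op) so the sketch runs without downloading WordNet data.

```python
import re
from typing import Callable

from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS


def clean_tweet(tweet: str, lemmatize: Callable[[str], str] = lambda word: word) -> list[str]:
    """Return the cleaned, tokenized, stop-word-free, lemmatized tokens of a tweet."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)   # drop URLs and @mentions (assumption)
    text = re.sub(r"[^a-z\s]", " ", text)            # drop special characters and digits
    tokens = text.split()                            # simple whitespace tokenization
    return [lemmatize(t) for t in tokens if t not in ENGLISH_STOP_WORDS]


print(clean_tweet("@user The food is GREAT!!! https://t.co/abc #yum"))
```

To apply real lemmatization, one option is to run `nltk.download("wordnet")` once and pass `WordNetLemmatizer().lemmatize` from `nltk.stem` as the `lemmatize` argument.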
Feature Extraction:
Use GloVe word vectors to convert the cleaned tweet text into numerical features.
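The source does not say how word vectors are combined into one vector per tweet; a common approach, assumed here, is to average the vectors of the tokens that appear in the GloVe vocabulary. `load_glove` and `embed` are hypothetical helper names, and the toy three-dimensional vectors stand in for a real GloVe file.

```python
import numpy as np


def load_glove(path: str) -> dict[str, np.ndarray]:
    """Parse the GloVe text format: one token per line, followed by its vector components."""
    vectors = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            word, *values = line.rstrip().split(" ")
            vectors[word] = np.asarray(values, dtype=np.float32)
    return vectors


def embed(tokens: list[str], vectors: dict[str, np.ndarray], dim: int) -> np.ndarray:
    """Average the vectors of known tokens; return zeros if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return np.zeros(dim, dtype=np.float32)
    return np.mean(known, axis=0)


# Toy vectors for illustration only; real GloVe vectors have 50 or more dimensions.
toy_vectors = {
    "good": np.array([1.0, 0.0, 0.0], dtype=np.float32),
    "day": np.array([0.0, 1.0, 0.0], dtype=np.float32),
}
features = embed(["good", "day", "unknownword"], toy_vectors, dim=3)
print(features)
```

Out-of-vocabulary tokens are simply skipped, which keeps rare slang or misspellings from breaking the pipeline at the cost of ignoring them.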
Model Training:
Split data into training and testing sets.
Train a Logistic Regression model, tuning its hyperparameters with grid search (GridSearchCV in scikit-learn).
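A sketch of the split-and-tune step on synthetic data. The two Gaussian clusters stand in for tweet embeddings, and the parameter grid, 80/20 split, 5-fold cross-validation, and F1 scoring are all assumptions; the source does not list the settings that were searched.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for tweet embeddings: two 5-dimensional Gaussian clusters.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(-1.0, 1.0, size=(100, 5)),   # class 0 (normal)
    rng.normal(1.0, 1.0, size=(100, 5)),    # class 1 (hate speech)
])
y = np.array([0] * 100 + [1] * 100)

# Stratify so both splits keep the same class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

param_grid = {"C": [0.01, 0.1, 1, 10]}      # hypothetical grid over regularization strength
search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5, scoring="f1")
search.fit(X_train, y_train)

print(search.best_params_)
print(search.score(X_test, y_test))         # F1 on the held-out test set
```

Fitting the search only on the training split keeps the test set untouched until the final evaluation.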
Evaluation:
Assess model performance using accuracy, precision, recall, F1-score, and a confusion matrix.
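The metrics named above can be computed with scikit-learn as sketched below. The labels and predictions are made up for illustration (1 = hate speech is an assumed encoding) and are not results from the project.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Made-up labels and predictions; 1 = hate speech, 0 = normal.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 0, 0]

accuracy = accuracy_score(y_true, y_pred)      # 5 of 8 correct
precision = precision_score(y_true, y_pred)    # 2 of 3 predicted positives are correct
recall = recall_score(y_true, y_pred)          # 2 of 4 actual positives are found
f1 = f1_score(y_true, y_pred)                  # harmonic mean of precision and recall
matrix = confusion_matrix(y_true, y_pred)      # rows: true class, columns: predicted class

print(accuracy, precision, recall, f1)
print(matrix)
```

For hate speech detection the classes are often imbalanced, so precision and recall on the hate speech class tend to be more informative than accuracy alone.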
Tools & Libraries: Python, pandas, scikit-learn, nltk, GloVe.
Outcome: A classifier that flags hate speech in tweets, assessed with the evaluation metrics listed above.