Data Preparation: Imported a dataset of news articles, each labeled as fake or real, and preprocessed it by handling missing values and normalizing the text.
Text Preprocessing: Implemented extensive text cleaning procedures such as removing mentions, URLs, emojis, numbers, and punctuation. Applied techniques like tokenization, stopword removal, and stemming to prepare the text for analysis.
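The cleaning steps above can be sketched in a few lines of Python using only the standard library. This is an illustrative sketch, not the project's actual code: the stopword list, the emoji ranges, the suffix-stripping stemmer, and the names clean_text and naive_stem are all assumptions made for the example. A real pipeline would normally use a full stopword list and a proper stemmer such as Porter's.

```python
import re
import string

# Hypothetical, deliberately tiny stopword list (assumption for this sketch).
STOPWORDS = {"a", "an", "the", "is", "are", "was", "to", "of", "and", "in", "on", "at", "this"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NUMBER_RE = re.compile(r"\d+")
# Rough emoji coverage only (assumption); not an exhaustive list of emoji code points.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def naive_stem(token):
    """Strip a few common English suffixes; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def clean_text(text):
    """Lowercase, remove URLs, mentions, emojis, numbers, and punctuation,
    then tokenize on whitespace, drop stopwords, and stem."""
    text = text.lower()
    text = URL_RE.sub(" ", text)      # URLs go first, before punctuation is stripped
    text = MENTION_RE.sub(" ", text)
    text = EMOJI_RE.sub(" ", text)
    text = NUMBER_RE.sub(" ", text)
    text = text.translate(PUNCT_TABLE)
    tokens = text.split()
    return [naive_stem(t) for t in tokens if t not in STOPWORDS]


if __name__ == "__main__":
    print(clean_text("Breaking: @user says 3 aliens landed! https://example.com"))
```

URLs and mentions are removed before punctuation on purpose: stripping punctuation first would break the patterns that identify them.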
Feature Extraction: Utilized methods like TF-IDF to convert text data into numerical features suitable for machine learning models.
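To make the TF-IDF step concrete, here is a minimal sketch of one common variant: term frequency as a share of the document's tokens, multiplied by the unsmoothed inverse document frequency log(N / df). The summary does not say which variant or library was used, so the weighting formula and the function name tfidf are assumptions. Library implementations often add smoothing and length normalization.

```python
import math
from collections import Counter


def tfidf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document.

    weight = (count of term in doc / tokens in doc) * log(n_docs / doc_frequency)
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    corpus = [["fake", "news", "alert"], ["real", "news", "report"]]
    for row in tfidf(corpus):
        print(row)
```

Under this variant, a term that occurs in every document gets weight zero, which is why words shared by all articles contribute nothing to the features.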
Model Training: Trained logistic regression models using the glmnet package. Evaluated model performance with accuracy, precision, recall, and F1-score, and inspected ROC curves.
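As a rough illustration of what penalized logistic regression does, the sketch below fits a model with a ridge (L2) penalty by batch gradient descent in plain Python. It is not how glmnet works internally: glmnet fits elastic-net penalized models by coordinate descent along a path of penalty strengths. The names fit_logistic and predict_proba and the default values for lam, lr, and epochs are assumptions chosen for the example.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lam=0.01, lr=0.1, epochs=2000):
    """Fit weights w and intercept b by gradient descent on the mean
    log-loss plus (lam / 2) * ||w||^2. The intercept is not penalized."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(X, w, b):
    """Predicted probability of the positive class for each row of X."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


if __name__ == "__main__":
    # Toy one-feature data: label 1 for the larger values.
    X = [[0.0], [1.0], [2.0], [3.0]]
    y = [0, 0, 1, 1]
    w, b = fit_logistic(X, y)
    print(predict_proba(X, w, b))
```

In practice X would hold the TF-IDF features and y the fake/real labels encoded as 1 and 0. The penalty strength lam plays the role that lambda plays in glmnet.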
Visualization: Created visualizations including word clouds, text length distribution, and confusion matrix heatmaps to gain insights into the data and model performance.
Evaluation: Evaluated model performance by plotting ROC curves and generating confusion matrices.
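The metrics named above all follow from the four cells of a binary confusion matrix. The sketch below shows that relationship in plain Python. Treating "fake" as the positive class is an assumption, as are the function names. Returning 0.0 when a denominator is zero is one common convention among several.

```python
def confusion_counts(y_true, y_pred, positive="fake"):
    """Return (tp, fp, fn, tn) for a binary problem with the given positive label."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive="fake"):
    """Accuracy, precision, recall, and F1 derived from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    y_true = ["fake", "fake", "real", "real", "fake"]
    y_pred = ["fake", "real", "real", "fake", "fake"]
    print(confusion_counts(y_true, y_pred))
    print(classification_metrics(y_true, y_pred))
```

An ROC curve extends the same idea: it recomputes the true-positive and false-positive rates at every probability threshold rather than at a single cut-off.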