This Python project performs sentiment analysis on social media data, using Natural Language Processing (NLP) and machine learning techniques to classify text as positive, negative, or neutral. The aim is to provide a deeper understanding of the public sentiment expressed on social platforms, helping organizations gauge consumer opinion, monitor brand reputation, and identify emerging trends.
- Data Preprocessing: Clean and prepare raw text data for modeling, including tokenization, stop-word removal, and vectorization.
- Model Development: Implement and train machine learning models, such as Convolutional Neural Networks (CNNs) with regularization and dropout, to classify sentiment.
- Hyperparameter Tuning: Use scikit-learn's GridSearchCV to find the best-performing combination of model hyperparameters.
- Evaluation: Assess model performance using metrics such as ROC AUC, precision, recall, and F1-score.
- Visualization: Generate word clouds to visualize the most frequent words in positive and negative texts.
- Python 3.8+: Primary programming language.
- TensorFlow/Keras: For building and training neural network models.
- scikit-learn: For traditional machine learning algorithms and data preprocessing.
- NLTK/spaCy: For natural language processing tasks.
- Pandas & NumPy: For data manipulation and numerical calculations.
- Matplotlib & Seaborn: For data visualization.
- Jupyter Notebook: For interactive code execution and result visualization.
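The Data Preprocessing step listed above (tokenization, stop-word removal, vectorization) can be illustrated with a minimal, dependency-free sketch. It is not the project's actual pipeline: the stop-word list, the URL/@mention stripping, and all function names here are hypothetical, and in practice NLTK or spaCy would supply tokenization and stop words, with scikit-learn handling vectorization.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the project may use NLTK's or spaCy's list instead.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "and", "or", "to", "of", "it", "this", "i"}


def tokenize(text):
    """Lowercase the text, strip URLs and @mentions, and split it into word tokens."""
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    return re.findall(r"[a-z']+", text)


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def build_vocabulary(token_lists):
    """Map each distinct token to a column index (sorted for deterministic output)."""
    vocab = sorted({t for tokens in token_lists for t in tokens})
    return {token: i for i, token in enumerate(vocab)}


def vectorize(tokens, vocab):
    """Bag-of-words count vector; tokens outside the vocabulary are ignored."""
    vec = [0] * len(vocab)
    for token, count in Counter(tokens).items():
        if token in vocab:
            vec[vocab[token]] = count
    return vec


if __name__ == "__main__":
    posts = [
        "I love this phone! https://example.com",
        "@shop the delivery was slow and the box was damaged",
    ]
    cleaned = [remove_stop_words(tokenize(p)) for p in posts]
    vocab = build_vocabulary(cleaned)
    print(cleaned)   # [['love', 'phone'], ['delivery', 'slow', 'box', 'damaged']]
    print([vectorize(tokens, vocab) for tokens in cleaned])
```

The count vectors play the same role as the output of a bag-of-words vectorizer; a CNN would instead typically consume sequences of token indices fed into an embedding layer.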
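Hyperparameter Tuning with GridSearchCV works by exhaustively evaluating every combination in a parameter grid. The sketch below shows that enumeration in plain Python; the grid keys (`dropout`, `filters`, `l2`) and the toy scoring function are hypothetical stand-ins. In GridSearchCV, the score for each combination is a cross-validated metric, and tuning a Keras model this way typically requires a scikit-learn-compatible wrapper (for example, the SciKeras package).

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate every combination in param_grid and return (best_params, best_score).

    score_fn stands in for "train the model with these params and return a
    validation score"; GridSearchCV would compute this with cross-validation.
    """
    keys = list(param_grid)
    best_params, best_score = None, float("-inf")
    # product() yields one tuple per combination, e.g. 2 x 2 x 2 = 8 here.
    for values in product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


if __name__ == "__main__":
    # Hypothetical grid for a CNN with dropout and L2 regularization.
    grid = {"dropout": [0.2, 0.5], "filters": [64, 128], "l2": [1e-4, 1e-3]}

    # Toy scoring function that peaks at dropout=0.5, filters=128, l2=1e-4.
    def toy_score(p):
        return -abs(p["dropout"] - 0.5) - abs(p["filters"] - 128) / 1000 - p["l2"]

    print(grid_search(grid, toy_score))
```

Because the number of fits is the product of the list lengths times the number of cross-validation folds, grids for neural networks are usually kept small.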
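The Evaluation metrics named above can be computed from their definitions, as in this minimal sketch. The function names are hypothetical; in the project itself these would more likely come from `sklearn.metrics` (for example `classification_report`, or `roc_auc_score`, which also supports multiclass scoring via `multi_class="ovr"`). Precision, recall, and F1 are computed per class (one-vs-rest) and macro-averaged; ROC AUC is shown for the binary case as the probability that a randomly chosen positive example scores higher than a randomly chosen negative one.

```python
def precision_recall_f1(y_true, y_pred, label):
    """Per-class precision, recall and F1, treating `label` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    return sum(precision_recall_f1(y_true, y_pred, l)[2] for l in labels) / len(labels)


def roc_auc(y_true, scores):
    """Binary ROC AUC via pairwise comparison; ties count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = ["pos", "neg", "neu", "pos", "neg"]
    y_pred = ["pos", "neg", "pos", "pos", "neu"]
    print(precision_recall_f1(y_true, y_pred, "pos"))
    print(macro_f1(y_true, y_pred, ["pos", "neg", "neu"]))
    print(roc_auc([1, 0, 1, 0], [0.9, 0.3, 0.6, 0.7]))  # 0.75
```

The pairwise formulation is quadratic in the number of examples, so it is only suitable for illustration; library implementations sort the scores instead.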
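For the Visualization step, a word cloud is drawn from word frequencies, so the core computation is counting tokens per sentiment class. The sketch below assumes already-preprocessed token lists and string labels (both hypothetical); if the project uses the `wordcloud` package (an assumption), a per-class `Counter` like the ones built here can be passed to `WordCloud().generate_from_frequencies(...)` and rendered with Matplotlib.

```python
from collections import Counter


def word_frequencies_by_sentiment(token_lists, labels):
    """Return {label: Counter of token frequencies} for each sentiment label."""
    counters = {}
    for tokens, label in zip(token_lists, labels):
        counters.setdefault(label, Counter()).update(tokens)
    return counters


def top_words(counters, n=3):
    """The n most frequent words per label (ties keep first-seen order)."""
    return {label: counter.most_common(n) for label, counter in counters.items()}


if __name__ == "__main__":
    token_lists = [
        ["love", "phone", "love"],
        ["slow", "delivery"],
        ["love", "camera"],
        ["slow", "refund", "slow"],
    ]
    labels = ["positive", "negative", "positive", "negative"]
    freqs = word_frequencies_by_sentiment(token_lists, labels)
    print(top_words(freqs, n=2))
```

Stop-word removal matters here: without it, function words such as "the" would dominate every cloud.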
Clone the repository and install the required packages:
```bash
git clone git@github.com:agomolka/MAThesisSentimentAnalysis.git
cd MAThesisSentimentAnalysis
pip install -r requirements.txt
```
- Contributions to the project are welcome! Please fork the repository and submit pull requests to the `main` branch.
- For major changes, please open an issue first to discuss what you would like to change.
- Please update tests as appropriate.
- Distributed under the MIT License. See the `LICENSE` file for more information.