byrayhana / Turkish-tweets-sentiment-analysis

Sentiment analysis

Sentiment Analysis on Turkish Tweets

Dependencies

    pandas
    numpy
    matplotlib
    seaborn
    nltk
    tensorflow
    scikit-learn
    re (Python standard library; no installation needed)
  

Data

The data comes from the [Kaggle/Turkish Tweets Dataset](https://www.kaggle.com/datasets/anil1055/turkish-tweet-dataset), which contains tweets in Turkish along with their corresponding sentiment labels.

Data Preprocessing

The code performs several preprocessing steps on the text data before training the models.

  1. Data Exploration: The code displays a count plot of the sentiment labels to visualize the distribution of sentiments in the dataset.
  2. Label Mapping: The sentiment labels are mapped to numerical values for model training.
  3. Text Cleaning: The code defines a function to clean the text by removing unwanted patterns and special characters.
  4. Lowercasing: The text is converted to lowercase to ensure consistent tokenization.
  5. Stopword Removal: Turkish stopwords are removed from the text using the nltk library.
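
The label mapping, cleaning, lowercasing, and stopword steps above can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: the label names, the regular expressions, the `turkish_lower` helper, and the tiny stopword set are all assumptions. In the notebook the stopwords come from nltk, for example `nltk.corpus.stopwords.words("turkish")` after `nltk.download("stopwords")`.

```python
import re

# Hypothetical label names; replace with the labels found in the dataset.
LABEL_MAP = {"olumsuz": 0, "olumlu": 1}

# Tiny illustrative stopword set (assumption). The real list would come
# from nltk.corpus.stopwords.words("turkish").
STOPWORDS = {"ve", "bir", "bu", "çok", "için"}


def turkish_lower(text: str) -> str:
    """Lowercase text while handling the Turkish dotted and dotless i."""
    # str.lower() maps "I" to "i" and "İ" to "i" plus a combining dot,
    # so both letters are replaced explicitly before lowercasing.
    return text.replace("İ", "i").replace("I", "ı").lower()


def clean_text(text: str) -> str:
    """Remove URLs, mentions, hashtag signs, digits, and punctuation."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"@\w+", " ", text)                    # user mentions
    text = text.replace("#", " ")                        # keep the hashtag word
    text = re.sub(r"[^\w\s]|\d|_", " ", text)            # punctuation, digits
    return re.sub(r"\s+", " ", text).strip()             # collapse whitespace


def preprocess(text: str) -> str:
    """Clean, lowercase, and drop stopwords from a single tweet."""
    tokens = turkish_lower(clean_text(text)).split()
    return " ".join(t for t in tokens if t not in STOPWORDS)


print(preprocess("Bu film ÇOK güzel!! @ali https://t.co/x #Sinema 2023"))
```

The explicit handling of "I" and "İ" matters because Python's default lowercasing follows language-neutral Unicode rules, which would otherwise turn "IŞIK" into "işik" instead of "ışık".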

Vectorization

The preprocessed text is converted into numerical features using both TF-IDF vectorization and count vectorization.

Machine Learning Models

The code trains and evaluates two machine learning models: Naive Bayes and Support Vector Machine (SVM).

Naive Bayes

The code uses the Bernoulli Naive Bayes classifier from scikit-learn to train the Naive Bayes model.
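
A minimal sketch of this step, assuming binary count features and default hyperparameters. The toy texts and the 0/1 labels are invented for illustration and do not come from the dataset:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

# Toy training data (assumption): 1 = positive, 0 = negative.
train_texts = ["film güzel", "harika gün", "film kötü", "berbat gün"]
train_labels = [1, 1, 0, 0]

# BernoulliNB models the presence or absence of each term, so binary features fit it.
vectorizer = CountVectorizer(binary=True)
X_train = vectorizer.fit_transform(train_texts)

nb_model = BernoulliNB()
nb_model.fit(X_train, train_labels)

# New text must go through the same fitted vectorizer before prediction.
nb_pred = nb_model.predict(vectorizer.transform(["güzel film"]))
print(nb_pred)
```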

Support Vector Machine (SVM)

The code uses the SVM classifier from scikit-learn to train the SVM model.
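
The README does not say which scikit-learn SVM class or kernel is used. The sketch below assumes `SVC` with a linear kernel on TF-IDF features; `LinearSVC` would be a reasonable alternative for large sparse inputs. The toy data and labels are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Toy training data (assumption): 1 = positive, 0 = negative.
train_texts = ["film güzel", "harika gün", "film kötü", "berbat gün"]
train_labels = [1, 1, 0, 0]

# The pipeline chains the vectorizer and classifier so raw text can be passed in.
svm_model = make_pipeline(TfidfVectorizer(), SVC(kernel="linear"))
svm_model.fit(train_texts, train_labels)

svm_pred = svm_model.predict(["güzel film"])
print(svm_pred)
```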

Deep Learning Models

The code trains and evaluates two deep learning models: CNN (Convolutional Neural Network) and LSTM (Long Short-Term Memory).

CNN

The code defines a CNN model using the Keras API.
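
The architecture is not described in this README, so the following is only one plausible shape for a 1D text CNN in Keras. The vocabulary size, sequence length, layer sizes, and number of classes are all assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 10000  # assumed vocabulary size
MAX_LEN = 50        # assumed padded tweet length, in tokens
NUM_CLASSES = 2     # assumed; set to the number of sentiment labels


def build_cnn() -> tf.keras.Model:
    """Embedding -> Conv1D -> global max pooling -> dense classifier."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(MAX_LEN,)),
        layers.Embedding(VOCAB_SIZE, 64),
        layers.Conv1D(128, 5, activation="relu"),  # detects 5-token patterns
        layers.GlobalMaxPooling1D(),               # strongest response per filter
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # expects integer labels
        metrics=["accuracy"],
    )
    return model


cnn_model = build_cnn()
cnn_model.summary()
```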

LSTM

The code defines an LSTM model using the Keras API.
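
As with the CNN, the exact layers are not given here. A minimal sketch of an LSTM classifier in Keras, with an assumed vocabulary size, sequence length, layer widths, and class count, might look like this:

```python
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 10000  # assumed vocabulary size
MAX_LEN = 50        # assumed padded tweet length, in tokens
NUM_CLASSES = 2     # assumed; set to the number of sentiment labels


def build_lstm() -> tf.keras.Model:
    """Embedding -> LSTM -> dense classifier."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(MAX_LEN,)),
        layers.Embedding(VOCAB_SIZE, 64),
        layers.LSTM(64),       # final hidden state summarizes the tweet
        layers.Dropout(0.5),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # expects integer labels
        metrics=["accuracy"],
    )
    return model


lstm_model = build_lstm()
lstm_model.summary()
```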

Evaluation

The code evaluates the trained models on the test data and computes accuracy scores for each model. It also visualizes the training and validation loss and accuracy for the deep learning models.
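
A sketch of both parts of the evaluation, assuming `accuracy_score` from scikit-learn and the `history` dictionary that Keras returns from `model.fit(...)` when validation data and the accuracy metric are used. The `plot_history` helper is hypothetical, and the labels and curve values below are made-up placeholders, not results from the notebook.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score


def plot_history(history: dict):
    """Plot training and validation loss and accuracy side by side."""
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))
    ax_loss.plot(history["loss"], label="training")
    ax_loss.plot(history["val_loss"], label="validation")
    ax_loss.set_xlabel("epoch")
    ax_loss.set_ylabel("loss")
    ax_loss.legend()
    ax_acc.plot(history["accuracy"], label="training")
    ax_acc.plot(history["val_accuracy"], label="validation")
    ax_acc.set_xlabel("epoch")
    ax_acc.set_ylabel("accuracy")
    ax_acc.legend()
    return fig


# Placeholder labels and predictions, for illustration only.
y_true = [1, 0, 1, 1]
y_pred = [1, 0, 0, 1]
acc = accuracy_score(y_true, y_pred)  # fraction of matching labels
print(acc)

# Placeholder curves; in practice pass model.fit(...).history instead.
toy_history = {
    "loss": [0.9, 0.6, 0.4],
    "val_loss": [1.0, 0.7, 0.6],
    "accuracy": [0.5, 0.7, 0.8],
    "val_accuracy": [0.5, 0.6, 0.7],
}
fig = plot_history(toy_history)
```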

Testing

The code includes a function to test the trained models on new tweets.
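
The function's name and signature are not given in this README. The sketch below uses a hypothetical `predict_sentiment` helper, a toy scikit-learn pipeline, and invented label names to show what such a function could look like.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Hypothetical mapping from numeric predictions back to label names.
ID_TO_LABEL = {0: "olumsuz", 1: "olumlu"}


def predict_sentiment(model, tweet: str) -> str:
    """Return the predicted label name for one raw tweet (hypothetical helper)."""
    prediction = model.predict([tweet])[0]
    return ID_TO_LABEL[int(prediction)]


# Toy model trained on invented examples, standing in for a trained model.
toy_model = make_pipeline(CountVectorizer(), BernoulliNB())
toy_model.fit(
    ["film güzel", "harika gün", "film kötü", "berbat gün"],
    [1, 1, 0, 0],
)

print(predict_sentiment(toy_model, "güzel film"))
```

In practice the tweet should pass through the same preprocessing and vectorization as the training data before prediction. Wrapping the vectorizer and classifier in one pipeline is one way to keep those steps consistent.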

Example Usage

To test the trained models on a new tweet, call the testing function described in the previous section with the tweet as input.

Languages

Jupyter Notebook 100.0%