Dependencies: pandas, re (Python standard library), numpy, matplotlib, seaborn, tensorflow, scikit-learn, nltk
Dataset: [Kaggle/Turkish Tweets Dataset](https://www.kaggle.com/datasets/anil1055/turkish-tweet-dataset). It contains tweets in Turkish along with their corresponding labels.
The code performs several preprocessing steps on the text data before training the models.
- Data Exploration: The code displays a count plot of the sentiment labels to visualize the distribution of sentiments in the dataset.
- Label Mapping: The sentiment labels are mapped to numerical values for model training.
- Text Cleaning: The code defines a function to clean the text by removing unwanted patterns and special characters.
- Lowercasing: The text is converted to lowercase to ensure consistent tokenization.
- Stopword Removal: Turkish stopwords are removed from the text using the nltk library.
The code converts the preprocessed text into numerical features using both TF-IDF vectorization and count vectorization.
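
As a rough sketch of this step, the snippet below fits scikit-learn's `TfidfVectorizer` and `CountVectorizer` on a few made-up cleaned tweets. Default parameters are assumed here; the project may configure the vectorizers differently.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Made-up cleaned tweets, for illustration only.
train_texts = ["film güzel", "hizmet berbat yavaş", "film berbat"]
test_texts = ["hizmet güzel"]

# TF-IDF: term counts reweighted by how rare each term is across documents.
tfidf = TfidfVectorizer()
X_train_tfidf = tfidf.fit_transform(train_texts)
X_test_tfidf = tfidf.transform(test_texts)  # reuse the vocabulary learned on train

# Count vectorization: raw term counts per document.
count = CountVectorizer()
X_train_count = count.fit_transform(train_texts)
X_test_count = count.transform(test_texts)

print(sorted(count.vocabulary_))
print(X_train_tfidf.shape, X_train_count.shape)
```

The vectorizers are fitted on the training texts only and then applied to the test texts, so no vocabulary information leaks from the test set.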
The code trains and evaluates two machine learning models: Naive Bayes and Support Vector Machine (SVM).
The Naive Bayes model is trained with scikit-learn's Bernoulli Naive Bayes classifier (`BernoulliNB`).
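
A minimal sketch of training `BernoulliNB`, assuming binary word-presence features from `CountVectorizer(binary=True)`; the toy texts and labels are made up, and the project may feed the classifier different features.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

# Made-up training data: 1 = positive, 0 = negative (hypothetical encoding).
texts = ["film güzel", "harika film", "hizmet berbat", "berbat yavaş"]
labels = [1, 1, 0, 0]

# BernoulliNB models word presence/absence, so binary features are a natural fit.
vectorizer = CountVectorizer(binary=True)
X = vectorizer.fit_transform(texts)

nb_model = BernoulliNB()
nb_model.fit(X, labels)

pred = nb_model.predict(vectorizer.transform(["film harika"]))
print(pred)
```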
The code uses the SVM classifier from scikit-learn to train the SVM model.
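
The text does not say which scikit-learn SVM class or kernel is used, so the sketch below assumes `SVC` with a linear kernel on TF-IDF features; the toy data is made up.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

# Made-up training data: 1 = positive, 0 = negative (hypothetical encoding).
texts = ["film güzel", "harika film", "hizmet berbat", "berbat yavaş"]
labels = [1, 1, 0, 0]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)

# Linear kernel is an assumption; sparse, high-dimensional text features
# are often linearly separable enough for it to work well.
svm_model = SVC(kernel="linear")
svm_model.fit(X, labels)

pred = svm_model.predict(vectorizer.transform(["hizmet berbat"]))
print(pred)
```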
The code trains and evaluates two deep learning models: CNN (Convolutional Neural Network) and LSTM (Long Short-Term Memory).
The code defines a CNN model using the Keras API.
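
One possible shape for such a model is sketched below. The layer sizes, `VOCAB_SIZE`, `MAX_LEN`, and `NUM_CLASSES` are all assumptions for illustration, not the project's actual architecture, and the input is a random batch of token ids rather than real tokenized tweets.

```python
import numpy as np
from tensorflow.keras import layers, models

VOCAB_SIZE = 1000  # hypothetical vocabulary size
MAX_LEN = 20       # hypothetical padded sequence length
NUM_CLASSES = 2    # hypothetical number of sentiment classes

cnn_model = models.Sequential(
    [
        layers.Embedding(input_dim=VOCAB_SIZE, output_dim=32),  # token ids -> vectors
        layers.Conv1D(filters=64, kernel_size=3, activation="relu"),  # 3-token windows
        layers.GlobalMaxPooling1D(),  # strongest response per filter
        layers.Dense(32, activation="relu"),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ]
)
cnn_model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",  # integer class labels
    metrics=["accuracy"],
)

# Random token ids stand in for tokenized, padded tweets.
dummy_batch = np.random.randint(0, VOCAB_SIZE, size=(4, MAX_LEN))
probs = cnn_model.predict(dummy_batch, verbose=0)
print(probs.shape)
```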
The code defines an LSTM model using the Keras API.
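
A comparable LSTM sketch follows, again with assumed sizes (`VOCAB_SIZE`, `MAX_LEN`, `NUM_CLASSES`, and the number of LSTM units are hypothetical) and a random batch of token ids in place of real data.

```python
import numpy as np
from tensorflow.keras import layers, models

VOCAB_SIZE = 1000  # hypothetical vocabulary size
MAX_LEN = 20       # hypothetical padded sequence length
NUM_CLASSES = 2    # hypothetical number of sentiment classes

lstm_model = models.Sequential(
    [
        layers.Embedding(input_dim=VOCAB_SIZE, output_dim=32),  # token ids -> vectors
        layers.LSTM(64),  # reads the sequence and returns its final hidden state
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ]
)
lstm_model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",  # integer class labels
    metrics=["accuracy"],
)

# Random token ids stand in for tokenized, padded tweets.
dummy_batch = np.random.randint(0, VOCAB_SIZE, size=(4, MAX_LEN))
probs = lstm_model.predict(dummy_batch, verbose=0)
print(probs.shape)
```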
The code evaluates the trained models on the test data and computes accuracy scores for each model. It also visualizes the training and validation loss and accuracy for the deep learning models.
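
The evaluation step might look roughly like the sketch below. The labels, predictions, and history values are placeholders invented for illustration, not results from this project; in practice the dictionary would come from the `history` attribute of the object returned by Keras `model.fit(...)` when validation data and an accuracy metric are supplied. `plot_history` is a hypothetical helper name.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score

# Placeholder labels and predictions (not real results).
y_true = [1, 0, 1, 1]
y_pred = [1, 0, 0, 1]
test_accuracy = accuracy_score(y_true, y_pred)
print(f"Test accuracy: {test_accuracy:.2f}")

# Placeholder training curves shaped like Keras' History.history dict.
history = {
    "loss": [0.69, 0.52, 0.41],
    "val_loss": [0.68, 0.57, 0.53],
    "accuracy": [0.55, 0.74, 0.82],
    "val_accuracy": [0.56, 0.70, 0.73],
}


def plot_history(history):
    """Plot training vs. validation loss and accuracy side by side."""
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))
    ax_loss.plot(history["loss"], label="training")
    ax_loss.plot(history["val_loss"], label="validation")
    ax_loss.set_title("Loss")
    ax_loss.set_xlabel("Epoch")
    ax_loss.legend()
    ax_acc.plot(history["accuracy"], label="training")
    ax_acc.plot(history["val_accuracy"], label="validation")
    ax_acc.set_title("Accuracy")
    ax_acc.set_xlabel("Epoch")
    ax_acc.legend()
    return fig


fig = plot_history(history)
```

A widening gap between the training and validation curves is the usual sign of overfitting to look for in these plots.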
The code includes a function for testing the trained models on new tweets.
To classify a new tweet, call this function with the tweet text.
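
A sketch of what such a helper could look like is shown below. The name `predict_sentiment`, the `LABEL_NAMES` mapping, and the toy training data are hypothetical, and a TF-IDF plus linear `SVC` pipeline is assumed purely for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical mapping from numeric labels back to readable names.
LABEL_NAMES = {0: "negative", 1: "positive"}

# Made-up training data, for illustration only.
texts = ["film güzel", "harika film", "hizmet berbat", "berbat yavaş"]
labels = [1, 1, 0, 0]

# The pipeline keeps the vectorizer and classifier together, so new tweets
# go through the same feature extraction as the training data.
model = make_pipeline(TfidfVectorizer(), SVC(kernel="linear"))
model.fit(texts, labels)


def predict_sentiment(tweet: str) -> str:
    """Return the predicted sentiment name for a single tweet (hypothetical helper)."""
    label_id = int(model.predict([tweet.lower()])[0])
    return LABEL_NAMES[label_id]


print(predict_sentiment("Film harika"))
```

In a fuller version, the same cleaning function used during training would be applied to the tweet before prediction.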