miguel-kjh / Analysis-of-tweets

Sentiment analysis of airline tweets

Analysis-of-tweets

This repository contains several models that analyze the opinions travelers leave on Twitter. The data comes from a Kaggle competition run by a Spanish airline. The data has been preprocessed, and several techniques have been tried on it.

Machine Learning

  • Bag of words (TF-IDF)
  • Random Forest
  • GaussianNB
  • XGBoost
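
As an illustration of the bag-of-words (TF-IDF) representation listed above, here is a minimal sketch in plain Python. It is not the repository's code: the function name `tfidf_vectors`, the whitespace tokenizer, the smoothed IDF formula and the example tweets are all assumptions made for this example. In practice a library vectorizer would normally be used before feeding the features to Random Forest, GaussianNB or XGBoost.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) with one TF-IDF vector per document.

    Hypothetical helper for illustration: lowercases, splits on whitespace,
    and uses a smoothed inverse document frequency.
    """
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(tok for doc in tokenized for tok in set(doc))

    # Smoothed IDF: terms present in every document get weight 1.0.
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocabulary}

    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero on empty documents
        vectors.append([counts[t] / total * idf[t] for t in vocabulary])
    return vocabulary, vectors


# Made-up example tweets (not taken from the dataset).
tweets = [
    "flight cancelled again",
    "thanks for the flight",
    "flight on time thanks",
]
vocabulary, vectors = tfidf_vectors(tweets)
```

Terms that occur in every tweet, such as "flight" here, receive a lower weight than terms that occur in only one, which is the point of the IDF factor.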

Deep Learning

  • Word embeddings (GloVe)
  • CNN with kernel size 1: this technique is explained in a linked video.
  • fastText: a simple and efficient model for text classification.
  • BETO: the BERT model trained for Spanish.
  • GRUs: gated recurrent units.
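
To make the "CNN with kernel size 1" item more concrete: a convolution whose kernel spans a single token applies the same linear map to every token embedding independently. The sketch below shows that idea in plain Python. The ReLU activation, the global max pooling step, the function names and the toy numbers are assumptions for illustration; the README does not describe the actual architecture.

```python
def conv1d_kernel1(embeddings, weights, bias):
    """Kernel-size-1 convolution followed by ReLU.

    embeddings: seq_len x emb_dim (one embedding per token)
    weights:    n_filters x emb_dim
    bias:       n_filters
    Returns seq_len x n_filters feature maps.
    """
    feature_maps = []
    for token in embeddings:
        row = []
        for w, b in zip(weights, bias):
            # Each filter sees one token at a time: a dot product plus bias.
            z = sum(wi * xi for wi, xi in zip(w, token)) + b
            row.append(max(0.0, z))  # ReLU (assumed activation)
        feature_maps.append(row)
    return feature_maps


def global_max_pool(feature_maps):
    """Keep the strongest activation of each filter across the sequence."""
    return [max(column) for column in zip(*feature_maps)]


# Toy input: 3 tokens with 2-dimensional embeddings, 2 filters.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [[1.0, -1.0], [0.5, 0.5]]
bias = [0.0, 0.0]

features = conv1d_kernel1(embeddings, weights, bias)
pooled = global_max_pool(features)
```

The pooled vector has one value per filter regardless of tweet length, so it can be passed to a dense classification layer.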

Results

All models have undergone a fine tuning process to get the best performance from them.

Balanced Data

Figure 1: Results of the experiment for a balanced dataset

Unbalanced Data

Figure 2: Results of the experiment for an unbalanced dataset
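
The README does not say how the balanced dataset was produced. One common option, shown here purely as an assumption, is random undersampling: every class is reduced to the size of the smallest one. The function name `undersample`, the fixed seed and the toy labels are hypothetical.

```python
import random
from collections import Counter, defaultdict


def undersample(samples, labels, seed=0):
    """Randomly drop examples so every class has as many as the smallest class.

    Returns a shuffled list of (sample, label) pairs.
    """
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)

    target = min(len(items) for items in by_label.values())
    rng = random.Random(seed)  # fixed seed so the split is reproducible

    balanced = []
    for label, items in by_label.items():
        for sample in rng.sample(items, target):
            balanced.append((sample, label))
    rng.shuffle(balanced)
    return balanced


# Toy unbalanced data: 5 negative, 3 neutral, 2 positive tweets.
labels = ["negative"] * 5 + ["neutral"] * 3 + ["positive"] * 2
samples = [f"tweet {i}" for i in range(len(labels))]

balanced = undersample(samples, labels)
counts = Counter(label for _, label in balanced)
```

Undersampling discards data, which matters with a small dataset; oversampling the minority classes is the usual alternative.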

Conclusion

As the figures show, the connectionist approach (deep learning) gives better results on both datasets; with balanced data, however, the models manage to reach 80% accuracy. CNN and fastText are fast and effective methods, whereas the transformers, despite being more powerful, fall below those two. I suspect the reason is that transformers are designed for large volumes of data: with little data, as is the case here with only 7000 samples, they give good results, but not as impressive as in other applications.

Technologies and Libraries


Languages

Language:Jupyter Notebook 100.0%