Briiick / NLP-disaster-tweets

Exploring BERT with Kaggle disaster tweets dataset.

Home Page: https://medium.com/towards-data-science/part-1-data-cleaning-does-bert-need-clean-data-6a50c9c6e9fd


NLP with Disaster Tweets

Predict which Tweets are about real disasters and which ones are not.

Alexander Bricken


Current submission accuracy and leaderboard position: 84.063%, #71 (#52 if you exclude cheaters).

Project structure:

├── README.md                     <- The top-level README for developers using this project.
├── data
│   ├── raw                       <- The raw data
│   ├── submissions               <- The final data to be submitted
│
├── requirements.txt              <- Requirements for this project.
│
├── utils.py                      <- Utility functions for project.
├── tweet-scraping.py             <- Tweet scraping for more data.
│
├── notebooks                     <- Jupyter notebooks for this project.
│   ├── nlp_disaster_tweets       <- The main Jupyter notebook
│
├── data-dictionary.txt           <- Data dictionaries, manuals, and all other explanatory materials.

Data

Raw data source: https://www.kaggle.com/c/nlp-getting-started/overview
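Per the project tree above, the raw Kaggle files live under `data/raw`. A minimal sketch of the dataset's schema and how you would load it with pandas; a tiny inline sample is used so the snippet runs without the data downloaded, and the filename `train.csv` follows the Kaggle competition's standard layout (an assumption here):

```python
import pandas as pd

# Schema of the Kaggle "NLP with Disaster Tweets" training file:
# id, keyword, location, text, and the binary label target.
# A small inline sample stands in for the real file.
sample = pd.DataFrame({
    "id": [1, 2],
    "keyword": ["ablaze", None],
    "location": ["Canada", None],
    "text": ["Forest fire near La Ronge Sask. Canada",
             "What a goooooaaaaal!!"],
    "target": [1, 0],  # 1 = real disaster, 0 = not
})

# In the project itself you would read the raw file instead:
# train = pd.read_csv("data/raw/train.csv")

# Class balance is worth checking before training
print(sample["target"].value_counts())
```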

Using The Project

Check the notebooks folder to see the associated exploratory analysis.

If you want to play with it, run `git clone https://github.com/Briiick/NLP-disaster-tweets.git` in your terminal.
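To go from a fresh clone to running the notebook, a minimal setup sketch (the virtual-environment step is optional and the env name `venv` is an assumption, not part of the repo):

```shell
# Clone the repository and enter it
git clone https://github.com/Briiick/NLP-disaster-tweets.git
cd NLP-disaster-tweets

# Optional: isolate dependencies in a virtual environment (name assumed)
python -m venv venv && . venv/bin/activate

# Install the project's pinned requirements, then launch the notebooks
pip install -r requirements.txt
jupyter notebook notebooks/
```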

References

Natural Language Processing with Disaster Tweets

NLP with Disaster Tweets: EDA, cleaning and BERT

Basics of using pre-trained GloVe Vectors

Cleaning text data with Python

What is tokenization?

BERT Text Classification using Keras



Languages

Jupyter Notebook 100.0%, Python 0.0%