- Extracting tweets from Twitter (extractTweet.py)
- Writing those tweets to a DataFrame and then to a CSV file (tabular format)
- Preprocessing the data (preprocess.py): it calculates the frequencies and relative frequencies of the unique words in each tweet and makes the data ready for training
- Applying the Naive Bayes algorithm, just to check whether tweets can be classified by hype at all
- Getting the accuracy (which is the worst for now, at least xD)
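
The frequency step above can be sketched roughly as follows. This is only an illustration using the standard library, not the actual code in preprocess.py; the function names `tokenize` and `word_frequencies` and the simple regex tokenizer are assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(tweet):
    """Lowercase a tweet and split it into word tokens (assumed, simplistic tokenizer)."""
    return re.findall(r"[a-z0-9']+", tweet.lower())


def word_frequencies(tweet):
    """Return {word: (frequency, relative_frequency)} for the unique words in one tweet."""
    tokens = tokenize(tweet)
    counts = Counter(tokens)
    total = len(tokens)
    # Relative frequency = occurrences of the word / total number of words in the tweet.
    return {word: (n, n / total) for word, n in counts.items()}


# Made-up example tweet, only for illustration.
freqs = word_frequencies("Cannot wait for this movie, this movie looks great")
for word, (count, relative) in sorted(freqs.items()):
    print(f"{word}: count={count}, relative={relative:.3f}")
```

Here the relative frequency is computed per tweet; computing it over the whole dataset instead would only change what `total` counts.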
The extracted dataset contains tweets classified into three classes:
- Low Hype: 0
- Medium Hype: 1
- High Hype: 2
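
To show how the 0/1/2 labels feed into the Naive Bayes step, here is a small from-scratch sketch of a multinomial Naive Bayes classifier with Laplace smoothing. It is an assumption about the general technique, not the project's actual training code; the function names and the toy tweets are made up for the example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(samples):
    """samples: list of (tokens, label) pairs. Returns the counts the classifier needs."""
    class_counts = Counter()            # number of tweets per class
    word_counts = defaultdict(Counter)  # word occurrences per class
    vocabulary = set()
    for tokens, label in samples:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary


def predict(model, tokens):
    """Return the class with the highest log-probability for the given tokens."""
    class_counts, word_counts, vocabulary = model
    total_tweets = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_tweets)  # log prior
        total_words = sum(word_counts[label].values())
        for token in tokens:
            # Laplace (add-one) smoothing so unseen words do not zero out the class.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, samples):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(predict(model, tokens) == label for tokens, label in samples)
    return correct / len(samples)


# Made-up toy data: 0 = Low Hype, 1 = Medium Hype, 2 = High Hype.
train = [
    ("meh looks boring".split(), 0),
    ("boring trailer meh".split(), 0),
    ("looks okay maybe".split(), 1),
    ("okay trailer maybe watch".split(), 1),
    ("cannot wait amazing".split(), 2),
    ("amazing trailer cannot wait".split(), 2),
]
model = train_naive_bayes(train)
print(predict(model, ["amazing", "wait"]))
print(accuracy(model, train))
```

Accuracy measured on the training tweets, as in this toy run, is optimistic; a held-out test split gives a more honest number.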
I took the movie names from IMDb and then extracted tweets for those movies.
- We can also add a feature No._Of_Tweets
- Using RNNs or LSTMs for classification, which is basically a Deep Learning approach, so we need a humongous amount of data
- Working on collecting and preprocessing that data efficiently
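
One possible reading of the No._Of_Tweets feature mentioned above is the number of tweets collected per movie. The sketch below assumes that interpretation; the row layout, the helper `add_tweet_counts`, and the sample rows are all hypothetical.

```python
from collections import Counter

# Made-up (movie, tweet) rows; in the project these would come from the CSV file.
rows = [
    ("Movie A", "cannot wait for this"),
    ("Movie A", "trailer looks amazing"),
    ("Movie A", "day one for sure"),
    ("Movie B", "meh"),
]


def add_tweet_counts(rows):
    """Attach the per-movie tweet count to every row (assumed meaning of No._Of_Tweets)."""
    counts = Counter(movie for movie, _ in rows)
    return [
        {"movie": movie, "tweet": tweet, "No._Of_Tweets": counts[movie]}
        for movie, tweet in rows
    ]


with_counts = add_tweet_counts(rows)
for row in with_counts:
    print(row)
```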