BrianS3 / TweeTERA

TweetERA

TweetERA (Tweet Emotional Response Analysis) was designed to simplify how Twitter data is analyzed. This package will create a MySQL database and load Twitter data into it. It will also perform a sentiment analysis on the tweets, encouraging users to analyze new data frequently. Simply enter your keyword or phrase and let the package do the rest.

This package uses unsupervised machine learning to find the words associated with your keyword or phrase, then expands your search by also pulling tweets that contain these related words. Tweets are pulled from the day before a run is executed up to 25 days prior.
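To make the date window concrete, here is a minimal sketch of how such a window could be computed. The function name `pull_window` is hypothetical, and the sketch assumes the 25 days are counted back from the run date; the package's own calculation may differ.

```python
from datetime import date, timedelta

def pull_window(run_date, max_days_back=25):
    """Return (start, end) dates for a pull executed on run_date.

    Assumption: the window ends the day before the run and starts
    max_days_back days before the run date.
    """
    end = run_date - timedelta(days=1)
    start = run_date - timedelta(days=max_days_back)
    return start, end

start, end = pull_window(date(2022, 11, 26))
print(start.isoformat(), end.isoformat())  # 2022-11-01 2022-11-25
```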

Supervised learning is used to shorten the package runtime. Only a subset of words is analyzed using text2emotion; the rest are predicted with Support Vector Machines via scikit-learn.

There are two basic requirements to use TweetERA:

  1. You must have research level access to Twitter API v2 and adhere to its terms and agreements.
  2. You must create an empty MySQL database (details below).

Sample TweetERA report

Getting Started

Installation Requirements

This package is built on a number of Python packages. The current release conflicts with the PyPI installer, so users must install the dependencies themselves.

To install the required packages:

import tweetera as t

t.install_dependencies()

Setting up MySQL

To get started, you first need to set up a MySQL database. If you have never done this, check out this tutorial for more information.

Once you have MySQL server installed, use the following commands to set up a blank database and user for access.

In MySQL command line create user:

create user 'your_user' identified by 'password';

Then set up the database:

Create database your_db;

Grant privileges to user; these are the minimum required to run this package:

grant alter,create,delete,drop,index,insert,select,update, references on your_db.* to 'your_user';

Initialize user credentials

All credentials for this package are stored in an .env file for convenience. This file will be created in your current working directory. If that directory has an existing .env file, this package will overwrite it.
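Because an existing .env file is overwritten, it may be worth backing it up first. The helper below is a sketch using only the standard library; `backup_env_file` and the `.env.bak` name are illustrative and not part of TweetERA.

```python
import shutil
from pathlib import Path

def backup_env_file(directory=".", suffix=".bak"):
    """Copy an existing .env file to .env<suffix>; return the backup path or None."""
    env_path = Path(directory) / ".env"
    if not env_path.is_file():
        return None  # nothing to back up
    backup_path = env_path.with_name(env_path.name + suffix)
    shutil.copy2(env_path, backup_path)
    return backup_path
```

Calling `backup_env_file()` before creating the environment variables would leave a copy of the previous file next to the new one.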

Loading credentials:

import tweetera as t
import os

# Get information about functions
help(t.create_env_variables)

# set up new environmental variables
db_user = ''
db_pass = ''
api_bearer = ''
db_host = ''
database = ''

# write the credentials to the .env file
t.create_env_variables(db_user=db_user, db_pass=db_pass, api_bearer=api_bearer, db_host=db_host, database=database)

Once the file exists, you can update the value of any single variable as needed.

t.create_env_variables(database='new_db_name')

Once the .env file is created, use the following code to load your credentials per session.

t.load_env_credentials()

Check your credentials

import os 
import tweetera as t

t.load_env_credentials()

print(os.getenv('mysql_username'))
print(os.getenv('mysql_pass'))
print(os.getenv('db_host'))
print(os.getenv('database'))
print(os.getenv('twitter_bearer'))
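Printing the values above will echo your password and bearer token to the console. As an alternative, the sketch below only reports which variables are missing. The variable names are taken from the snippet above; `missing_credentials` is a hypothetical helper, not a TweetERA function.

```python
import os

REQUIRED_VARS = ("mysql_username", "mysql_pass", "db_host", "database", "twitter_bearer")

def missing_credentials(environ=os.environ):
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]

missing = missing_credentials()
print("Missing:", ", ".join(missing) if missing else "none")
```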

Example code

Do a full database load

Want to dive right in and just start analyzing tweets? Run the following:

import tweetera as t

t.load_env_credentials()
t.database_load("hello")

Do several database loads

If you set up multiple databases, you can string together an analysis:

import tweetera as t

databases = ['database_1','database_2','database_3','database_4']
words = ['Iran', 'Nury Martinez', 'California drought', 'Putin']

# pair each database with its keyword
for database, word in zip(databases, words):
    t.create_env_variables(database=database)
    t.load_env_credentials()
    t.database_load(word)

Please note that the generate_report function writes to the same output file on every call, so calling it in a loop will leave only the last report. See below for more details.

Creating your database

This package will automatically create a database when you execute database_load("keyword"); however, you can run this step on your own at any time.

import tweetera as t

t.load_env_credentials()
t.reset_mysql_database()

Partial database loads

If you only want to run a single, partial load of tweets, you can. Be advised that this will not perform any sentiment analysis.

import tweetera as t

t.load_env_credentials()
t.load_tweets("hello", '2020-01-01', '2020-02-01', 50)

Database connections

If you wish to manually connect to your database and extract data you can do so as follows:

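import os
import mysql.connector
import pandas as pd
import tweetera as t

t.load_env_credentials()

# Assumption: open cnx directly with mysql-connector-python
cnx = mysql.connector.connect(user=os.getenv('mysql_username'), password=os.getenv('mysql_pass'),
                              host=os.getenv('db_host'), database=os.getenv('database'))

query = "SELECT * FROM TWEET_TEXT;"
df = pd.read_sql_query(query, cnx)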

Analyzing your data

Checking Pytrends

This package has a feature that allows you to find out if a specific term or phrase is trending before executing a full run. Calling check_trend will display an analysis from Google Trends for the past 12 months. Google Trends uses Twitter as a resource, and this may help you evaluate the right keyword or phrase to run for your analysis.

import tweetera as t
t.check_trend("hello") #single word analysis

t.check_trend("hello", "goodbye", "nice to meet you") #or put in multiple words

To get a full analysis of your tweets, use the generate_report function. An HTML report called "Sentiment_Report.html" will be generated in your current working directory.

import tweetera as t

t.load_env_credentials()
t.generate_report()

All outputs from the modeling and visualization process can be found in your current working directory under "output_data".
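Since each call to generate_report writes to the same "Sentiment_Report.html", one way to keep reports from several runs is to rename the file after each call. The sketch below assumes the report lands in the current working directory under that name, as described above; `archive_report` is a hypothetical helper, not part of the package.

```python
from pathlib import Path

def archive_report(label, directory=".", report_name="Sentiment_Report.html"):
    """Rename the latest report to Sentiment_Report_<label>.html and return the new path."""
    source = Path(directory) / report_name
    if not source.is_file():
        raise FileNotFoundError(f"{source} not found; run generate_report() first")
    target = source.with_name(f"{source.stem}_{label}{source.suffix}")
    source.replace(target)  # overwrites an older archive with the same label
    return target
```

For example, calling `archive_report("database_1")` right after `t.generate_report()` would leave `Sentiment_Report_database_1.html` in place of the default file.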

FAQs

  1. How long does the package take to run?

    Package run times are usually between 15 and 45 minutes, depending on your internet connection speed. Run times also depend on the tweets themselves: higher volume and more complex language will increase runtime.

    If you are using a local MySQL database, runtime will be lower (generally).

  2. I use "X" database (not MySQL), will this package work?

    No.

  3. Why did you choose MySQL? "X" is so much better...

    Because we wrote this package and you didn't. Jokes aside, MySQL was chosen for simplicity and ease of setup. Future iterations of this package may include more connectors, but for now MySQL was, in our opinion, the simplest choice to get you moving quickly. MySQL had no obvious benefits over MariaDB, Postgres, or any other open-source database software.

  4. Will TweetERA work if I don't have research level access to Twitter API v2?

    No.

  5. Should I use the results of TweetERA to make executive decisions?

    No! Tweet emotion analysis is a fickle thing, and far from perfect. TweetERA analyses should not be used to inform policy, public safety, or other important decisions.

Errors

Common errors that you may encounter:

  1. Many common errors occur when load_env_credentials() has not been called in your current session. Many IDEs and Python environments require you to load credentials for each session. If you are experiencing SQL errors, blocked API requests, or general issues, make sure to run the following before any other code:

    import tweetera as t

    t.load_env_credentials()

  2. Users may experience issues with the NLTK library. Corpora needed to process text may not be properly installed.

    See https://www.nltk.org/data.html for instructions on how to download the correct corpus. The traceback message will name the missing resource, such as: "Resource omw-1.4 not found."

  3. Freeze Support

    freeze_support()

    An attempt has been made to start a new process before the current process has finished its bootstrapping phase.
    The "freeze_support()" line can be omitted if the program is not going to be frozen to produce an executable.

    If you experience this error, nest your code under

    if __name__ == "__main__":

  4. "output_data" directory not found

    The database_load() function creates a default directory for package results. In the current release, this directory can fail to be created. When starting a load, check that the directory has appeared in your current working directory. If not, cancel the run and start it again.
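The directory check described in the last item can be scripted. This is a minimal sketch; `output_dir_ready` is a hypothetical helper, and it only inspects the working directory without touching the package.

```python
from pathlib import Path

def output_dir_ready(directory=".", name="output_data"):
    """Return True if the output_data directory exists under `directory`."""
    return (Path(directory) / name).is_dir()

if not output_dir_ready():
    print('"output_data" is missing from the current working directory')
```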

About

License: MIT License

