- 05/05/05
Data Enrichment Mock Exam
API results:
https://drive.google.com/file/d/10iWPhZtId0R9RCiVculSozCwldG-V3eH/view?usp=sharing
- Read in the json file
- Separate the records into 4 tables each a pandas dataframe
- Transform In this case remove dollar signs from funded amount in the financials records and convert to numeric datatype
- Create a database with SQLAlchemy and add the tables to the datbase
- Perform a hypothesis test to determine if there is a signficant difference between the funded amount when it is all males and when there is at least one female in the group.
- If there is time, perform an additional hypothesis test to determine if there is a significant difference in the funded amount for different sectors.
import json
import pandas as pd
import numpy as np
import seaborn as sns
from scipy import stats
import pymysql
pymysql.install_as_MySQLdb()
from sqlalchemy import create_engine
from sqlalchemy_utils import create_database, database_exists
## Loading json file
with open('Mock_Crowdsourcing_API_Results.json') as f:
results = json.load(f)
results.keys()
## explore each key
type(results['meta'])
## display meta
results['meta']
## display data
type(results['data'])
## preview the dictionary
# results['data']
## preview just the keys
results['data'].keys()
## what does the crowd key look like?
# results['data']['crowd']
## checking single entry of crowd
results['data']['crowd'][0]
## making crowd a dataframe
crowd = pd.DataFrame(results['data']['crowd'])
crowd
## making demographics a dataframe
demo = pd.DataFrame(results['data']['demographics'])
demo
## making financials a dataframe
financials = pd.DataFrame(results['data']['financials'])
financials
## making use a dataframe
use = pd.DataFrame(results['data']['use'])
use
## fixing funded amount column
financials['funded_amount'] = financials['funded_amount'].str.replace('$','')
financials['funded_amount'] = pd.to_numeric(financials['funded_amount'])
financials
## loading mysql credentials
with open('/Users/codingdojo/.secret/mysql.json') as f:
login = json.load(f)
login.keys()
## creating connection to database with sqlalchemy
connection_str = f"mysql+pymysql://{login['user']}:{login['password']}@localhost/mock-belt-exam"
engine = create_engine(connection_str)
## Check if database exists, if not, create it
if database_exists(connection_str) == False:
create_database(connection_str)
else:
print('The database already exists.')
## saving dataframes to database
financials.to_sql('financials', engine, index=False, if_exists = 'replace')
use.to_sql('use', engine, index=False, if_exists = 'replace')
demo.to_sql('demographics', engine, index=False, if_exists = 'replace')
crowd.to_sql('crowd',engine, index=False, if_exists = 'replace')
## checking if tables created
q= '''SHOW TABLES;'''
pd.read_sql(q,engine)
Follow the Guide: Choosing the Right Hypothesis Test from the LP.
-
$H_0$ (Null Hypothesis): -
$H_A$ (Alternative Hypothesis):