ZL14E161110 / HW4---Pandas

Please click into the folder to find the PDF file and the coding file

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Pandas: Heroes Of Pymoli - Analysis Report

  • Observed Trend 1: Mourning Blade is the most profitable item for file 2, Retribution Axe is the most profitable item for file 1
  • Observed Trend 2: most people only buy 1 item and male is the main buyer for both file 1 & 2
  • Observed Trend 3: age group(20-24) consumed the most items for file 1 & 2
import pandas as pd
# Read the JSON file

fpath = input("Please choice file (1) or file (2)?")

if (fpath == "1"):
    JS = pd.read_json("purchase_data.json")
elif (fpath == "2"):
    JS = pd.read_json("purchase_data2.json")
else:
    print("no such file")
JS.head()

ee =JS.to_excel("test"+fpath+".xlsx","Sheet1")
Please choice file (1) or file (2)?1

Player Count

# Total Number of Players
Player_Count = pd.DataFrame([{"Total Players":JS["SN"].nunique()}])
Player_Count
<style> .dataframe thead tr:only-child th { text-align: right; }
.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}
</style>
Total Players
0 573

Purchasing Analysis (Total)

#Number of Unique Items
UItem = JS["Item ID"].nunique()
#Average Purchase Price
AvePur = JS["Price"].mean() 
#Total Number of Purchases
NPur = JS["Price"].count()
#Total Revenue
TRev = JS["Price"].sum()

PurAnsys = pd.DataFrame([{"Number of Unique Items":UItem, "Average Price":"$"+str(round(AvePur,2)), "Number of Purchases":NPur, "Total Revenue":"$"+str(round(TRev,2))}])

PurAnsys = PurAnsys[["Number of Unique Items","Average Price","Number of Purchases","Total Revenue"]]
PurAnsys
<style> .dataframe thead tr:only-child th { text-align: right; }
.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}
</style>
Number of Unique Items Average Price Number of Purchases Total Revenue
0 183 $2.93 780 $2286.33

Gender Demographics

TotalC = JS["Gender"].count()
MaleC =JS["Gender"].value_counts()
MaleP =round(MaleC / TotalC*100,2)

# Creating a new DataFrame using both duration and count
GenderDemo = pd.DataFrame({"Percentage of players":MaleP,"Total Count":MaleC})
GenderDemo

#Alternative Method
    #TotalC = JS["Gender"].count()
    #MaleC =pd.DataFrame(JS["Gender"].value_counts())
    #MaleP =pd.DataFrame(round(MaleC / TotalC*100,2))
    #MaleC = MaleC.reset_index().rename(columns={'index':'Gender','Gender':'Total Count'})
    #MaleP = MaleP.reset_index().rename(columns={'index':'Gender','Gender':'Percentage of Players'})
    #GenderDemo = pd.merge(MaleP,MaleC,how='left',on='Gender')
    #GenderDemo
<style> .dataframe thead tr:only-child th { text-align: right; }
.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}
</style>
Percentage of players Total Count
Male 81.15 633
Female 17.44 136
Other / Non-Disclosed 1.41 11

Purchasing Analysis (Gender)

The below each broken by gender
#different method of normalization
#JSMax = JS['Price'].max()
#JSMin = JS['Price'].min()
#JSMean = JS['Price'].mean()
#JSStd = JS['Price'].std()
#JSN = (JS['Price'] - JSMin)/(JSMax-JSMin)
#JS['JSNor']=JSN
#JS.head()
JSGroup = JS.groupby(['Gender'])
#Purchase Count
PCbyG = JSGroup['Price'].count()
#Average Purchase Price
APbyG = round(JSGroup['Price'].mean(),2)
#Total Purchase Value
TPbyG = JSGroup['Price'].sum()
#Normalized Totals
# diff Method TPbyGN = JSGroup['JSNor'].sum()
TPbyGN = round(TPbyG / PCbyG, 2)
          
PAbyG = pd.DataFrame({"Purchase Count":PCbyG,"Average Purchase Price":APbyG,"Total Purchase Value":TPbyG,"Normalized Totals":TPbyGN})
PAbyG = PAbyG[["Purchase Count","Average Purchase Price","Total Purchase Value","Normalized Totals"]]
PAbyG
<style> .dataframe thead tr:only-child th { text-align: right; }
.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}
</style>
Purchase Count Average Purchase Price Total Purchase Value Normalized Totals
Gender
Female 136 2.82 382.91 2.82
Male 633 2.95 1867.68 2.95
Other / Non-Disclosed 11 3.25 35.74 3.25

Age Demographics

The below each broken into bins of 4 years (i.e. <10, 10-14, 15-19, etc.)
# Create the bins in which Data will be held
JS["Age"]=pd.to_numeric(JS['Age'])
bins = [0,9,14,19,24,29,34,39,40]
group_names = ['<10','10-14','15-19','20-24','25-29','30-34','35-39','40+']
JS["Age Group"] = pd.cut(JS["Age"],bins,labels=group_names)
JSgrAge = JS.groupby(['Age Group'])

TCbyA = JS["Age Group"].count()
VCbyA =JS["Age Group"].value_counts()
PCbyA =round(VCbyA / TCbyA*100,2)

# Creating a new DataFrame using both duration and count
AgeDemo = pd.DataFrame({"Percentage of players":PCbyA,"Total Count":VCbyA})
AgeDemo = AgeDemo.sort_index()
AgeDemo
<style> .dataframe thead tr:only-child th { text-align: right; }
.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}
</style>
Percentage of players Total Count
<10 3.60 28
10-14 4.50 35
15-19 17.12 133
20-24 43.24 336
25-29 16.09 125
30-34 8.24 64
35-39 5.41 42
40+ 1.80 14
JSgrAge = JS.groupby(['Age Group'])

#Purchase Count
PCbyA =JSgrAge['Price'].count()
#Average Purchase Price
APbyA =round(JSgrAge['Price'].mean(),2)
#Total Purchase Value
TPbyA =JSgrAge['Price'].sum()
#Normalized Totals
#MaxbyA = JSgrAge['Price'].max() (diff way of Calc)
#MinbyA = JSgrAge['Price'].min() (diff Way of Calc)
#TPbyAN = round((TPbyA-APbyA) / (MaxbyA-MinbyA),2)
TPbyAN = round(TPbyA / PCbyA, 2)

ADbyG = pd.DataFrame({"Purchase Count":PCbyA,"Average Purchase Price":APbyA,"Total Purchase Value":TPbyA,"Normalized Totals":TPbyAN})
ADbyG = ADbyG[["Purchase Count","Average Purchase Price","Total Purchase Value","Normalized Totals"]]
ADbyG
<style> .dataframe thead tr:only-child th { text-align: right; }
.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}
</style>
Purchase Count Average Purchase Price Total Purchase Value Normalized Totals
Age Group
<10 28 2.98 83.46 2.98
10-14 35 2.77 96.95 2.77
15-19 133 2.91 386.42 2.91
20-24 336 2.91 978.77 2.91
25-29 125 2.96 370.33 2.96
30-34 64 3.08 197.25 3.08
35-39 42 2.84 119.40 2.84
40+ 14 3.22 45.11 3.22

Top Spenders

Identify the the top 5 spenders in the game by total purchase value, then list (in a table):
#SN
GroupbySN = JS.groupby(['SN'])

#Total Purchase Value by SN
TPbyS =GroupbySN['Price'].sum()
#TPbyS = TPbyA.sort_values('Price', ascending=False)
#Purchase Count
PCbyS =GroupbySN['Price'].count()
#Average Purchase Price
APbyS =GroupbySN['Price'].mean()

TopS = pd.DataFrame({"Purchase Count":PCbyS,"Average Purchase Price":round(APbyS,2),"Total Purchase Value":TPbyS})
TopS = TopS[["Purchase Count","Average Purchase Price","Total Purchase Value"]]
TopS = TopS.sort_values('Total Purchase Value',ascending=False)
TopS.head(5)
<style> .dataframe thead tr:only-child th { text-align: right; }
.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}
</style>
Purchase Count Average Purchase Price Total Purchase Value
SN
Undirrala66 5 3.41 17.06
Saedue76 4 3.39 13.56
Mindimnya67 4 3.18 12.74
Haellysu29 3 4.24 12.73
Eoda93 3 3.86 11.58

Most Popular Items

Identify the 5 most popular items by purchase count, then list (in a table):
#Item ID
GroupbyID = JS.groupby(['Item ID','Item Name'])

#Total Purchase Value by ITem
TPbyI =GroupbyID['Price'].sum()
#TPbyS = TPbyA.sort_values('Price', ascending=False)
#Purchase Count
PCbyI =GroupbyID['Price'].count()
#Average Purchase Price
APbyI =GroupbyID['Price'].mean()
#Item Price
IPbyI =GroupbyID['Price']

TopI = pd.DataFrame({"Purchase Count":PCbyI,"Item Price":APbyI,"Total Purchase Value":TPbyI})
TopI = TopI[["Purchase Count","Item Price","Total Purchase Value"]]
TopI = TopI.sort_values('Purchase Count',ascending=False)
TopI.head()
<style> .dataframe thead tr:only-child th { text-align: right; }
.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}
</style>
Purchase Count Item Price Total Purchase Value
Item ID Item Name
39 Betrayal, Whisper of Grieving Widows 11 2.35 25.85
84 Arcane Gem 11 2.23 24.53
31 Trickster 9 2.07 18.63
175 Woeful Adamantite Claymore 9 1.24 11.16
13 Serenity 9 1.49 13.41

Most Profitable Items

Identify the 5 most profitable items by total purchase value, then list (in a table):
#Item ID
GroupbyID = JS.groupby(['Item ID','Item Name'])

#Total Purchase Value by ITem
TPbyI =GroupbyID['Price'].sum()
#TPbyS = TPbyA.sort_values('Price', ascending=False)
#Purchase Count
PCbyI =GroupbyID['Price'].count()
#Average Purchase Price
APbyI =GroupbyID['Price'].mean()
#Item Price
IPbyI =GroupbyID['Price']

TopI = pd.DataFrame({"Purchase Count":PCbyI,"Item Price":APbyI,"Total Purchase Value":TPbyI})
TopI = TopI[["Purchase Count","Item Price","Total Purchase Value"]]
TopI = TopI.sort_values('Total Purchase Value',ascending=False)
TopI.head()
<style> .dataframe thead tr:only-child th { text-align: right; }
.dataframe thead th {
    text-align: left;
}

.dataframe tbody tr th {
    vertical-align: top;
}
</style>
Purchase Count Item Price Total Purchase Value
Item ID Item Name
34 Retribution Axe 9 4.14 37.26
115 Spectral Diamond Doomblade 7 4.25 29.75
32 Orenmir 6 4.95 29.70
103 Singed Scalpel 6 4.87 29.22
107 Splitter, Foe Of Subtlety 8 3.61 28.88

About

Please click into the folder to find the PDF file and the coding file


Languages

Language:Jupyter Notebook 100.0%