bgizaa/using-pandas-library-for-visualization

The dataset contains the historical sales of a supermarket company, recorded across 3 different branches over 3 months.

Before starting, download Anaconda Navigator, the Jupyter Notebook editor, and the supermarket dataset here

import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt
from pandas import DataFrame
import seaborn as sns

df = pd.read_csv('supermarket.csv') # read the file to visualize.
df.head(5)

	Invoice ID	Branch	City	Customer type	Gender	Product line	Unit price	Quantity	Tax 5%	Total	Date	Time	Payment	cogs	gross margin percentage	gross income	Rating
0	750-67-8428	A	Yangon	Member	Female	Health and beauty	74.69	7	26.1415	548.9715	1/5/2019	13:08	Ewallet	522.83	4.761905	26.1415	9.1
1	226-31-3081	C	Naypyitaw	Normal	Female	Electronic accessories	15.28	5	3.8200	80.2200	3/8/2019	10:29	Cash	76.40	4.761905	3.8200	9.6
2	631-41-3108	A	Yangon	Normal	Male	Home and lifestyle	46.33	7	16.2155	340.5255	3/3/2019	13:23	Credit card	324.31	4.761905	16.2155	7.4
3	123-19-1176	A	Yangon	Member	Male	Health and beauty	58.22	8	23.2880	489.0480	1/27/2019	20:33	Ewallet	465.76	4.761905	23.2880	8.4
4	373-73-7910	A	Yangon	Normal	Male	Sports and travel	86.31	7	30.2085	634.3785	2/8/2019	10:37	Ewallet	604.17	4.761905	30.2085	5.3

df.Branch.unique()

array(['A', 'C', 'B'], dtype=object)
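unique() only lists the branch labels. As a small sketch of how to also see which city each branch belongs to, the block below rebuilds a few rows copied from the outputs in this notebook (the `sample` dataframe is an illustration, not the full file) and keeps one row per Branch and City pair.

```python
import pandas as pd

# A few rows copied from the outputs shown in this notebook (not the full file).
sample = pd.DataFrame(
    {
        "Invoice ID": ["750-67-8428", "226-31-3081", "631-41-3108", "692-92-5582"],
        "Branch": ["A", "C", "A", "B"],
        "City": ["Yangon", "Naypyitaw", "Yangon", "Mandalay"],
    }
)

# unique() returns the labels in order of first appearance.
branches = sample["Branch"].unique()

# One row per distinct (Branch, City) pair shows which city each branch is in.
branch_city = sample[["Branch", "City"]].drop_duplicates().reset_index(drop=True)

print(branches)
print(branch_city)
```

On the full dataframe the same two lines can be run against df instead of sample.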

To look specifically at the rows for female customers in Mandalay who bought a Quantity of more than 1

new_df = df.loc[(df['City'] == 'Mandalay') & (df['Gender'] == 'Female') & (df['Quantity'] > 1)]
new_df.head(5)

	Invoice ID	Branch	City	Customer type	Gender	Product line	Unit price	Quantity	Tax 5%	Total	Date	Time	Payment	cogs	gross margin percentage	gross income	Rating
9	692-92-5582	B	Mandalay	Member	Female	Food and beverages	54.84	3	8.226	172.746	2/20/2019	13:27	Credit card	164.52	4.761905	8.226	5.9
10	351-62-0822	B	Mandalay	Member	Female	Fashion accessories	14.48	4	2.896	60.816	2/6/2019	18:07	Ewallet	57.92	4.761905	2.896	4.5
15	299-46-1805	B	Mandalay	Member	Female	Sports and travel	93.72	6	28.116	590.436	1/15/2019	16:19	Cash	562.32	4.761905	28.116	4.5
19	319-50-3348	B	Mandalay	Normal	Female	Home and lifestyle	40.30	2	4.030	84.630	3/11/2019	15:30	Ewallet	80.60	4.761905	4.030	4.4
28	145-94-9061	B	Mandalay	Normal	Female	Food and beverages	88.36	5	22.090	463.890	1/25/2019	19:48	Cash	441.80	4.761905	22.090	9.6
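The filter above combines three boolean conditions with &. Each condition needs its own parentheses because & binds more tightly than == and >. As a sketch, the same filter can also be written with DataFrame.query, which some readers find easier to scan. The `sample` dataframe below is built from a few rows shown in this notebook and is only an illustration.

```python
import pandas as pd

# Rows copied from the outputs above (only the columns used by the filter).
sample = pd.DataFrame(
    {
        "City": ["Yangon", "Naypyitaw", "Mandalay", "Mandalay", "Yangon"],
        "Gender": ["Female", "Female", "Female", "Female", "Male"],
        "Quantity": [7, 5, 3, 4, 7],
    }
)

# Boolean mask: every condition is wrapped in parentheses before combining with &.
mask = (
    (sample["City"] == "Mandalay")
    & (sample["Gender"] == "Female")
    & (sample["Quantity"] > 1)
)
by_mask = sample.loc[mask]

# The same filter expressed as a query string.
by_query = sample.query("City == 'Mandalay' and Gender == 'Female' and Quantity > 1")

print(by_mask)
```

Both forms keep the original row index, which is why the filtered output above starts at 9 rather than 0.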

To concentrate on a specific column and plot it

In this case we will plot how many rows (transactions) each branch has

import seaborn as sns
sns.countplot(x="Branch", data = df).set_title("Branch Frequency")

Text(0.5, 1.0, 'Branch Frequency')
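The same count can be produced with pandas alone, which is handy when seaborn is not installed. This is a minimal sketch on a hypothetical `sample` dataframe holding the Branch values of a few rows shown in this notebook. The Agg backend line is only there so the sketch runs without a display and can be removed inside Jupyter.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove this line inside Jupyter
import pandas as pd

# Branch values of a few rows shown in this notebook (not the full file).
sample = pd.DataFrame({"Branch": ["A", "C", "A", "A", "A", "B", "B"]})

# value_counts() counts rows per branch; sort_index() orders the bars A, B, C.
counts = sample["Branch"].value_counts().sort_index()

# Series.plot returns a matplotlib Axes that can be labelled further.
ax = counts.plot(kind="bar", title="Branch Frequency")
ax.set_xlabel("Branch")
ax.set_ylabel("count")

print(counts)
```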

In this section we will create a new column that holds, for every row, how many times that row's product line occurs in the dataset

df['Frequency of Product'] = df['Product line'].map(df['Product line'].value_counts())
df.head(4)

	Invoice ID	Branch	City	Customer type	Gender	Product line	Unit price	Quantity	Tax 5%	Total	Date	Time	Payment	cogs	gross margin percentage	gross income	Rating	Frequency of Product
0	750-67-8428	A	Yangon	Member	Female	Health and beauty	74.69	7	26.1415	548.9715	1/5/2019	13:08	Ewallet	522.83	4.761905	26.1415	9.1	152
1	226-31-3081	C	Naypyitaw	Normal	Female	Electronic accessories	15.28	5	3.8200	80.2200	3/8/2019	10:29	Cash	76.40	4.761905	3.8200	9.6	170
2	631-41-3108	A	Yangon	Normal	Male	Home and lifestyle	46.33	7	16.2155	340.5255	3/3/2019	13:23	Credit card	324.31	4.761905	16.2155	7.4	160
3	123-19-1176	A	Yangon	Member	Male	Health and beauty	58.22	8	23.2880	489.0480	1/27/2019	20:33	Ewallet	465.76	4.761905	23.2880	8.4	152
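As a sketch of an equivalent way to build this column, groupby with transform("count") returns one count per row directly, without the intermediate value_counts lookup. The `sample` dataframe below contains only the five product lines shown in the first rows of this notebook, so the counts are 1 or 2 here rather than 152 or 170.

```python
import pandas as pd

# Product line of the first five rows shown in this notebook.
sample = pd.DataFrame(
    {
        "Product line": [
            "Health and beauty",
            "Electronic accessories",
            "Home and lifestyle",
            "Health and beauty",
            "Sports and travel",
        ]
    }
)

# Approach used in this notebook: look each value up in its value_counts().
via_map = sample["Product line"].map(sample["Product line"].value_counts())

# Alternative: count the rows of each group and broadcast the result back to every row.
via_transform = sample.groupby("Product line")["Product line"].transform("count")

print(via_map.tolist())
print(via_transform.tolist())
```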

In this section we will create a smaller dataframe derived from the main dataframe 'df' that keeps only the product line and its frequency

forbranchfreq = df[['Product line','Frequency of Product']]
forbranchfreq.head(5)

	Product line	Frequency of Product
0	Health and beauty	152
1	Electronic accessories	170
2	Home and lifestyle	160
3	Health and beauty	152
4	Sports and travel	166

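In this section we will group by product line to get one row per product line that we can plot later, and sort the result from most to least frequent. Note that summing the per-row frequency column adds each count once per row, so the result is the count squared (for example 178 × 178 = 31684). The ordering is still correct, but the values are not raw counts.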

forbranchfreq100 = forbranchfreq.groupby(['Product line']).sum()
forbranchfreq120 = forbranchfreq100.sort_values(by=['Frequency of Product'],  ascending=False)
forbranchfreq120 = forbranchfreq120.drop_duplicates()
forbranchfreq120.head(3)

	Frequency of Product
Product line
Fashion accessories	31684
Food and beverages	30276
Electronic accessories	28900
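If raw counts are wanted instead of the summed values, value_counts() gives them directly and already sorts from most to least frequent. The sketch below uses toy data (three product line names from this notebook with made-up row counts of 3, 2 and 1) to show both results side by side.

```python
import pandas as pd

# Toy data: the row counts (3, 2, 1) are made up for illustration.
sample = pd.DataFrame(
    {
        "Product line": ["Fashion accessories"] * 3
        + ["Food and beverages"] * 2
        + ["Health and beauty"]
    }
)

# Same steps as in this notebook: per-row frequency, then group and sum.
sample["Frequency of Product"] = sample["Product line"].map(
    sample["Product line"].value_counts()
)
summed = (
    sample.groupby("Product line")["Frequency of Product"]
    .sum()
    .sort_values(ascending=False)
)

# Raw counts, sorted in descending order by default.
counts = sample["Product line"].value_counts()

print(summed)  # each value is the count squared
print(counts)  # each value is the count itself
```

Because squaring keeps the order of positive numbers, both versions rank the product lines identically.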

In this section we will plot the new grouped and sorted dataframe of product lines and their frequencies. We derive from this that Fashion accessories come first and Health and beauty products come last

This can inform supermarket management to place racks of fashion accessories near the entrance of the supermarket, because they are the most frequently purchased product line

forbranchfreq120.plot(kind="barh", color = 'blue',figsize=(15,10))
plt.xticks(rotation=45);
plt.savefig("figure1.png")

Plot the frequency of each Payment channel using the seaborn plotting library

sns.countplot(x="Payment", data = df).set_title("Payment Channel Frequency")

Text(0.5, 1.0, 'Payment Channel Frequency')

In this section we will determine the hours in which customers spend the most at the supermarket.

We will extract only the hour from the Time column by taking the first two characters of each value

df['STime'] = df['Time'].str[:2]
df.head(5)

	Invoice ID	Branch	City	Customer type	Gender	Product line	Unit price	Quantity	Tax 5%	Total	Date	Time	Payment	cogs	gross margin percentage	gross income	Rating	Frequency of Product	STime
0	750-67-8428	A	Yangon	Member	Female	Health and beauty	74.69	7	26.1415	548.9715	1/5/2019	13:08	Ewallet	522.83	4.761905	26.1415	9.1	152	13
1	226-31-3081	C	Naypyitaw	Normal	Female	Electronic accessories	15.28	5	3.8200	80.2200	3/8/2019	10:29	Cash	76.40	4.761905	3.8200	9.6	170	10
2	631-41-3108	A	Yangon	Normal	Male	Home and lifestyle	46.33	7	16.2155	340.5255	3/3/2019	13:23	Credit card	324.31	4.761905	16.2155	7.4	160	13
3	123-19-1176	A	Yangon	Member	Male	Health and beauty	58.22	8	23.2880	489.0480	1/27/2019	20:33	Ewallet	465.76	4.761905	23.2880	8.4	152	20
4	373-73-7910	A	Yangon	Normal	Male	Sports and travel	86.31	7	30.2085	634.3785	2/8/2019	10:37	Ewallet	604.17	4.761905	30.2085	5.3	166	10
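Slicing the first two characters works because every time shown here is zero-padded to HH:MM, and it leaves STime as text. As a sketch of an alternative, pd.to_datetime can parse the column and return the hour as an integer. The column name Hour is hypothetical, and the `sample` dataframe holds the five Time values shown above.

```python
import pandas as pd

# Time values of the first five rows shown in this notebook.
sample = pd.DataFrame({"Time": ["13:08", "10:29", "13:23", "20:33", "10:37"]})

# Approach used in this notebook: the first two characters, kept as strings.
sample["STime"] = sample["Time"].str[:2]

# Alternative: parse the HH:MM text and take the hour as an integer.
sample["Hour"] = pd.to_datetime(sample["Time"], format="%H:%M").dt.hour

print(sample)
```

An integer hour sorts numerically and can be compared directly, for example Hour >= 18 for evening sales.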

fortime = df[['Total', 'STime']]
fortime = fortime.groupby(['STime']).sum()
fortime.head(5)

	Total
STime
10	31421.4810
11	30377.3295
12	26065.8825
13	34723.2270
14	30828.3990

From this bar chart of total sales per hour we can observe that customers spend the most at 7 pm (hour 19).

fortime.plot(kind="bar", color = 'red',figsize=(15,10))
plt.xticks(rotation=45);
plt.xlabel("Time")
plt.ylabel("Amount")
plt.savefig("figure2.png")
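The peak hour can also be read off numerically instead of from the chart. This is a small sketch that uses the hourly totals rounded to whole numbers, as they appear in the 'Amount' row of the pivot table in this notebook. The `hourly_total` Series stands in for fortime['Total'].

```python
import pandas as pd

# Hourly totals for hours 10 to 20, rounded as in the pivot table's 'Amount' row.
hourly_total = pd.Series(
    [31421, 30377, 26066, 34723, 30828, 30990, 25226, 24445, 26030, 39700, 22970],
    index=range(10, 21),
    name="Total",
)

# idxmax() returns the index label (the hour) of the largest total.
peak_hour = hourly_total.idxmax()

# nlargest(3) lists the three busiest hours by total sales.
top3 = hourly_total.nlargest(3)

print(peak_hour)
print(top3)
```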

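Pivot table of the summed Total by customer type and hour, with an 'Amount' margin for the row and column sums. Rows that repeat the same customer type, hour and total are dropped first, which is why 999 rows remain.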

forcustomertype = df[['Customer type','STime','Total']]
forcustomertype = forcustomertype.drop_duplicates()
forcustomertype

	Customer type	STime	Total
0	Member	13	548.9715
1	Normal	10	80.2200
2	Normal	13	340.5255
3	Member	20	489.0480
4	Normal	10	634.3785
...	...	...	...
995	Normal	13	42.3675
996	Normal	17	1022.4900
997	Member	13	33.4320
998	Normal	15	69.1110
999	Member	13	649.2990

999 rows × 3 columns

pivottable = pd.pivot_table(forcustomertype,index=["Customer type"],columns = ["STime"],values=["Total"], aggfunc="sum", margins=True, margins_name='Amount', fill_value=0)
pivottable = pivottable.style.format("{:,.0f}") 
pivottable

	Total
STime	10	11	12	13	14	15	16	17	18	19	20	Amount
Customer type
Member	12,267	15,228	13,730	16,007	19,048	18,750	10,601	12,775	11,659	21,058	12,913	164,034
Normal	19,154	15,150	12,336	18,716	11,781	12,240	14,625	11,670	14,371	18,642	10,057	158,743
Amount	31,421	30,377	26,066	34,723	30,828	30,990	25,226	24,445	26,030	39,700	22,970	322,778
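To make the role of margins and fill_value easier to follow, here is a minimal sketch of the same pivot on only the first five rows shown above. Passing values="Total" as a plain string (rather than a list) is a choice made for this sketch so that the columns have a single level.

```python
import pandas as pd

# First five rows of the customer type / hour / total dataframe shown above.
sample = pd.DataFrame(
    {
        "Customer type": ["Member", "Normal", "Normal", "Member", "Normal"],
        "STime": ["13", "10", "13", "20", "10"],
        "Total": [548.9715, 80.2200, 340.5255, 489.0480, 634.3785],
    }
)

pivot = pd.pivot_table(
    sample,
    index="Customer type",
    columns="STime",
    values="Total",
    aggfunc="sum",          # add up Total within each (customer type, hour) cell
    margins=True,           # append a row and a column with the overall sums
    margins_name="Amount",  # label used for that extra row and column
    fill_value=0,           # cells with no sales show 0 instead of NaN
)

print(pivot.round(2))
```

Members have no sale at hour 10 in these five rows, so that cell is filled with 0, and the bottom-right 'Amount' cell is the sum of all five totals.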

Exporting the pivot table to Excel

pivottable.to_excel("pivottableforcustomertype.xlsx")

About

Using pandas for data visualization