Goal: Data analysis of a store sales and profit dataset using Python, and data visualisation using Tableau.
Analyzing store profit and sales is crucial for a business's success because it provides actionable insights into its financial health and performance. This analysis helps in identifying trends, assessing the impact of various factors on profitability, and making informed decisions to optimize sales strategies and operations. It enables businesses to allocate resources effectively, respond to market changes, and ultimately maximize their profitability and sustainability.
Below is the raw dataset file:
Check whether the chosen dataset needs any data cleaning: go through all the fields and check for null values or incorrect data types.
Import the pandas library and load the dataset into Google Colaboratory:
https://colab.research.google.com/drive/1SLPRove3m7KUGXSEDrB1DWQCFbnysIyd?usp=sharing
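import pandas as pd
from google.colab import drive  # only needed because the CSV is stored on Google Drive
drive.mount('/content/drive')
df=pd.read_csv("/content/drive/MyDrive/Colab Notebooks/Expert+-+Superstore+-+Master.xlsx - Master.csv")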
df.head()
![image](https://private-user-images.githubusercontent.com/146320825/274589838-0b0b8926-dea4-4a78-9493-d2ac5c535319.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE4MzUwMDcsIm5iZiI6MTcyMTgzNDcwNywicGF0aCI6Ii8xNDYzMjA4MjUvMjc0NTg5ODM4LTBiMGI4OTI2LWRlYTQtNGE3OC05NDkzLWQyYWM1YzUzNTMxOS5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzI0JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyNFQxNTI1MDdaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT0yMDg4ZGNjYTJmN2VhMTNmZDZlNjZhYmQwODA0YWEyNTFiNTM1YTgyZWIwNjVmYzE0OGZhYjdjNTQxMjk3OGRiJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.RfUt9dWKcO7QIxlg5v-Wem4SY8QuU-fn3RtGoc0DV-U)
First, check for duplicate rows:
df.duplicated().sum()
Chaining these two functions gives the total number of duplicate rows: 2.
View the duplicated rows:
df[df.duplicated(keep=False)]
![image](https://private-user-images.githubusercontent.com/146320825/274591565-749fa115-6512-4e6d-b702-7f7786607034.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE4MzUwMDcsIm5iZiI6MTcyMTgzNDcwNywicGF0aCI6Ii8xNDYzMjA4MjUvMjc0NTkxNTY1LTc0OWZhMTE1LTY1MTItNGU2ZC1iNzAyLTdmNzc4NjYwNzAzNC5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzI0JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyNFQxNTI1MDdaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1iMGI2YWU3NDdhNWVlZTg5NjEwYTk3MmU5MGJiNmMxZDBhYWExM2Y1OTYxZjBiYmZkNzA1NDcxNjMyNTRmNjAwJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.hMWb_3cob_S9Nf_-BP7jrNQAnVUObDaZk5mQlMKkHVc)
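The two calls above count different things, which is worth seeing on a tiny example. The sketch below uses a hypothetical three-copy group: by default duplicated() flags only the repeats and leaves the first occurrence unflagged, while keep=False flags every member of the group. That is why df[df.duplicated(keep=False)] can list more rows than the count of 2.

```python
import pandas as pd

# Hypothetical toy data: rows 1 and 2 are exact copies of row 0.
orders = pd.DataFrame({
    "Order ID": ["A1", "A1", "A1", "B2"],
    "Profit": [10.0, 10.0, 10.0, 5.0],
})

# Default keep="first": only the repeats are flagged.
n_repeats = orders.duplicated().sum()

# keep=False: every row in a duplicate group is flagged,
# which is handy for viewing the copies side by side.
all_copies = orders[orders.duplicated(keep=False)]

print(n_repeats)        # 2
print(len(all_copies))  # 3
```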
df.isna().sum()
![image](https://private-user-images.githubusercontent.com/146320825/274591803-733a62c2-e07d-4f55-abc1-7f620f207da6.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE4MzUwMDcsIm5iZiI6MTcyMTgzNDcwNywicGF0aCI6Ii8xNDYzMjA4MjUvMjc0NTkxODAzLTczM2E2MmMyLWUwN2QtNGY1NS1hYmMxLTdmNjIwZjIwN2RhNi5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzI0JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyNFQxNTI1MDdaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT03MDc0ZDA4ZTFmZDNlZmEzYjAwM2E2MjVlNmY0ZDNjZDk5MjliNDcxMmFkMWMyMzUxNWQ1NGQ2YTEwZjAwZGMwJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.s5-IKoHzpPMaOpvlU40E4mhyvJqYcDh16vjB9bvVy_Y)
No missing values were found.
Check the data types:
df.dtypes
![image](https://private-user-images.githubusercontent.com/146320825/274592639-1a515048-a9f6-4480-a660-79fce72c387c.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE4MzUwMDcsIm5iZiI6MTcyMTgzNDcwNywicGF0aCI6Ii8xNDYzMjA4MjUvMjc0NTkyNjM5LTFhNTE1MDQ4LWE5ZjYtNDQ4MC1hNjYwLTc5ZmNlNzJjMzg3Yy5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzI0JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyNFQxNTI1MDdaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1jNjI3ZDc5NzQ2N2FiNWZlNmFjMWMzYWZlZGFkMTMwN2Y3NDE0MjZkOGRmMTE1NjU4ZWNiODgxMzA3YzMxYTM3JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.0ZcisrV3K5TI6F_3-6V1XN-7MbAzA8ZP0TphPqzVDQs)
Because the dataset has so many columns, pandas truncates the output and does not display all of them. Setting the display options below makes pandas show every column and row:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
print(df.head())
![image](https://private-user-images.githubusercontent.com/146320825/274592892-0822b7ce-4b5b-4f86-9620-c1f5b7a5191b.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE4MzUwMDcsIm5iZiI6MTcyMTgzNDcwNywicGF0aCI6Ii8xNDYzMjA4MjUvMjc0NTkyODkyLTA4MjJiN2NlLTRiNWItNGY4Ni05NjIwLWMxZjViN2E1MTkxYi5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzI0JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyNFQxNTI1MDdaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT0xNDA5YTZhYmRmMzRlMGZmYzgyZTQ5N2Y4ZGUxZWM5NWM0YjhkNWZjNzc0ZWUyYjE3YzY2MTMyYTNmMWU1YTJhJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.RbNM14VuniB3KbuMMTbWUoSDuhI07Tu05_ERdw4sbPA)
![image](https://private-user-images.githubusercontent.com/146320825/274592927-a6731a14-30b8-428e-958e-0874dda800cb.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE4MzUwMDcsIm5iZiI6MTcyMTgzNDcwNywicGF0aCI6Ii8xNDYzMjA4MjUvMjc0NTkyOTI3LWE2NzMxYTE0LTMwYjgtNDI4ZS05NThlLTA4NzRkZGE4MDBjYi5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzI0JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyNFQxNTI1MDdaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1kYjIxYzA2MTM5YTExOWM1MTIxMTQxMjFmYjBiMjE2NzVhMjg5ZTU5ZTIzOGJiM2ZjOGFjYTM3Y2JiZTkzZWJkJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.yrAWwr4WGVQCBVO4l1U_JMkgGYsqw-KKYie33qobaYU)
![image](https://private-user-images.githubusercontent.com/146320825/274592939-f8ee320d-9c52-4965-815e-1615725481bb.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE4MzUwMDcsIm5iZiI6MTcyMTgzNDcwNywicGF0aCI6Ii8xNDYzMjA4MjUvMjc0NTkyOTM5LWY4ZWUzMjBkLTljNTItNDk2NS04MTVlLTE2MTU3MjU0ODFiYi5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzI0JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyNFQxNTI1MDdaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1lYmM0YmRjMjg4YTMzMTkzOGQzNWIyMTM4NDk4NTdmMTBhNzRjNzAwNjgxNmFmM2UzZTdjODA3ZWZhMmM3OGRhJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.hobJH_mB73Z7DY1EIsMXaps7Wdt9WTUC8jG_UMIoE9M)
Data type errors:
- Profit per Customer: should be float
- Profit per Order: should be float
- Sales Forecast: should be float
- Sales per Customer: should be float
- Order Date: should be datetime
- Profit: should be float
- Sales: should be int
- Ship Date: should be datetime
On investigation, these columns cannot be converted to float because their values contain commas (thousands separators):
![image](https://private-user-images.githubusercontent.com/146320825/274593644-9eca00ff-a36b-4aea-afed-08288fbf9565.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE4MzUwMDcsIm5iZiI6MTcyMTgzNDcwNywicGF0aCI6Ii8xNDYzMjA4MjUvMjc0NTkzNjQ0LTllY2EwMGZmLWEzNmItNGFlYS1hZmVkLTA4Mjg4ZmJmOTU2NS5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzI0JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyNFQxNTI1MDdaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1hNWMyYzYwNmI2YzNmZjhhNGFhM2YxMzYwNTViMjUzYTFmYzQ1NTc3YjAxNTUyMzFhYWZlNTIzZTQzOTRjNTFjJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.Cegh_-HR-or2p6agmwt3pnuUctD5cLPPBSISBn2aBxw)
The commas need to be removed first:
df['Profit per Customer']=df['Profit per Customer'].str.replace(',','')
Then convert the column to float:
df['Profit per Customer'] = df['Profit per Customer'].astype(float)
Repeat these two steps for every column that should be float.
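Instead of repeating the two steps by hand, a loop can apply them to every affected column. This is a minimal sketch on hypothetical sample values; the column list is an assumption and should be extended with the remaining float columns from the list above.

```python
import pandas as pd

# Hypothetical sample: numbers stored as text with comma thousands separators.
df = pd.DataFrame({
    "Profit per Customer": ["1,234.50", "-56.25", "7"],
    "Sales Forecast": ["2,000.0", "15.5", "1,000,000"],
    "Sales": ["1,200", "35", "4,500"],
})

float_cols = ["Profit per Customer", "Sales Forecast"]  # extend with the other float columns
for col in float_cols:
    # regex=False treats "," as a literal string, not a regular expression
    df[col] = df[col].str.replace(",", "", regex=False).astype(float)

# Sales holds whole numbers, so it becomes an integer column
df["Sales"] = df["Sales"].str.replace(",", "", regex=False).astype(int)

print(df.dtypes)
```

Alternatively, pd.read_csv accepts a thousands="," argument, which may let pandas parse these columns as numbers while loading the file.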
Convert Order Date to a datetime type:
df['Order Date'] = pd.to_datetime(df['Order Date'])
Repeat for Ship Date
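Both date columns can be converted in one loop. In this sketch the ISO date strings and the format="%Y-%m-%d" argument are assumptions about the file, not taken from it; passing an explicit format that matches the real data avoids silent day/month mix-ups. The "Days to Ship" column is a hypothetical example of the date arithmetic that becomes possible once the columns are real datetimes.

```python
import pandas as pd

# Hypothetical date strings; the real file's date format may differ.
df = pd.DataFrame({
    "Order Date": ["2019-01-03", "2019-02-14"],
    "Ship Date": ["2019-01-07", "2019-02-15"],
})

for col in ["Order Date", "Ship Date"]:
    # An explicit format makes parsing unambiguous
    df[col] = pd.to_datetime(df[col], format="%Y-%m-%d")

# Hypothetical derived column: datetimes support subtraction directly
df["Days to Ship"] = (df["Ship Date"] - df["Order Date"]).dt.days
print(df)
```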
Convert the Sales column from string to int:
df['Sales']=df['Sales'].str.replace(',','')
df['Sales'] = df['Sales'].astype(int)
![image](https://private-user-images.githubusercontent.com/146320825/274594129-f518d50a-526e-444b-9910-f2811a3d24c1.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE4MzUwMDcsIm5iZiI6MTcyMTgzNDcwNywicGF0aCI6Ii8xNDYzMjA4MjUvMjc0NTk0MTI5LWY1MThkNTBhLTUyNmUtNDQ0Yi05OTEwLWYyODExYTNkMjRjMS5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzI0JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyNFQxNTI1MDdaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT02M2E2ZmU4MjNkMTJhYmIwMzE4YzQxOWM0ZWNlN2M0YzhlNjU0N2MyNGIwMzE3YzBjMGNmZjc2OWM2MzgwNDgxJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.D67tLwoEQ1mKu1HQMAKC-1GoRcZSAIhnP11LDR7doFo)
Now all the data types are correct.
The Number of Records column carries no information because its value is always 1, as confirmed with the nunique function:
df["Number of Records"].nunique()
Output: 1
Since it provides nothing to compare, it is dropped:
df.drop('Number of Records', axis=1,inplace = True)
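The same nunique check can be run over every column at once to catch any other constant columns. A minimal sketch on a hypothetical frame that mimics the dataset:

```python
import pandas as pd

# Hypothetical frame: "Number of Records" is always 1, as in the dataset
df = pd.DataFrame({
    "Number of Records": [1, 1, 1],
    "Region": ["North", "South", "North"],
    "Profit": [10.0, -3.0, 4.5],
})

# nunique() counts distinct values per column; exactly 1 means the column is constant
constant_cols = df.columns[df.nunique() == 1].tolist()
df = df.drop(columns=constant_cols)

print(constant_cols)        # ['Number of Records']
print(df.columns.tolist())  # ['Region', 'Profit']
```

Note that nunique() ignores missing values by default, so an entirely empty column would report 0 rather than 1.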
Since every order always has exactly one customer, Sales per Customer, Profit per Customer and Profit per Order repeat the values in the Sales and Profit columns, so they can be dropped:
df.drop(['Sales per Customer','Profit per Customer','Profit per Order'], axis=1,inplace = True)
Sales Forecast and Unit Estimated can also be dropped, as they add nothing significant to the analysis:
df.drop(['Sales Forecast','Unit Estimated'], axis=1,inplace = True)
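Before dropping columns on the grounds that they repeat others, the claim can be checked row by row. This sketch uses hypothetical values; np.allclose is used instead of exact equality as an assumption that small rounding differences may exist after the string-to-float conversion.

```python
import numpy as np
import pandas as pd

# Hypothetical rows in which each order has exactly one customer
df = pd.DataFrame({
    "Sales": [100.0, 250.0],
    "Sales per Customer": [100.0, 250.0],
    "Profit": [12.0, -5.0],
    "Profit per Customer": [12.0, -5.0],
    "Profit per Order": [12.0, -5.0],
})

# Each derived column mapped to the base column it should duplicate
pairs = {
    "Sales per Customer": "Sales",
    "Profit per Customer": "Profit",
    "Profit per Order": "Profit",
}

# Keep only the columns that really match their base column on every row
redundant = [col for col, base in pairs.items() if np.allclose(df[col], df[base])]
df = df.drop(columns=redundant)
print(redundant)
```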
Check for inconsistent data entry using the unique function:
df["City"].unique()
list(df['City'].unique())
After checking the columns one by one:
- there is no inconsistent data entry
- no columns have extra whitespace errors
- there are no spelling errors
- there are no numerical errors
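Scanning unique() output by eye works, but a quick automated check can flag text columns whose values differ only by case or surrounding whitespace. The helper name inconsistent_columns and the sample data are hypothetical; the idea is that if stripping and lower-casing reduces the number of distinct values, some entries are inconsistent.

```python
import pandas as pd

def inconsistent_columns(frame):
    """Hypothetical helper: return {column: (raw_unique, normalised_unique)}
    for text columns whose distinct count drops after stripping whitespace
    and lower-casing."""
    report = {}
    for col in frame.columns:
        if not pd.api.types.is_string_dtype(frame[col]):
            continue  # skip numeric and datetime columns
        raw = frame[col].nunique()
        norm = frame[col].str.strip().str.lower().nunique()
        if norm < raw:
            report[col] = (raw, norm)
    return report

# Hypothetical data: "City" has case and whitespace variants, "Segment" is clean
df = pd.DataFrame({
    "City": ["London", "london ", " London", "Paris"],
    "Segment": ["Consumer", "Consumer", "Corporate", "Corporate"],
    "Quantity": [1, 2, 3, 4],
})

print(inconsistent_columns(df))
```

An empty result for the real dataset would be consistent with the finding that there are no inconsistent entries or whitespace errors.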
There are 10000 rows and 22 columns, of both categorical and numerical types. Together they describe the store's orders: actual vs scheduled days to ship, ship status and ship mode; the segment, category and sub-category of the product; product name, customer name and manufacturer for identification; city, country, region and state for location; order ID and order date to track each order; and profit, profit ratio, sales and quantity for analysis.
Key performance indicators: the sales, profit and profit ratio of the products can be used to analyse the store's performance. The analysis can be broken down by product category, location, shipping mode and so on, as well as by manufacturer.
A new indicator that can be derived from this dataset is Cost, calculated from Sales and Profit Ratio as Cost = Sales × (1 − Profit Ratio):
df['Cost']=df['Sales']*(1-(df['Profit Ratio'].str.rstrip('%').astype('float') / 100.0))
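The formula works because Profit Ratio = Profit / Sales, so Sales × (1 − Profit Ratio) = Sales − Profit. The sketch below applies the same expression to hypothetical rows (Profit Ratio stored as a percentage string, as in the dataset) and checks that Cost equals Sales minus Profit.

```python
import pandas as pd

# Hypothetical rows; Profit Ratio is a percentage string as in the dataset
df = pd.DataFrame({
    "Sales": [200, 50],
    "Profit": [50.0, -10.0],
    "Profit Ratio": ["25%", "-20%"],
})

# "25%" -> 0.25, "-20%" -> -0.20
ratio = df["Profit Ratio"].str.rstrip("%").astype(float) / 100.0
df["Cost"] = df["Sales"] * (1 - ratio)

# Sanity check: Cost should equal Sales - Profit
print(df["Cost"].tolist())  # [150.0, 60.0]
```

A loss-making row (negative ratio) correctly yields a cost higher than its sales.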
df.describe()
![image](https://private-user-images.githubusercontent.com/146320825/274596818-3d83f3da-6323-4cfb-b934-4f27dcbec53c.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE4MzUwMDcsIm5iZiI6MTcyMTgzNDcwNywicGF0aCI6Ii8xNDYzMjA4MjUvMjc0NTk2ODE4LTNkODNmM2RhLTYzMjMtNGNmYi1iOTM0LTRmMjdkY2JlYzUzYy5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzI0JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyNFQxNTI1MDdaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT02ODZjMTgzNzRhNTE5NzUzYWVkOTc4ZGU4NDFhNzEyMDVmMzA3MzZkMTRlNDM2ZTExZWRhYzRkOTA4N2ExZGNlJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.3phVFlhpqB-TYr8ibhBA9ErQYG93itJbf4rF8XRo49I)
For a better description of the data, we check the shape and size of the dataset along with general descriptive statistics such as counts and unique values.
df.shape
10000 rows and 24 columns after adding 2 new indicators
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
print(df.head(16))
![image](https://private-user-images.githubusercontent.com/146320825/274597109-6f5b2ed6-8a44-4674-b2dd-6b80e999abed.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE4MzUwMDcsIm5iZiI6MTcyMTgzNDcwNywicGF0aCI6Ii8xNDYzMjA4MjUvMjc0NTk3MTA5LTZmNWIyZWQ2LThhNDQtNDY3NC1iMmRkLTZiODBlOTk5YWJlZC5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzI0JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyNFQxNTI1MDdaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1kMDllMzQ0YzRiZGRlYzRkYTU5ZmU0ZGRmNWU0N2QzMDg2M2IxMzg3MDMyMWFmMDg4ODJhMDBhZmU4OWNlOTNkJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.skYqel5-hEePfUIi-sCs-fm2IPNEtnRDFWdj__a9FBw)
![image](https://private-user-images.githubusercontent.com/146320825/274597139-8bcab946-bc20-4d22-b92f-9e0f0fb36466.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE4MzUwMDcsIm5iZiI6MTcyMTgzNDcwNywicGF0aCI6Ii8xNDYzMjA4MjUvMjc0NTk3MTM5LThiY2FiOTQ2LWJjMjAtNGQyMi1iOTJmLTllMGYwZmIzNjQ2Ni5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzI0JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyNFQxNTI1MDdaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT02NGQ5NzU5YzJkMjcyMWU3YjFjMTczNjg0ZGQ1M2IyNTIwN2E3MGViNGMyMWRlMWM5N2FjNGRhMWZjODk2NWEwJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.R_o2lNSgCQpTHS4A2_-N9brVjApiy0oZONiGmHJtJM8)
![image](https://private-user-images.githubusercontent.com/146320825/274597158-0d0cc665-ebf0-47e3-9212-a25e6c13e42c.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE4MzUwMDcsIm5iZiI6MTcyMTgzNDcwNywicGF0aCI6Ii8xNDYzMjA4MjUvMjc0NTk3MTU4LTBkMGNjNjY1LWViZjAtNDdlMy05MjEyLWEyNWU2YzEzZTQyYy5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzI0JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyNFQxNTI1MDdaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT05NmVkYjEzYzA0MDE3YmQ0NDNmMTU4ZTRmZTRjZGNlMTY3MmQ3ODMxNTBmNzFhZGE1ZjQxYTQzY2Q3NTA1YjlkJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.mTcpZV8zstLheYyGmaJz8Vi_NltyoqNiYeE3sPHM8DM)
Investigating the shape and size of the dataset makes us:
- aware of the size of the dataset
- aware of which columns are numerical and which are categorical
- aware of the main information stored in the dataset, for example: a. Sub-Category b. Manufacturer c. Ship Mode d. Location e. Segment f. Ship Status
It helps us understand trends and values in the data from a compact display, describe the data, and generate insight from its characteristics. For example, a store owner might look at sales to decide which products perform better and focus more on those products, or look for products that produce negative profit.
df.dtypes
![image](https://private-user-images.githubusercontent.com/146320825/274597548-6c00d2a6-c134-46db-a881-54de116b163f.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE4MzUwMDcsIm5iZiI6MTcyMTgzNDcwNywicGF0aCI6Ii8xNDYzMjA4MjUvMjc0NTk3NTQ4LTZjMDBkMmE2LWMxMzQtNDZkYi1hODgxLTU0ZGUxMTZiMTYzZi5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzI0JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyNFQxNTI1MDdaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1kMzMyMDMxNDllN2NiMDVjNzQxMWY5MGZjYjdlNjExZDVjMzY3YjQyZGU4N2U4ZTQwOWNkZjM1N2U2ZDc5Yjc2JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.zAx4P55EcbVqxHDgXFKmARB4sb9_y6bHreoWt3bePRU)
df['Ship Status'].unique()
array(['Shipped Early', 'Shipped Late', 'Shipped On Time'], dtype=object)
df['Category'].unique()
array(['Office Supplies', 'Technology', 'Furniture'], dtype=object)
df['Country'].unique()
array(['United Kingdom', 'France', 'Germany', 'Italy', 'Spain', 'Netherlands', 'Sweden', 'Belgium', 'Austria', 'Ireland', 'Portugal', 'Finland', 'Denmark', 'Norway', 'Switzerland'], dtype=object)
df['Discount'].unique()
array(['0%', '10%', '15%', '40%', '50%', '60%', '35%', '20%', '30%', '45%', '70%', '65%', '80%', '85%'], dtype=object)
df['Region'].unique()
array(['North', 'Central', 'South'], dtype=object)
df['Segment'].unique()
array(['Corporate', 'Consumer', 'Home Office'], dtype=object)
df['Ship Mode'].unique()
array(['Standard Class', 'Second Class', 'Same Day', 'First Class'], dtype=object)
df['Sub-Category'].unique()
array(['Storage', 'Accessories', 'Labels', 'Phones', 'Copiers', 'Appliances', 'Fasteners', 'Art', 'Envelopes', 'Binders', 'Bookcases', 'Machines', 'Paper', 'Supplies', 'Tables', 'Chairs', 'Furnishings'], dtype=object)
Some categorical columns have too many unique values to be displayed.
df.describe()
![image](https://private-user-images.githubusercontent.com/146320825/274598332-50c9b5af-176e-4dfe-b62d-283e6bc62b23.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE4MzUwMDcsIm5iZiI6MTcyMTgzNDcwNywicGF0aCI6Ii8xNDYzMjA4MjUvMjc0NTk4MzMyLTUwYzliNWFmLTE3NmUtNGRmZS1iNjJkLTI4M2U2YmM2MmIyMy5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzI0JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyNFQxNTI1MDdaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT03ZDRlMDNmNzM0MjNiYTUzMDgxZWRiM2Q5ZGEyZTQwNzcxZjZlNDg3ZjNkMjBiMDNiN2NhNmNiMDhjYmE3Mzk5JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.Ofdv1k9sd-zrw-51LUFoQudSamO4XjNsVVowCkudq-M)
![image](https://private-user-images.githubusercontent.com/146320825/274598410-acdb329c-ab9e-4c9a-ae4f-a677f2e2768c.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE4MzUwMDcsIm5iZiI6MTcyMTgzNDcwNywicGF0aCI6Ii8xNDYzMjA4MjUvMjc0NTk4NDEwLWFjZGIzMjljLWFiOWUtNGM5YS1hZTRmLWE2NzdmMmUyNzY4Yy5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzI0JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyNFQxNTI1MDdaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT0zMWYzZjQyYWE3YmU0MzJlYzdmMzI4YjExZGU5MzQwNzVjMzVmOTcyMWI0NGEyMDI0ZjUwMjA4YTBkZjU5M2I5JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.qm3tT5f2x-8HTic2JjdUkJlFjZFp4ALwmtOcoJ4dfS8)
df.describe() summarises the large dataset into a few informative numbers that give the gist of the data. A business owner can use them to understand the general situation, make decisions and monitor changes. Summary statistics describe the trends in a dataset concisely through measures of location and spread.
Three common measures of central tendency (location) are:
Mean (the average of the data set), median (the middle value of the data set) and mode (the most frequently occurring value).
![image](https://private-user-images.githubusercontent.com/146320825/274598823-2cf0ff9c-479f-4cab-a089-77b470c09c67.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE4MzUwMDcsIm5iZiI6MTcyMTgzNDcwNywicGF0aCI6Ii8xNDYzMjA4MjUvMjc0NTk4ODIzLTJjZjBmZjljLTQ3OWYtNGNhYi1hMDg5LTc3YjQ3MGMwOWM2Ny5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzI0JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyNFQxNTI1MDdaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT0wNDlkZGU2OWFlMjUzZDEyMGE4YWM0ZDdmOTQ2OWMzNWEwMzg4M2Q2MmI2YWQ2ZjE2NDI2MDczYWNmZmY5NzA4JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.R1aliwwwDDDMAi_l56jb05F2nONRa9J1I822ilhuovA)
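For a single column, the three measures map directly onto pandas Series methods. The profit values below are hypothetical; note that mode() returns a Series because several values can tie for most frequent.

```python
import pandas as pd

# Hypothetical profit values for illustration
profit = pd.Series([0, 0, 5, 12, 14, 40, 190])

mean = profit.mean()      # average: 261 / 7
median = profit.median()  # middle value once sorted
modes = profit.mode()     # most frequent value(s), returned as a Series

print(mean, median, modes.tolist())
```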
df.mode()
![image](https://private-user-images.githubusercontent.com/146320825/274598982-6d14eda6-6135-4988-965a-1d29c0df8d7b.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE4MzUwMDcsIm5iZiI6MTcyMTgzNDcwNywicGF0aCI6Ii8xNDYzMjA4MjUvMjc0NTk4OTgyLTZkMTRlZGE2LTYxMzUtNDk4OC05NjVhLTFkMjljMGRmOGQ3Yi5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzI0JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyNFQxNTI1MDdaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT00ZmZmYTJlYTNmOTMxNTQyYjIzMGEyYjQ2NTcwZjNhZWZlMDc1NGU5YmJjMjkzNmMxNDYxNTU0ZDhlZWVlMWIxJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.jBEys8yLG2jrGlBVsqfSRJf93NdRVySW-e5yvD3J_ls)
![image](https://private-user-images.githubusercontent.com/146320825/274599019-21640563-c5a8-44ff-a5a4-40882fa5696a.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE4MzUwMDcsIm5iZiI6MTcyMTgzNDcwNywicGF0aCI6Ii8xNDYzMjA4MjUvMjc0NTk5MDE5LTIxNjQwNTYzLWM1YTgtNDRmZi1hNWE0LTQwODgyZmE1Njk2YS5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzI0JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyNFQxNTI1MDdaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT02NDZkYThiOTM2YzhmMzYxMTA1NTBjM2I0NDQ3ZDMxYjEyZTE2NzBjNTNmNDMzMDI5ODExYTZmNWZkODA2Mjc3JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.CiRd3sWwYSa7t16XVqTabmKWVBQn6RjurL90Z9qix_A)
To count how often each mode occurs:
df[df["Product Name"] == 'Eldon File Cart, Single Width'].count()
count: 30
df[df["Profit"] == 0].count()
Count: 293
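The .count() calls above return a count for every column. Two slightly more direct alternatives are sketched below on hypothetical data: value_counts() sorts values by frequency, so its first entry is the mode together with its count, and summing a boolean mask returns a single number.

```python
import pandas as pd

# Hypothetical product names and profits
df = pd.DataFrame({
    "Product Name": ["Cart", "Stove", "Cart", "Paper", "Cart"],
    "Profit": [0.0, 25.0, 0.0, 3.5, 8.0],
})

# Most frequent product and how many times it appears
top_product = df["Product Name"].value_counts().head(1)

# True counts as 1, so the sum is the number of zero-profit orders
zero_profit_orders = (df["Profit"] == 0).sum()

print(top_product)
print(zero_profit_orders)  # 2
```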
df.var(numeric_only=True)
![image](https://private-user-images.githubusercontent.com/146320825/274599543-2f4741fa-79bd-4aab-81c1-02d08330da6b.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE4MzUwMDcsIm5iZiI6MTcyMTgzNDcwNywicGF0aCI6Ii8xNDYzMjA4MjUvMjc0NTk5NTQzLTJmNDc0MWZhLTc5YmQtNGFhYi04MWMxLTAyZDA4MzMwZGE2Yi5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzI0JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyNFQxNTI1MDdaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1lOTM1ZTNmNWJhNjRhYmIwOGJmZTZmOTVlZGE0YzRjMmU1Y2E3OGFjZmQ0YWFiOTI2OWExNWY1ZWUxNTY0YmZjJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.htT0nRH1-sSTXaYPxPQKDqP_FMiQzEPTXp2iyEj3u9A)
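Create tables of grouped information (numeric_only=True restricts the averages to numeric columns, which recent pandas versions require):
df.groupby('Ship Status').mean(numeric_only=True)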
![image](https://private-user-images.githubusercontent.com/146320825/274600206-43135f65-55b6-4d44-9952-ea8d027c61bd.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE4MzUwMDcsIm5iZiI6MTcyMTgzNDcwNywicGF0aCI6Ii8xNDYzMjA4MjUvMjc0NjAwMjA2LTQzMTM1ZjY1LTU1YjYtNGQ0NC05OTUyLWVhOGQwMjdjNjFiZC5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzI0JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyNFQxNTI1MDdaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT03ZTFmZWE4YjkwODI0ZDQ5ODc1YWIxNDNmY2MzNzhjMDdkNmE1N2M1OGQxN2RiMjZiNDFmNWM5NzA5YmZhOWU5JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.B1y6AQAA5qEpN6bQzI-boma-LDTyacoK8ycKwW0Ouco)
Orders shipped early have a higher average profit ratio.
df.groupby('Category').mean(numeric_only=True)
![image](https://private-user-images.githubusercontent.com/146320825/274600450-79ef6c68-477d-467a-a519-91845c5e25f5.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE4MzUwMDcsIm5iZiI6MTcyMTgzNDcwNywicGF0aCI6Ii8xNDYzMjA4MjUvMjc0NjAwNDUwLTc5ZWY2YzY4LTQ3N2QtNDY3YS1hNTE5LTkxODQ1YzVlMjVmNS5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzI0JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyNFQxNTI1MDdaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT02MjdmNDllMWZhZDI2ZWYxNzNlYWQ1ZGM5ZjcxM2I5NmU2YzMyOWM0NTM2ZmE4MWVmZDdiODY2MmE3YWFjMTE1JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.6wGSYho1hb5yzyjf849lqdTXTjzVkBxPejHk2e4vlzM)
Technology has the highest average profit, but Office Supplies has the highest average profit ratio.
df.groupby('Country').mean(numeric_only=True)
![image](https://private-user-images.githubusercontent.com/146320825/274600661-3fae76cb-d803-4efa-9f65-ff02f17aca9d.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE4MzUwMDcsIm5iZiI6MTcyMTgzNDcwNywicGF0aCI6Ii8xNDYzMjA4MjUvMjc0NjAwNjYxLTNmYWU3NmNiLWQ4MDMtNGVmYS05ZjY1LWZmMDJmMTdhY2E5ZC5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzI0JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyNFQxNTI1MDdaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT02ZGU5ZmQ4MTIxNTgyZDYyMmQwNTNhYjVhOTU3NGJmY2Q4NjI1NjRiNTE1ZGRiNDZhYjQxMTAyYjBjYzVjOWExJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.8AoNefXjJFZxcQPg8mKyvhzdN18lwzhXVWquvzyh4vc)
Among the countries, Switzerland has the highest average profit and profit ratio.
df.groupby('Discount').mean(numeric_only=True)
![image](https://private-user-images.githubusercontent.com/146320825/274600824-3af7cbdb-9e1d-4624-a1d1-b65cdcb800ea.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE4MzUwMDcsIm5iZiI6MTcyMTgzNDcwNywicGF0aCI6Ii8xNDYzMjA4MjUvMjc0NjAwODI0LTNhZjdjYmRiLTllMWQtNDYyNC1hMWQxLWI2NWNkY2I4MDBlYS5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzI0JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyNFQxNTI1MDdaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT0xMzEzYmYxMWRhNDc1MWI5MGI5NDk5YTkxMjFmYmMzZWVjZmQxYjRhNzE3MmJjZjhjOTk0MWM3MTI4MTEyZGI0JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.b7qgf2Bki-7K51I7Qjd666Ha-KXnXGSFn7Y4CtT5lIg)
The higher the discount rate, the lower the earnings; at some discount rates the store makes a loss.
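Because Discount is stored as text such as "10%", converting it to a number makes it possible to sort the groups numerically and to measure the relationship with profit directly. The sketch uses hypothetical orders, and the "Discount Rate" column name is hypothetical. A negative correlation is consistent with the pattern described above, although on its own it does not prove that discounts cause the losses.

```python
import pandas as pd

# Hypothetical orders; Discount is text such as "10%", as in the dataset
df = pd.DataFrame({
    "Discount": ["0%", "10%", "0%", "50%", "10%", "50%"],
    "Profit": [30.0, 12.0, 26.0, -40.0, 8.0, -20.0],
})

# "10%" -> 0.10 so the column can be sorted and correlated numerically
df["Discount Rate"] = df["Discount"].str.rstrip("%").astype(float) / 100

# Average profit at each discount level
by_discount = df.groupby("Discount Rate")["Profit"].mean()

# Pearson correlation between discount rate and profit
corr = df["Discount Rate"].corr(df["Profit"])

print(by_discount)
print(round(corr, 3))
```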
df.groupby('Segment').mean(numeric_only=True)
![image](https://private-user-images.githubusercontent.com/146320825/274601040-29d3f4a8-2954-4872-9a6d-af425ec3c8f0.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE4MzUwMDcsIm5iZiI6MTcyMTgzNDcwNywicGF0aCI6Ii8xNDYzMjA4MjUvMjc0NjAxMDQwLTI5ZDNmNGE4LTI5NTQtNDg3Mi05YTZkLWFmNDI1ZWMzYzhmMC5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzI0JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyNFQxNTI1MDdaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT0zZjdlMTRhODQ4ZjAxOGUxNWQ3ZDFkZGZkMWQxNzdmZDA4YWNjNTJlOTZiOTQ3MzFlMjc4YzYxYjFhOTU4ZDAwJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.WRNdHGOmjRR4bb5cOc4dl3cMV9YcymIJj_wR472kebw)
Corporate has slightly higher average profit and sales, but Home Office has a higher profit ratio.
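The tables above compare averages per group. A segment with the highest average profit per order is not necessarily the one with the highest total profit, which may matter when comparing these results with the dashboard. This sketch on hypothetical orders selects the numeric columns before aggregating and reports mean and sum side by side.

```python
import pandas as pd

# Hypothetical orders: Corporate has fewer but larger orders
df = pd.DataFrame({
    "Segment": ["Consumer", "Consumer", "Consumer", "Corporate", "Home Office"],
    "Sales": [100, 120, 80, 200, 90],
    "Profit": [20.0, 25.0, 15.0, 45.0, 30.0],
})

# Selecting columns first avoids averaging text columns;
# agg() puts the per-order average and the total next to each other
summary = df.groupby("Segment")[["Sales", "Profit"]].agg(["mean", "sum"])
print(summary)
```

Here Corporate leads on average profit while Consumer leads on total profit, simply because it has more orders.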
Quartiles help us understand the spread and distribution of the data and find outliers.
Here the quartiles are calculated by dividing the dataset into four equal-sized groups:
df.quantile([0.25,0.5, 0.75, 1], axis = 0, numeric_only=True)
![image](https://private-user-images.githubusercontent.com/146320825/274601490-c8429353-e6cf-4182-952a-20b880ca24df.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE4MzUwMDcsIm5iZiI6MTcyMTgzNDcwNywicGF0aCI6Ii8xNDYzMjA4MjUvMjc0NjAxNDkwLWM4NDI5MzUzLWU2Y2YtNDE4Mi05NTJhLTIwYjg4MGNhMjRkZi5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzI0JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyNFQxNTI1MDdaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT00ODEwYjExODRkYTM0MTdiYjBkNGQwZmZkMWU5YTU5ZTNjMDZjZTFiZGZlZDE3MDZhZGY5NzM3NmU3ZjZkMzEwJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.qtFSIUjl26txoEcfmIp3scwtzNA94j09baB2g0DZHMs)
Compute the quartiles and the interquartile range (IQR) of Profit:
q1, q3 = df["Profit"].quantile([0.25,0.75])
iqr = q3- q1
iqr
47.25
lower_min = q1 - (1.5*iqr)
upper_max = q3 + (1.5*iqr)
print("Lower expected min of IQR = ", lower_min)
print("Upper expected max of IQR = ", upper_max)
Lower expected min of IQR = -69.875
Upper expected max of IQR = 119.125
((df["Profit"] < lower_min) | (df["Profit"] > upper_max)).sum()
There are 1718 outliers above 119.125.
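The IQR steps above can be wrapped in a small reusable function. The name iqr_bounds and the sample values are hypothetical; the bounds are the usual 1.5 × IQR fences. Counting the two sides separately makes it easy to confirm whether all outliers lie above the upper bound.

```python
import pandas as pd

def iqr_bounds(s, k=1.5):
    """Hypothetical helper: return (lower, upper) fences at k * IQR beyond Q1 and Q3."""
    q1, q3 = s.quantile([0.25, 0.75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical profit values with one extreme on each side
profit = pd.Series([-100.0, 1.0, 2.0, 5.0, 8.0, 10.0, 12.0, 300.0])
lower, upper = iqr_bounds(profit)

# Boolean masks summed give a single count per side
below = (profit < lower).sum()
above = (profit > upper).sum()
print(lower, upper, below, above)
```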
Based on the mean, median, mode and quartiles of Profit, the average profit per order is 37.29 and the median is 14; since the mean is well above the median, the distribution is positively (right) skewed. The mode shows that 293 of the 10000 orders have zero profit, and the business owner needs to investigate this high occurrence of zero profit. 1718 orders (about 17%) are outliers with profit above 119.125, so a sizeable group of orders performs exceptionally well.
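The direction of skew can be checked numerically as well as inferred from the mean and median. A minimal sketch on hypothetical right-skewed profits: many small values and a long upper tail pull the mean above the median, and Series.skew() returns a positive value.

```python
import pandas as pd

# Hypothetical right-skewed profits: many small values, a long upper tail
profit = pd.Series([0.0, 0.0, 2.0, 5.0, 10.0, 14.0, 20.0, 45.0, 150.0, 300.0])

print(profit.mean(), profit.median())  # mean above median
print(profit.skew() > 0)               # True: positive (right) skew
```

On the real data, df["Profit"].skew() would give the corresponding figure for the whole column.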
From the grouped tables, a few insights can be drawn. Orders shipped early earn a higher average profit ratio for the store. Technology products have a higher average profit, so the business should focus more on technology. In addition, sales to Switzerland are more profitable than sales to other countries. Lastly, high discount rates lead to losses, so the business owner should reconsider these discount rates.
Interactive dashboard
A data story is created through Tableau using this dataset:
The first page gives an overview of the dataset (its size, shape and averages) so that viewers can understand the range of the data.
![image](https://private-user-images.githubusercontent.com/146320825/274603655-8c8c2bd4-42f9-42d9-8023-7433122438e8.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE4MzUwMDcsIm5iZiI6MTcyMTgzNDcwNywicGF0aCI6Ii8xNDYzMjA4MjUvMjc0NjAzNjU1LThjOGMyYmQ0LTQyZjktNDJkOS04MDIzLTc0MzMxMjI0MzhlOC5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzI0JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyNFQxNTI1MDdaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1iNjQ3Njc3MGVjZjEzNDA5NTdjYWIzMzg3MWY3ODI2NDU4NzMzNWFkMTI3MTlmNzVmZjllNjk1MTIxN2JmZGI5JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.jt7jRpX-pmHHUO1c9_J-SRQfvdenp6p2Y7-emJzq9-g)
The second page shows profit by product and the profit generated by each state.
![image](https://private-user-images.githubusercontent.com/146320825/274603687-451195f0-83e0-47a5-8674-8214bdca5f6a.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE4MzUwMDcsIm5iZiI6MTcyMTgzNDcwNywicGF0aCI6Ii8xNDYzMjA4MjUvMjc0NjAzNjg3LTQ1MTE5NWYwLTgzZTAtNDdhNS04Njc0LTgyMTRiZGNhNWY2YS5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzI0JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyNFQxNTI1MDdaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT04YjIzZmE5NjU3ZjUxZWM5NjdhN2FkOGFkNmQwMTU1ZDc4MzM1MGNkOTI5MTg3NTE0Y2FiZDBkZWMxNmIxNDFkJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.o3bwoUImZj6iezjzkNtl9S9-PO8WqXdc6vsDyujgC00)
The last page shows profit performance by different factors such as segment, product category and discount. A histogram is plotted to show the distribution of profit.
![image](https://private-user-images.githubusercontent.com/146320825/274603720-d85a37b9-f77d-4614-bc6a-6fc33869d3d3.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE4MzUwMDcsIm5iZiI6MTcyMTgzNDcwNywicGF0aCI6Ii8xNDYzMjA4MjUvMjc0NjAzNzIwLWQ4NWEzN2I5LWY3N2QtNDYxNC1iYzZhLTZmYzMzODY5ZDNkMy5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzI0JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyNFQxNTI1MDdaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT04YWYxZjM5OGU1ODRmNTQ5YzI5OGI2M2QzYjU3MTdlNTY0MDVhMDRkODY1ZDg4ZGM1MTE5NDM1NjE2MmI0NTJiJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.-9eM5pRr04Ob4tidgjieFOLAF-YIjnL7gcjret7HkCk)
From the data analysis, we found that profit grows as the years progress, which indicates that the store's business is growing. Furthermore, some countries generated negative profits; the business owner needs to take note and act to limit these losses. One possible cause of the losses is that the discount rates given in some cases are too high, so the business owner needs to reconsider them. Sorting the data shows that the Hoover stove generates the most profit and that England is the area with the highest profit; the store owner should put more focus on these high-profit areas and products. Sorting by segment, category and discount also gives an overview of which selection generates the most profit; for example, the Consumer segment produces the highest profit. The profit histogram shows that the most common profit value is 0; the store owner needs to investigate the reasons behind this and reduce these occurrences.
Using the interactive dashboard, we can sort products, segments and categories to compare the performance of the business by sales, profit, area and shipment type. By combining data analysis and data visualisation, the store owner can make better decisions and get the most out of the business.
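The year-on-year trend seen in Tableau can also be cross-checked in pandas, once Order Date has been converted to datetime. This is a sketch on hypothetical orders, grouping total profit by the calendar year of the order date.

```python
import pandas as pd

# Hypothetical orders spanning three years
df = pd.DataFrame({
    "Order Date": pd.to_datetime(["2018-03-01", "2018-07-15", "2019-05-20",
                                  "2020-01-10", "2020-11-30"]),
    "Profit": [10.0, 5.0, 22.0, 18.0, 12.0],
})

# Total profit per calendar year of the order date
yearly_profit = df.groupby(df["Order Date"].dt.year)["Profit"].sum()
print(yearly_profit)
```

A monotonically increasing result (yearly_profit.is_monotonic_increasing) would match the growth seen on the dashboard.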