nawaz-kmr / Supermarket-Sales-Analysis

Before starting any data science project, it is important to pre-process and explore the data. Here we cover the basics of exploratory data analysis (EDA) in Python and show how even simple EDA can be extremely helpful for preliminary data analysis. The approach is to ask some questions and try to answer them from the data. We use the supermarket sales dataset from Kaggle.

Conclusion:

We used uni-variate, bi-variate and correlation analysis to perform basic EDA on the supermarket sales data.
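As a rough illustration of these three kinds of analysis, here is a minimal pandas sketch. It is not the notebook's actual code: the rows are made up, and the column names (`Branch`, `Quantity`, `gross income`, `Rating`) are assumptions modeled on the Kaggle file, so adjust them to match the real CSV.

```python
import pandas as pd

# Made-up sample rows for illustration only; column names are assumed
# to resemble those in the Kaggle supermarket sales file.
df = pd.DataFrame({
    "Branch": ["A", "B", "C", "A", "C", "B"],
    "Quantity": [7, 5, 10, 8, 10, 2],
    "gross income": [26.1, 3.8, 16.2, 23.3, 30.2, 4.3],
    "Rating": [9.1, 9.6, 7.4, 8.4, 5.3, 4.1],
})

# Uni-variate: summary statistics of a single column.
rating_summary = df["Rating"].describe()

# Bi-variate: one numeric column broken down by one categorical column.
income_by_branch = df.groupby("Branch")["gross income"].mean()

# Correlation: pairwise Pearson correlation between the numeric columns.
corr = df[["Quantity", "gross income", "Rating"]].corr()

print(rating_summary)
print(income_by_branch)
print(corr.round(2))
```

On the real data, the same three calls would be the starting point for checking observations such as the roughly uniform ratings or the lack of correlation between rating and the other variables.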

To summarize, here are some of the findings from the data:

  1. Customer ratings are more or less uniformly distributed, with a mean rating of around 7, and there is no relationship between gross income and customer rating.
  2. The data covers 3 cities/branches. Although branch A has slightly higher sales than the rest, branch C (Naypyitaw) is the most profitable in terms of gross income.
  3. Fashion accessories and food and beverages are the best-selling product lines in Naypyitaw; these should be the focus there, along with electronic accessories.
  4. The most popular payment method is E-wallet, and cash payments are also on the higher side.
  5. No particular time trend can be observed in gross income.
  6. At an overall level, ‘Sports and travel’ generates the highest gross income.
  7. Gross income is similar for male and female customers, though female customers spend slightly more at the 75th percentile. Female customers spend the most on ‘Fashion accessories’, while for male customers it is, surprisingly, ‘Health and beauty’. Female customers also spend more on ‘Sports and travel’, which generates the highest income overall.
  8. One interesting observation from the correlation analysis is that customer rating is not correlated with any other variable.
  9. The most common purchase quantity is 10 units, and the busiest time of day is the afternoon, around 2 pm, which records the highest sales. Sales are higher on Tuesdays and Saturdays than on the rest of the week.
  10. Although the ratings for ‘Fashion accessories’ and ‘Food and beverages’ are high, the quantity purchased is low. Hence, the supply of these products needs to be increased.
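
Questions like the ones behind findings 4 and 9 (most popular payment method, most common quantity, busiest hour, busiest weekday) can be answered with a few group-by operations. The sketch below is illustrative only: the rows are invented and will not reproduce the findings above, and the column names (`Date`, `Time`, `Payment`, `Quantity`, `Total`) and the date/time formats are assumptions about the Kaggle file.

```python
import pandas as pd

# Invented rows for illustration; column names and formats are assumptions.
sales = pd.DataFrame({
    "Date": ["1/5/2019", "1/5/2019", "1/8/2019", "1/8/2019", "1/9/2019"],
    "Time": ["13:08", "14:29", "14:23", "10:37", "19:33"],
    "Payment": ["Ewallet", "Cash", "Ewallet", "Credit card", "Ewallet"],
    "Quantity": [7, 10, 10, 8, 10],
    "Total": [548.97, 80.22, 340.53, 489.05, 634.38],
})

# Derive weekday and hour-of-day from the date and time strings.
sales["Date"] = pd.to_datetime(sales["Date"], format="%m/%d/%Y")
sales["weekday"] = sales["Date"].dt.day_name()
sales["hour"] = pd.to_datetime(sales["Time"], format="%H:%M").dt.hour

# Most frequent payment method and most frequent purchase quantity.
top_payment = sales["Payment"].value_counts().idxmax()
top_quantity = sales["Quantity"].mode().iloc[0]

# Total sales per hour of day and per weekday.
sales_by_hour = sales.groupby("hour")["Total"].sum()
sales_by_weekday = sales.groupby("weekday")["Total"].sum()

print(top_payment, top_quantity)
print(sales_by_hour)
print(sales_by_weekday)
```

Sorting `sales_by_hour` or `sales_by_weekday` in descending order, or plotting them as bar charts, makes the busiest periods easy to spot.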

License:Apache License 2.0


Languages

Language:Jupyter Notebook 100.0%