Hypothesis testing project on Northwind Database by Luigi Fiori

Files

Northwind_small.sqlite: Raw Data
Project_Northwind_db.ipynb: Notebook for Analysis
Presentation.pdf

Introduction

We need to perform some hypothesis tests on the Northwind Database, a free, open-source sample database from Microsoft.

Our goal is to formulate hypothesis tests such that we can get some valuable insights for a potential company.

My approach for this project has been:

Formulate interesting questions --> To get useful insights
EDA --> To understand the data by summarizing its main characteristics, mainly through visualization
Define Null and Alternative Hypotheses --> To answer the research question and guide data collection and interpretation
Set Alpha level --> I used an alpha level of 0.05 for all my hypotheses
Test our Data Set --> Use different tests to obtain the p-value
Check Effect Size --> Cohen's d
Obtain valuable Insights --> Give some recommendations

Formulate our Questions

Does discount amount have a statistically significant effect on the quantity of a product in an order? If so, at what level(s) of discount?
Is December, given the Christmas period, the month with the highest revenue?
Does discount amount have a statistically significant effect on the revenue of a product in an order? If so, at what level(s) of discount?
Is there a statistically significant difference in shipping time between different shipping companies?
Is there a statistically significant difference in revenue between different countries?
Is there a statistically significant difference in revenue between different categories?

Let's go through each question!

1. Does discount amount have a statistically significant effect on the quantity of a product in an order? If so, at what level(s) of discount?

Defining Null and Alternative Hypotheses

Since we want to find out whether the quantity sold with a discount is greater than the quantity sold without one, we'll perform a one-tailed test.

H_null = qt. discounted sold <= qt. not discounted sold

H_alt = qt. discounted sold > qt. not discounted sold

Welch's Test
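
As a minimal sketch of what Welch's test computes, the snippet below derives the t statistic and the Welch-Satterthwaite degrees of freedom using only the standard library. The function name `welch_t` and the sample quantities are made up for illustration; they are not taken from the notebook or from the Northwind data.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each group mean (sample variance uses n - 1).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, df


# Hypothetical quantities per order line, for illustration only.
discounted = [30, 25, 40, 35, 28, 32]
not_discounted = [20, 22, 18, 25, 21, 19]

t_stat, df = welch_t(discounted, not_discounted)
print(f"t = {t_stat:.3f}, df = {df:.2f}")
```

For the one-tailed hypothesis above, the p-value is the upper-tail probability of a t distribution with `df` degrees of freedom. If SciPy 1.6 or later is available, `scipy.stats.ttest_ind(discounted, not_discounted, equal_var=False, alternative="greater")` should return the same statistic together with that p-value.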

Effect Size
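
A minimal sketch of Cohen's d, assuming the common pooled-standard-deviation form; the notebook may use a slightly different variant. The function name `cohens_d` and the sample values are illustrative only.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(sample_a, sample_b):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / (n_a + n_b - 2)
    return (mean(sample_a) - mean(sample_b)) / sqrt(pooled_var)


# Hypothetical values: means 4 and 3, both sample variances equal to 4.
d = cohens_d([2, 4, 6], [1, 3, 5])
print(d)  # 0.5
```

As a common rule of thumb, values of d around 0.2, 0.5, and 0.8 are read as small, medium, and large effects respectively.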

With or Without Discount Applied

Conclusions

All results are statistically significant except for the 10% (0.1) discount level.

This means that, on average, a 10% discount has no statistically significant impact on the quantity sold compared to applying no discount.

We therefore advise using a 5% discount instead: it has the highest effect size (Cohen's d), and a smaller discount leaves more profit while still having a greater impact on quantity sold than the 10% discount. Alternatively, a discount level higher than 10% could be applied to try to increase the quantity sold.

Different Levels of Discounts

Conclusions

As the p-values show, none of the results are statistically significant.

This means that we fail to reject the null hypothesis.

We advise management to build their strategies around whether or not a discount is applied, rather than around a comparison between different discount levels.

2. Is December, given the Christmas period, the month with the highest revenue?

Defining Null and Alternative Hypotheses

H_null = Revenue for December <= Revenue for the other months

H_alt = Revenue for December > Revenue for the other months

As expected, December shows a high revenue peak due to Christmas, but May has the highest peak, followed by August.
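
To illustrate how a month-by-month comparison like this can be built, here is a minimal sketch that totals revenue per calendar month. It assumes revenue per order line is unit price × quantity × (1 − discount), which is an assumption rather than a definition taken from the notebook. The function name `revenue_by_month` and the sample rows are made up.

```python
from collections import defaultdict
from datetime import date


def revenue_by_month(order_lines):
    """Sum revenue per calendar month (1-12) across all order lines."""
    totals = defaultdict(float)
    for order_date, unit_price, quantity, discount in order_lines:
        # Assumed revenue definition: price * quantity, net of the discount.
        totals[order_date.month] += unit_price * quantity * (1 - discount)
    return dict(totals)


# Hypothetical order lines: (order date, unit price, quantity, discount).
lines = [
    (date(2013, 5, 3), 10.0, 20, 0.0),
    (date(2013, 5, 17), 8.0, 10, 0.25),
    (date(2013, 12, 20), 12.0, 15, 0.0),
]

monthly = revenue_by_month(lines)
best_month = max(monthly, key=monthly.get)
print(monthly, best_month)
```

Note that this groups by month number only, so the same month from different years is pooled together.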