95anantsingh / Data-Science-Challenge

Shopify Data Science Challenge

Question 1:

Given some sample data, write a program to answer the following: click here to access the required data set.

On Shopify, we have exactly 100 sneaker shops, and each of these shops sells only one model of shoe. We want to do some analysis of the average order value (AOV). When we look at orders data over a 30 day window, we naively calculate an AOV of $3145.13. Given that we know these shops are selling sneakers, a relatively affordable item, something seems wrong with our analysis. 

  1. Think about what could be going wrong with our calculation. Think about a better way to evaluate this data.

Ans: The inflated AOV most likely comes from using count() to compute the total number of items. count() only counts rows, and each row is one order that can contain more than one item. We need sum() over the total_items column to get the actual number of items sold.
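The difference between counting rows and summing items can be seen on a toy example. This is a minimal sketch in plain Python; the three orders below are made-up numbers, not the challenge data, and only the column names `order_amount` and `total_items` come from the data set.

```python
# Toy orders, one row per order (hypothetical values for illustration only).
orders = [
    {"order_amount": 300.0, "total_items": 2},
    {"order_amount": 150.0, "total_items": 1},
    {"order_amount": 45000.0, "total_items": 300},  # a single bulk order
]

total_amount = sum(o["order_amount"] for o in orders)

# Naive calculation: divide by the number of rows (what count() returns).
naive_aov = total_amount / len(orders)

# Per-item calculation: divide by the summed item count (what sum() returns).
total_items = sum(o["total_items"] for o in orders)
per_item_value = total_amount / total_items

print(f"Amount / row count: ${naive_aov:.2f}")
print(f"Amount / item sum:  ${per_item_value:.2f}")
```

One bulk order is enough to pull the row-based figure far above the price of any single item, while the item-based figure stays at the unit price.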

  2. What metric would you report for this dataset?

Ans: To get an accurate figure we need the total order amount and the total number of items. Dividing the former by the latter gives the average value per item sold, which is the AOV reported here.

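# df is assumed to be the orders data set loaded as a pandas DataFrame
total_order_amount = df['order_amount'].sum()  # total revenue across all orders
total_items = df['total_items'].sum()          # total items sold, not the row count
aov = total_order_amount / total_items

print(f"Average Order Value: ${aov:.2f}")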

  3. What is its value?

Ans: $357.92

Code is in Question_1.ipynb

Question 2:

For this question you’ll need to use SQL. Follow this link to access the data set required for the challenge. Please use queries to answer the following questions. Paste your queries along with your final numerical answers below.

  1. How many orders were shipped by Speedy Express in total?

Ans: 54

Query -

SELECT COUNT(*) AS NumOrders
FROM Orders
JOIN Shippers
ON Orders.ShipperID = Shippers.ShipperID
WHERE Shippers.ShipperName = 'Speedy Express'
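A query of this shape can be sanity-checked locally before running it against the real data. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the rows are made up and the tables contain only the columns the query touches, so the printed count is for the toy data, not the challenge answer.

```python
import sqlite3

# In-memory toy database (hypothetical rows that only mimic the
# Orders / Shippers columns used by the query).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Shippers (ShipperID INTEGER PRIMARY KEY, ShipperName TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, ShipperID INTEGER);
    INSERT INTO Shippers VALUES (1, 'Speedy Express'), (2, 'United Package');
    INSERT INTO Orders VALUES (10, 1), (11, 2), (12, 1), (13, 1);
""")

# Join each order to its shipper, then count the rows for one shipper name.
query = """
    SELECT COUNT(*) AS NumOrders
    FROM Orders
    JOIN Shippers ON Orders.ShipperID = Shippers.ShipperID
    WHERE Shippers.ShipperName = ?
"""
speedy_orders = conn.execute(query, ("Speedy Express",)).fetchone()[0]
print(speedy_orders)
conn.close()
```

Passing the shipper name as a bound parameter avoids any ambiguity about string quoting in the SQL text.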

  2. What is the last name of the employee with the most orders?

Ans: Peacock

Query -

SELECT Employees.LastName, COUNT(*) AS NumOrders
FROM Orders
LEFT JOIN Employees
ON Orders.EmployeeID = Employees.EmployeeID
GROUP BY Orders.EmployeeID
ORDER BY NumOrders DESC
LIMIT 1

  3. What product was ordered the most by customers in Germany?

Ans:
Product Name : Boston Crab Meat
Product ID : 40

Query -

SELECT Products.ProductID, Products.ProductName, SUM(OrderDetails.Quantity) AS TotalQuantity
FROM Orders
JOIN OrderDetails
ON Orders.OrderID = OrderDetails.OrderID
JOIN Customers
ON Orders.CustomerID = Customers.CustomerID
JOIN Products
ON OrderDetails.ProductID = Products.ProductID
WHERE Customers.Country = 'Germany'
GROUP BY Products.ProductID, Products.ProductName
ORDER BY TotalQuantity DESC
LIMIT 1
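For a "most ordered product" query, two details decide the result: every order line has to be tied to its own order through OrderID, and the ranking has to use the summed Quantity rather than a row count. The sketch below checks that logic on a toy in-memory SQLite database using Python's built-in `sqlite3` module. All rows and product names are made up, and ranking by total quantity is an assumption about what "ordered the most" means.

```python
import sqlite3

# Toy schema with only the columns the query needs (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Country TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER);
    CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, ProductName TEXT);
    CREATE TABLE OrderDetails (
        OrderDetailID INTEGER PRIMARY KEY,
        OrderID INTEGER, ProductID INTEGER, Quantity INTEGER
    );
    INSERT INTO Customers VALUES (1, 'Germany'), (2, 'Germany'), (3, 'France');
    INSERT INTO Orders VALUES (100, 1), (101, 2), (102, 3);
    INSERT INTO Products VALUES (1, 'Alpha'), (2, 'Beta');
    -- Alpha appears in two German orders (5 + 10 units);
    -- Beta appears in one German order (20 units) and one French order (100 units).
    INSERT INTO OrderDetails VALUES
        (1, 100, 1, 5), (2, 100, 2, 20), (3, 101, 1, 10), (4, 102, 2, 100);
""")

query = """
    SELECT Products.ProductName, SUM(OrderDetails.Quantity) AS TotalQuantity
    FROM Orders
    JOIN OrderDetails ON Orders.OrderID = OrderDetails.OrderID
    JOIN Customers ON Orders.CustomerID = Customers.CustomerID
    JOIN Products ON OrderDetails.ProductID = Products.ProductID
    WHERE Customers.Country = ?
    GROUP BY Products.ProductID, Products.ProductName
    ORDER BY TotalQuantity DESC
    LIMIT 1
"""
top_product = conn.execute(query, ("Germany",)).fetchone()
print(top_product)
conn.close()
```

In this toy data Alpha has more German order lines, but Beta has the larger German quantity, and the French order for Beta is excluded because its order row belongs to a French customer.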

Code can be found in Question_2.sql
