import pandas as pd
df = pd.read_csv(r'https://docs.google.com/spreadsheets/d/16i38oonuX1y1g7C_UAmiK9GkY7cS-64DfiDMNiR41LM/export?format=csv&gid=0')
df
| | order_id | shop_id | user_id | order_amount | total_items | payment_method | created_at |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 53 | 746 | 224 | 2 | cash | 2017-03-13 12:36:56 |
| 1 | 2 | 92 | 925 | 90 | 1 | cash | 2017-03-03 17:38:52 |
| 2 | 3 | 44 | 861 | 144 | 1 | cash | 2017-03-14 4:23:56 |
| 3 | 4 | 18 | 935 | 156 | 1 | credit_card | 2017-03-26 12:43:37 |
| 4 | 5 | 18 | 883 | 156 | 1 | credit_card | 2017-03-01 4:35:11 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 4995 | 4996 | 73 | 993 | 330 | 2 | debit | 2017-03-30 13:47:17 |
| 4996 | 4997 | 48 | 789 | 234 | 2 | cash | 2017-03-16 20:36:16 |
| 4997 | 4998 | 56 | 867 | 351 | 3 | cash | 2017-03-19 5:42:42 |
| 4998 | 4999 | 60 | 825 | 354 | 2 | credit_card | 2017-03-16 14:51:18 |
| 4999 | 5000 | 44 | 734 | 288 | 2 | debit | 2017-03-18 15:48:18 |
5000 rows × 7 columns
# Naive AOV: total revenue divided by the number of orders
df['order_amount'].sum() / df['order_amount'].size
3145.128
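Dividing the sum by `size` matches `Series.mean()` only when the column has no missing values: `sum()` skips `NaN` by default, while `size` counts every row. A minimal sketch on hypothetical toy values (not the challenge data):

```python
import numpy as np
import pandas as pd

# Hypothetical order amounts; not taken from the challenge dataset.
amounts = pd.Series([100.0, 200.0, 300.0])
naive_aov = amounts.sum() / amounts.size  # same formula as the cell above
print(naive_aov, amounts.mean())          # both 200.0 when nothing is missing

# With a missing value, sum() skips NaN but size still counts the row.
with_gap = pd.Series([100.0, 200.0, np.nan])
print(with_gap.sum() / with_gap.size)     # 100.0: 300 divided by 3 rows
print(with_gap.mean())                    # 150.0: 300 divided by 2 values
```

On this dataset both forms give the same 3145.128 only if `order_amount` has no gaps, so `df['order_amount'].mean()` is the safer spelling.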
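The naive AOV of $3,145.13 is inflated by outliers. Shop 42 is selling an unreasonable number of sneakers: it sold 34,063 pairs in 30 days, totalling $11,990,176.
Its largest orders, at the top of the sorted table below, are repeated purchases of 2,000 pairs for $704,000 by user 607, each placed at 4:00:00.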
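A shop like this can be surfaced without knowing its ID in advance by grouping on `shop_id`, summing `total_items` and `order_amount`, and sorting. A minimal sketch; the rows below are hypothetical, and on the real data the same chain would run on `df`:

```python
import pandas as pd

# Hypothetical rows using the same column names as the challenge data.
orders = pd.DataFrame({
    "shop_id":      [42, 42, 7, 7, 13],
    "order_amount": [704000, 352, 300, 450, 160],
    "total_items":  [2000, 1, 2, 3, 1],
})

# Total volume and revenue per shop, largest sellers first.
per_shop = (
    orders.groupby("shop_id")[["total_items", "order_amount"]]
    .sum()
    .sort_values("total_items", ascending=False)
)
print(per_shop)
```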
# Largest orders first; among equal amounts, fewer items first
sorted_df = df.sort_values(['order_amount', 'total_items'], ascending=[False, True])
sorted_df
| | order_id | shop_id | user_id | order_amount | total_items | payment_method | created_at |
|---|---|---|---|---|---|---|---|
| 15 | 16 | 42 | 607 | 704000 | 2000 | credit_card | 2017-03-07 4:00:00 |
| 60 | 61 | 42 | 607 | 704000 | 2000 | credit_card | 2017-03-04 4:00:00 |
| 520 | 521 | 42 | 607 | 704000 | 2000 | credit_card | 2017-03-02 4:00:00 |
| 1104 | 1105 | 42 | 607 | 704000 | 2000 | credit_card | 2017-03-24 4:00:00 |
| 1362 | 1363 | 42 | 607 | 704000 | 2000 | credit_card | 2017-03-15 4:00:00 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 4219 | 4220 | 92 | 747 | 90 | 1 | credit_card | 2017-03-25 20:16:58 |
| 4414 | 4415 | 92 | 927 | 90 | 1 | credit_card | 2017-03-17 9:57:01 |
| 4760 | 4761 | 92 | 937 | 90 | 1 | debit | 2017-03-20 7:37:28 |
| 4923 | 4924 | 92 | 965 | 90 | 1 | credit_card | 2017-03-09 5:05:11 |
| 4932 | 4933 | 92 | 823 | 90 | 1 | credit_card | 2017-03-24 2:17:13 |
5000 rows × 7 columns
# Total items sold and revenue for shop 42
df.loc[df['shop_id'] == 42, ['total_items', 'order_amount']].sum()
total_items 34063
order_amount 11990176
dtype: int64
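Dividing the two totals above gives the average price per pair: 11,990,176 / 34,063 = $352, the same ratio as a single 2,000-pair, $704,000 order in the sorted table. That suggests shop 42's unit price is ordinary and its volume is what distorts the average. A minimal sketch of a per-order unit-price check; the two totals come from the output above, while the toy rows are hypothetical:

```python
import pandas as pd

# Totals for shop 42 as printed by the cell above.
shop42_items = 34063
shop42_amount = 11990176
print(shop42_amount / shop42_items)  # 352.0 dollars per pair

# Per-order unit price on hypothetical rows with the challenge's column names.
orders = pd.DataFrame({
    "shop_id":      [42, 42, 7],
    "order_amount": [704000, 352, 450],
    "total_items":  [2000, 1, 3],
})
orders["unit_price"] = orders["order_amount"] / orders["total_items"]
print(orders.groupby("shop_id")["unit_price"].mean())
```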
The median is a better metric to report because it is more resilient to outliers in the data.
The median order value is $284.00.
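To see why, note that a single extreme order moves the mean a great deal but barely shifts the median. A minimal sketch on hypothetical values (one bulk order of the size seen in shop 42 is appended to four typical ones):

```python
import pandas as pd

# Hypothetical order amounts: four typical orders plus one bulk order.
typical = pd.Series([160, 250, 284, 390])
with_outlier = pd.concat([typical, pd.Series([704000])], ignore_index=True)

print(typical.mean(), typical.median())            # 271.0 267.0
print(with_outlier.mean(), with_outlier.median())  # 141016.8 284.0
```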
df['order_amount'].describe()
count 5000.000000
mean 3145.128000
std 41282.539349
min 90.000000
25% 163.000000
50% 284.000000
75% 390.000000
max 704000.000000
Name: order_amount, dtype: float64
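The summary shows the skew directly: a standard deviation of about $41,283, while the middle 50% of orders falls between $163 and $390. Another option, not used in the original analysis, is to drop values outside Tukey's fences (1.5 × IQR beyond the quartiles) and recompute the mean. A minimal sketch on hypothetical values; the function name `iqr_filtered_mean` and the 1.5 multiplier are illustrative choices, not something the source specifies:

```python
import pandas as pd

def iqr_filtered_mean(values: pd.Series, k: float = 1.5) -> float:
    """Mean of the values lying within Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    inside = values.between(q1 - k * iqr, q3 + k * iqr)  # inclusive bounds
    return float(values[inside].mean())

# Hypothetical order amounts with one bulk order.
amounts = pd.Series([160, 200, 250, 284, 300, 390, 704000])
print(amounts.mean(), iqr_filtered_mean(amounts))
```

Here the fences are 45 and 525, so only the 704000 order is dropped and the filtered mean is 264.0.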
SELECT COUNT(*) FROM Orders
WHERE ShipperID IN
(SELECT ShipperID FROM Shippers
WHERE ShipperName = 'Speedy Express')
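The subquery can be checked end to end with Python's built-in `sqlite3` module. A minimal sketch, assuming tables shaped like the ones the query references; the rows and the second shipper name are hypothetical:

```python
import sqlite3

# Hypothetical miniature tables with the column names the query uses.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Shippers (ShipperID INTEGER PRIMARY KEY, ShipperName TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, ShipperID INTEGER);
    INSERT INTO Shippers VALUES (1, 'Speedy Express'), (2, 'Hypothetical Freight');
    INSERT INTO Orders VALUES (10, 1), (11, 2), (12, 1), (13, 1);
""")

# Same subquery filter as above: count orders whose shipper is Speedy Express.
(count,) = conn.execute("""
    SELECT COUNT(*) FROM Orders
    WHERE ShipperID IN
        (SELECT ShipperID FROM Shippers
         WHERE ShipperName = 'Speedy Express')
""").fetchone()
print(count)  # 3
conn.close()
```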
SELECT Employees.LastName, COUNT(*) AS NumOrders
FROM Orders
JOIN Employees
ON Orders.EmployeeID = Employees.EmployeeID
GROUP BY Employees.EmployeeID, Employees.LastName
ORDER BY 2 DESC
LIMIT 1
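A minimal `sqlite3` sketch of the most-orders query, grouping by both the employee ID and last name so every selected column is either grouped or aggregated. The tables and names are hypothetical:

```python
import sqlite3

# Hypothetical miniature tables with the column names the query uses.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, LastName TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, EmployeeID INTEGER);
    INSERT INTO Employees VALUES (1, 'Smith'), (2, 'Jones');
    INSERT INTO Orders VALUES (10, 1), (11, 2), (12, 2), (13, 2), (14, 1);
""")

# Count orders per employee and keep the employee with the most.
last_name, num_orders = conn.execute("""
    SELECT Employees.LastName, COUNT(*) AS NumOrders
    FROM Orders
    JOIN Employees ON Orders.EmployeeID = Employees.EmployeeID
    GROUP BY Employees.EmployeeID, Employees.LastName
    ORDER BY NumOrders DESC
    LIMIT 1
""").fetchone()
print(last_name, num_orders)  # Jones 3
conn.close()
```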
Product 40, "Boston Crab Meat", is the product ordered most by customers in Germany, with a total quantity of 160.
SELECT OrderDetails.ProductID, Products.ProductName, SUM(OrderDetails.Quantity) AS Quantity
FROM Orders
JOIN OrderDetails
ON Orders.OrderID = OrderDetails.OrderID
JOIN Products
ON OrderDetails.ProductID = Products.ProductID
JOIN Customers
ON Orders.CustomerID = Customers.CustomerID
-- Customers is already joined, so filter on its Country column directly
WHERE Customers.Country = 'Germany'
GROUP BY OrderDetails.ProductID, Products.ProductName
ORDER BY 3 DESC
LIMIT 1
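A minimal `sqlite3` sketch of the Germany query on hypothetical rows. The French order for a large quantity of product 2 is there to show that the `Customers.Country` filter excludes it:

```python
import sqlite3

# Hypothetical miniature tables with the column names the query uses.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Country TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER);
    CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, ProductName TEXT);
    CREATE TABLE OrderDetails (OrderID INTEGER, ProductID INTEGER, Quantity INTEGER);
    INSERT INTO Customers VALUES (1, 'Germany'), (2, 'France');
    INSERT INTO Orders VALUES (100, 1), (101, 1), (102, 2);
    INSERT INTO Products VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO OrderDetails VALUES
        (100, 1, 5), (101, 1, 4), (101, 2, 6), (102, 2, 50);
""")

# Sum quantities per product for German customers and keep the top product.
row = conn.execute("""
    SELECT OrderDetails.ProductID, Products.ProductName,
           SUM(OrderDetails.Quantity) AS Quantity
    FROM Orders
    JOIN OrderDetails ON Orders.OrderID = OrderDetails.OrderID
    JOIN Products ON OrderDetails.ProductID = Products.ProductID
    JOIN Customers ON Orders.CustomerID = Customers.CustomerID
    WHERE Customers.Country = 'Germany'
    GROUP BY OrderDetails.ProductID, Products.ProductName
    ORDER BY 3 DESC
    LIMIT 1
""").fetchone()
print(row)  # (1, 'Widget', 9)
conn.close()
```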