Based on Instacart orders, calculate, for each department, the number of times a product was requested, number of times a product was requested for the first time and a ratio of those two numbers.
- Implementation
1.1 Libraries required
1.2 Explanation of classes and functions
1.3 Execution Procedure - Time and Space Complexities Analysis
- Unit Testing
- Instruction to Run Code
sys
library is needed in purchase_analytics.py
from sys import argv
For unit testing, unittest
, os
and purchase_analytics
are needed in test_purchase_analytics.py
import unittest
import os
from purchase_analytics import Department, Report, get_prod_dept_map, process_order_prod
Two classes Department
and Report
are defined:
Department
: Stores the data associated with each department that need to be written to the output file
def __init__
self.ID
: int- department ID
self.orders
: int- total number of orders
self.first_orders
: int- number of first-time orders
self.percentage
: float- ratio between self.first_orders and self.orders
Report
: Stores a dict in which each Department
object is indexed by department id, as well as a list of sorted Department objects. Several methods to sort objects and write output.
def __init__
self.departments
: dict {int : Department}- mapping department ID to Department object
self.depts_sorted
: list [Department]- Department objects sorted based on department ID
def get_dept(self, dept_ID)
- return the Department object that needs to be updated
def update_percentage(self)
- calculate the percentage attribute of each Department object
def sort_depts(self)
- sort all Department objects by department ID
def write_to_file(self, output_path)
- write the output to output_path
Three functions are defined:
get_prod_dept_map(products_path)
: read products.csv file and return a dict that maps product id to department id
- Arguments:
- products_path: str, path to file products.csv
- Returns:
- mapping: dic{int : int}, a hash map product_id : department_id
process_order_prod(order_products_path, mapping, report)
: read order_products.csv and update Department
and Report
objects
- Arguments:
- order_products_path: str, path to file order_products.csv
- mapping: dic{int : int}, hash map returned by
get_prod_dept_map
- report: A
Report
object
- Returns: None
main
: main function
- Read the paths of input and output files.
- Declare a
Report
object namedreport
that stores allDepartment
objects. - Parse products.csv and generate
prod_dept_map
, a hash table that maps product id to department id. - Parse order_products.csv, read each order entry, update the
orders
andfirst_order
attributes of the correspondingDepartment
object. - After processing entire order_products.csv file, calculate the
percentage
atrribute of eachDepartment
object. - Sort all
Department
objects based on department ID, store the sorted list inreprot.depts_sorted
, and then write to the output file.
get_prod_dept_map
: O(M), M: length of the products.csv file
process_order_prod
: O(N), N: length of the order_products.csv file
Report.sort_depts
: O(nlogn), n: number of departments
Report.write_to_file
: O(n), n: number of departments
Since order_products.csv typically has millions of lines, procoess_order_prod
is the time-limiting step.
Files are read line by line, so this does not take much of memory.
The hashmap prod_dept_map
, which maps product id to department id, grows linearly with the number of products. So the overall space complexity is O(M), with M being the length of the products.csv file (number of products).
A few test cases are provided.
test_get_prod_dept_map
: Test if the {product id : department id} hash map constructed from products.csv file is correct.
test_process_order_prod
: Test if the orders data can be correctly read and calculated, and if less-than-zero number of orders would raise the correct error.
test_output
: Test the output file content and format. In particular, A department_id
should be listed only if number_of_orders
is greater than 0
- Clone the git repository and go to the work directory.
git clone https://github.com/liangli053/Purchase-Analytics
cd Purchase-Analytics
- [optional] To run unit test, uncomment the following in
run.sh
file.
#python3 ./src/test_purchase_analytics.py
- Make run.sh executable.
chmod u+x run.sh
- Run code.
./run.sh
Output is written to ./output/report.csv