AndreeaAlexandrescuDS / MarketBasketAnalysis

MBA with PySpark

MarketBasketAnalysis

Theory of Apriori Algorithm


There are three major components of the Apriori algorithm:

  • Support
  • Confidence
  • Lift

1) Support


Support refers to the popularity of an item (or set of items). It is calculated as the number of transactions containing that item divided by the total number of transactions: Support(A) = (transactions containing A) / (total transactions).

2) Confidence


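Confidence refers to the likelihood that item B is also bought when item A is bought. It is calculated as the number of transactions containing both A and B divided by the number of transactions containing A: Confidence(A → B) = Support(A and B) / Support(A).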

3) Lift


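Lift refers to how much more often B is sold when A is sold, compared with how often B is sold overall. It is calculated as the confidence of the rule divided by the support of B: Lift(A → B) = Confidence(A → B) / Support(B).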

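Interpreting association rules by lift

  • lift = 1 → There is no association between A and B; they are bought independently of each other.
  • lift < 1 → A and B are unlikely to be bought together; they co-occur less often than independence would predict.
  • lift > 1 → A and B are likely to be bought together; the greater the lift, the stronger the association.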
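
To make the three measures concrete, here is a minimal sketch in plain Python. The transactions are made-up toy data, and the helper names `support`, `confidence`, and `lift` are hypothetical, chosen only for this illustration; they are not taken from the repository's notebook.

```python
# Toy transactions (hypothetical data, for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(a, b, transactions):
    """Support of A and B together, divided by the support of A."""
    return support(set(a) | set(b), transactions) / support(a, transactions)


def lift(a, b, transactions):
    """Confidence of A -> B divided by the support of B."""
    return confidence(a, b, transactions) / support(b, transactions)


# Butter buyers always buy bread here, so lift is above 1.
print(round(lift({"butter"}, {"bread"}, transactions), 4))
# Bread and milk are each very common, so their lift falls slightly below 1.
print(round(lift({"bread"}, {"milk"}, transactions), 4))
```

In this toy data, bread appears in 4 of 5 transactions (support 0.8), and both butter transactions also contain bread, so the rule butter → bread has confidence 1.0 and lift 1.0 / 0.8 = 1.25.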

Steps Involved in Apriori Algorithm


In principle, association rules could be extracted for every possible combination of items.
For larger datasets, this computation becomes extremely slow, because the number of combinations grows rapidly with the number of items.
To speed up the process, the Apriori algorithm performs the following steps:
  • Set a minimum value for support and confidence. This means that we are only interested in rules for items that occur often enough on their own (i.e. support) and that co-occur often enough with other items (i.e. confidence).
  • Extract all the itemsets whose support is higher than the minimum support threshold.
  • From those itemsets, select all the rules whose confidence is higher than the minimum confidence threshold.
  • Sort the rules in descending order of lift.
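
The steps above can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the notebook's implementation: the toy transactions, the function names `frequent_itemsets` and `association_rules`, the threshold values, and the choice to treat thresholds as inclusive (`>=`) are all assumptions made for the example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise search: keep itemsets whose support meets the threshold."""
    n = len(transactions)
    # Start with every single item as a candidate.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    while candidates:
        level = {}
        for c in candidates:
            s = sum(c <= t for t in transactions) / n
            if s >= min_support:  # inclusive threshold is an assumption
                level[c] = s
        frequent.update(level)
        # Build larger candidates only from itemsets that survived this level,
        # and keep a candidate only if all of its smaller subsets are frequent.
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == len(a) + 1 and all(
                union - {item} in level for item in union
            ):
                candidates.add(union)
    return frequent


def association_rules(frequent, min_confidence):
    """Derive rules A -> B from frequent itemsets, sorted by descending lift."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                consequent = itemset - antecedent
                conf = supp / frequent[antecedent]
                if conf >= min_confidence:
                    rule_lift = conf / frequent[consequent]
                    rules.append((antecedent, consequent, supp, conf, rule_lift))
    rules.sort(key=lambda rule: rule[4], reverse=True)
    return rules


# Toy transactions (hypothetical data, for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

frequent = frequent_itemsets(transactions, min_support=0.3)
rules = association_rules(frequent, min_confidence=0.7)
for antecedent, consequent, supp, conf, rule_lift in rules:
    print(sorted(antecedent), "->", sorted(consequent), round(rule_lift, 4))
```

The support threshold does the pruning: an itemset is only considered if all of its smaller subsets were already frequent, which is what keeps the search from visiting every combination. For reference, PySpark's `pyspark.ml.fpm.FPGrowth` exposes comparable `minSupport` and `minConfidence` parameters, although it uses the FP-growth algorithm rather than Apriori.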
