talitacgs / Market_basket_analysis_with_Apriori

Market basket analysis using Apriori algorithm and association rules

Market Basket Analysis using Apriori Algorithm

Objective

Use the Apriori algorithm and association rules to identify items that are frequently bought together.

Skills

  • Exploratory data visualization;
  • Apriori algorithm;
  • Association Rules;

Data Source

Groceries Dataset from Kaggle

Tools and Technologies

  1. Google Colab;
  2. Python 3.10.12;
  3. Pandas 1.5.3;
  4. mlxtend 0.22.0;

Dependencies

For the project, libraries can be divided into two types:
  1. General
     import pandas as pd
  2. Model Building
     from mlxtend.frequent_patterns import association_rules, apriori

Data Cleaning

Steps:

  • Adjust data types using pd.to_datetime() and .astype(str) from pandas;
  • Rename columns using .rename() from pandas;
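
The cleaning steps above can be sketched roughly as follows. The column names (Member_number, Date, itemDescription), the day-month-year date format, and the new name "item" are assumptions made for illustration; adjust them to match the actual file.

```python
import pandas as pd

# Tiny stand-in for the raw data; column names and values are assumed.
raw = pd.DataFrame(
    {
        "Member_number": [1808, 2552, 1808],
        "Date": ["21-07-2015", "05-01-2015", "21-07-2015"],
        "itemDescription": ["tropical fruit", "whole milk", "pip fruit"],
    }
)

clean = raw.copy()

# Parse the date strings into real datetimes (assumed format: day-month-year).
clean["Date"] = pd.to_datetime(clean["Date"], format="%d-%m-%Y")

# Treat the member id as a label rather than a number.
clean["Member_number"] = clean["Member_number"].astype(str)

# Give the item column a shorter name ("item" is a hypothetical choice).
clean = clean.rename(columns={"itemDescription": "item"})

print(clean.dtypes)
```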

Data Preprocessing

Steps:

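  • Detect the weekday using .dt.weekday from pandas (an attribute, not a method; it returns 0 for Monday through 6 for Sunday) and replace the numeric output with the day name;
  • Build a transaction-by-item matrix with .pivot_table()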
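
A minimal sketch of these two steps, assuming a cleaned frame with the hypothetical columns Member_number, Date and item, and assuming that one transaction means one member on one date. The helper columns "transaction" and "count" are introduced only for this example.

```python
import pandas as pd

# Small stand-in for the cleaned data; column names are assumed.
df = pd.DataFrame(
    {
        "Member_number": ["1808", "1808", "2552", "2552"],
        "Date": pd.to_datetime(
            ["2015-07-21", "2015-07-21", "2015-01-05", "2015-01-05"]
        ),
        "item": ["tropical fruit", "whole milk", "whole milk", "whole milk"],
    }
)

# .dt.weekday returns 0 (Monday) to 6 (Sunday); map the numbers to names.
weekday_names = {
    0: "Monday", 1: "Tuesday", 2: "Wednesday", 3: "Thursday",
    4: "Friday", 5: "Saturday", 6: "Sunday",
}
df["weekday"] = df["Date"].dt.weekday.map(weekday_names)

# Assumption: a transaction is everything one member bought on one date.
df["transaction"] = df["Member_number"] + "_" + df["Date"].dt.strftime("%Y-%m-%d")
df["count"] = 1

# Rows are transactions, columns are items, cells count the purchases.
basket = df.pivot_table(
    index="transaction", columns="item", values="count",
    aggfunc="sum", fill_value=0,
)

# apriori() expects a one-hot matrix, so reduce the counts to True/False.
basket = basket > 0
print(basket)
```

Converting the counts to booleans means that buying the same item twice in one transaction is treated the same as buying it once.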

Data Visualization

Coming soon ... 🚧👩‍💻

Model Building

  • apriori(): Apriori is an influential algorithm in data mining and association rule learning. It identifies the frequent itemsets in a dataset, that is, the item combinations whose support meets a minimum threshold. These itemsets are the input for rule generation;
  • association_rules(): Rule generation is a common task in frequent pattern mining. An association rule is an implication expression of the form X→Y, where X and Y are disjoint itemsets;
    • metric: the measure used to evaluate and filter the rules (see the descriptions under Results and Output below)

Results and Output

Sorting the rules by 'confidence' in descending order, we observed that people who bought domestic eggs and meat also bought meat in 1% of cases, with a confidence of 78%.

Let's take a closer look at each column of the output:

  • Antecedents / Consequents: The sets of items in the antecedent and the consequent of each rule, respectively;
  • Confidence: Indicates the probability of finding the consequent in a transaction that already contains the antecedent;
  • Lift: Measures how much more likely the consequent is, given that the antecedent occurred, compared to its expected probability if the antecedent and consequent were independent. A value of 1 indicates independence;
  • Leverage: Measures the difference between the actual joint occurrence of the antecedent and consequent and the joint occurrence expected if they were independent. A value of 0 indicates independence;
  • Conviction: Measures how strongly the consequent depends on the antecedent. A high value means the consequent depends heavily on the antecedent;
  • Zhang's metric: Takes into account both association and dissociation between the items in a rule. It ranges from -1 to 1, where positive values indicate association and negative values indicate dissociation. This can be useful in scenarios where not only the presence but also the absence of certain items is relevant to the interpretation of the rules.
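
To make the definitions above concrete, the following sketch computes each metric from three support values using the standard formulas. The function name rule_metrics and the example numbers are hypothetical and do not come from the project's results.

```python
def rule_metrics(support_a, support_c, support_ac):
    """Compute rule metrics for A -> C from supports (fractions of transactions).

    support_a: share of transactions containing the antecedent A
    support_c: share of transactions containing the consequent C
    support_ac: share of transactions containing both A and C
    """
    # P(C | A): how often C appears when A is present.
    confidence = support_ac / support_a

    # Ratio of the observed confidence to what independence would give.
    lift = confidence / support_c

    # Observed joint support minus the joint support expected under independence.
    leverage = support_ac - support_a * support_c

    # Undefined (infinite) when the rule never fails.
    if confidence == 1:
        conviction = float("inf")
    else:
        conviction = (1 - support_c) / (1 - confidence)

    # Zhang's metric: leverage scaled into the range [-1, 1].
    denominator = max(
        support_ac * (1 - support_a),
        support_a * (support_c - support_ac),
    )
    zhang = leverage / denominator if denominator else 0.0

    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
        "zhangs_metric": zhang,
    }


# Hypothetical rule: A in 80% of baskets, C in 60%, both together in 60%.
metrics = rule_metrics(support_a=0.8, support_c=0.6, support_ac=0.6)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this example the lift is above 1 and the leverage is above 0, so the two itemsets occur together more often than independence would predict.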
