- Explore Data Visualization;
- Apriori algorithm;
- Association Rules;
Groceries Dataset from Kaggle
- Google Colab;
- Python 3.10.12;
- Pandas 1.5.3;
- mlxtend 0.22.0;
- General
import pandas as pd
- Model Building
from mlxtend.frequent_patterns import association_rules, apriori
Steps:
- Adjust data types using `pd.to_datetime()` and `.astype(str)` from pandas;
- Rename columns using `.rename()` from pandas;
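The two cleaning steps above can be sketched on a tiny stand-in table. The column names (`Member_number`, `Date`, `itemDescription`), the day-first date format, and the new names passed to `.rename()` are assumptions made for illustration, not taken from the notebook.

```python
import pandas as pd

# Tiny stand-in for the Groceries data (column names and values are assumed).
df = pd.DataFrame(
    {
        "Member_number": [1808, 2552, 2300],
        "Date": ["21-07-2015", "05-01-2015", "19-09-2015"],
        "itemDescription": ["tropical fruit", "whole milk", "pip fruit"],
    }
)

# Parse the day-first date strings into datetime values.
df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%Y")

# Treat the member id as a label rather than a number.
df["Member_number"] = df["Member_number"].astype(str)

# Rename the columns to shorter names (hypothetical names).
df = df.rename(
    columns={"Member_number": "member", "Date": "date", "itemDescription": "item"}
)

print(df.dtypes)
```

Passing an explicit `format` keeps day and month from being swapped for ambiguous dates such as `05-01-2015`.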
Steps:
- Detect the weekday using `.dt.weekday` from pandas and replace the numeric output with the day name;
- Build a matrix with `.pivot_table()`;
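A minimal sketch of these two steps, assuming a cleaned table with `date` and `item` columns. The column names, the helper column `n`, and the choice of an item-by-weekday count matrix are illustrative assumptions; the notebook may pivot on different fields.

```python
import pandas as pd

# Small stand-in table (assumed column names).
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2015-07-20", "2015-07-20", "2015-07-21", "2015-07-26"]
        ),
        "item": ["whole milk", "rolls/buns", "whole milk", "soda"],
    }
)

# .dt.weekday is a property returning 0 (Monday) through 6 (Sunday).
day_names = {
    0: "Monday", 1: "Tuesday", 2: "Wednesday", 3: "Thursday",
    4: "Friday", 5: "Saturday", 6: "Sunday",
}
df["weekday"] = df["date"].dt.weekday.map(day_names)

# Helper column so that each row counts as one purchase.
df["n"] = 1

# Item-by-weekday matrix of purchase counts; missing combinations become 0.
matrix = df.pivot_table(
    index="item", columns="weekday", values="n", aggfunc="sum", fill_value=0
)

print(matrix)
```

pandas also provides `.dt.day_name()`, which returns the day name directly and avoids the manual mapping.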
ASAP ... 🚧👩💻
- `apriori()`: Apriori is an influential algorithm widely used in data mining and association rule learning. It identifies the frequent itemsets in a dataset, from which association rules can then be generated;
- `association_rules()`: Rule generation is a common task in the mining of frequent patterns. An association rule is an implication expression of the form X→Y, where X and Y are disjoint itemsets;
  - `metric`: see below
Sorting the rules by 'confidence' from highest to lowest, we observed that people who bought domestic eggs and meat also bought meat in 1% of cases, with a confidence of 78%.
Let's take a closer look at each output column:
- Antecedents / Consequents: The sets of items in the antecedent and the consequent of each rule, respectively;
- Confidence: The probability of finding the consequent in a transaction that already contains the antecedent;
- Lift: Measures how much more likely the consequent is, given that the antecedent occurred, compared with its expected probability if the antecedent and consequent were independent;
- Leverage: Measures the difference between the actual joint occurrence of the antecedent and consequent and the occurrence expected if they were independent;
- Conviction: Measures the dependence of the consequent on the antecedent;
- Zhang's metric: Takes into account both association and dissociation between the items in a rule. This can be useful in scenarios where not only the presence but also the absence of certain items is relevant to interpreting the rules.
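To make these definitions concrete, the metrics can be computed by hand for a single rule. This is a plain-Python sketch on five made-up transactions, using the standard formulas as I understand them; it is not mlxtend's implementation, and the helper names `support` and `rule_metrics` are hypothetical. It assumes the antecedent occurs in at least one transaction.

```python
# Toy transactions (made up for illustration).
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"milk", "eggs"},
    {"bread"},
    {"milk", "bread"},
]


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def rule_metrics(antecedent, consequent):
    """Compute the rule metrics for antecedent -> consequent."""
    s_a = support(antecedent)
    s_c = support(consequent)
    s_ac = support(antecedent | consequent)

    confidence = s_ac / s_a
    lift = confidence / s_c
    leverage = s_ac - s_a * s_c
    # Conviction is infinite when the rule never fails (confidence == 1).
    conviction = float("inf") if confidence == 1 else (1 - s_c) / (1 - confidence)
    zhang = leverage / max(s_ac * (1 - s_a), s_a * (s_c - s_ac))

    return {
        "support": s_ac,
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
        "zhangs_metric": zhang,
    }


metrics = rule_metrics({"milk"}, {"bread"})
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

For milk → bread on this toy data, lift is below 1 and leverage is negative: the two items appear together slightly less often than independence would predict, which Zhang's metric also reflects with a negative value.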