machine-learning-algorithms collaborative-filtering recommender-system als recommender-ensemble pyspark

Ensemble of Recommendation Algorithms

A PySpark implementation of a recommender ensemble (currently a Bagging ensemble only).

More details about the best practices for building recommendation systems can be found at Recommenders GitHub.

Prerequisites

Basic knowledge of recommender systems
PySpark

How It Works and Why

Bagging (Bootstrap aggregating) is a machine learning (ML) ensemble method designed to improve the stability and accuracy of ML algorithms used in statistical classification and regression.

One of the most successful applications of Bagging is Random Forest.

The method implemented here uses the exact same approach as the conventional Bagging ensemble:

Train M recommender models (base models), each on a bootstrap sample of the training set.
To predict item ratings, generate M predictions with the base models and average the predicted ratings for each item.
To recommend the top k items, generate M recommendation lists of k items with the base models, then combine the lists into a single list.
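
The first two steps above can be sketched in plain Python. This is a minimal illustration, not the module's code: it does not use PySpark, the function names are hypothetical, and the base model is a simple item-mean predictor standing in for a real recommender such as ALS.

```python
import random
from collections import defaultdict
from statistics import mean


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (one bootstrap replicate)."""
    return [rng.choice(rows) for _ in rows]


def fit_item_mean_model(rows):
    """Stand-in base model (assumption): predict each item's mean training rating.

    It ignores the user entirely; a real base model would be something like ALS.
    """
    ratings_by_item = defaultdict(list)
    for _user, item, rating in rows:
        ratings_by_item[item].append(rating)
    global_mean = mean(rating for _, _, rating in rows)
    item_means = {item: mean(rs) for item, rs in ratings_by_item.items()}
    # Fall back to the global mean for items missing from this replicate.
    return lambda item: item_means.get(item, global_mean)


def fit_bagging(rows, num_models, seed=0):
    """Train num_models base models, each on its own bootstrap sample."""
    rng = random.Random(seed)
    return [fit_item_mean_model(bootstrap_sample(rows, rng)) for _ in range(num_models)]


def predict_bagging(models, item):
    """Average the base models' predicted ratings for one item."""
    return mean(model(item) for model in models)


# Toy (user, item, rating) rows, made up for illustration.
ratings = [
    ("u1", "i1", 5.0), ("u1", "i2", 3.0),
    ("u2", "i1", 4.0), ("u2", "i3", 2.0),
    ("u3", "i2", 4.0), ("u3", "i3", 1.0),
]
models = fit_bagging(ratings, num_models=5)
prediction = predict_bagging(models, "i1")
```

Because each base model sees a different resampling of the data, their individual predictions differ, and averaging them is what reduces the variance of the final estimate.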

Currently, this repo implements three methods for combining the lists: average, sum, and count.
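
One plausible reading of the three combining methods is sketched below in plain Python. The semantics are an assumption, not taken from the module: here average is the mean predicted score over the lists an item appears in, sum is the total of those scores, and count is the number of lists containing the item. The function name `combine_top_k` is hypothetical.

```python
from collections import defaultdict


def combine_top_k(rec_lists, k, method="average"):
    """Merge M per-model lists of (item, score) pairs into one top-k item list."""
    scores = defaultdict(list)
    for recs in rec_lists:
        for item, score in recs:
            scores[item].append(score)

    if method == "average":
        combined = {item: sum(s) / len(s) for item, s in scores.items()}
    elif method == "sum":
        combined = {item: sum(s) for item, s in scores.items()}
    elif method == "count":
        combined = {item: len(s) for item, s in scores.items()}
    else:
        raise ValueError(f"unknown method: {method}")

    # Rank by combined score, descending; break ties by item id for determinism.
    ranked = sorted(combined.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:k]]


# Toy top-2 lists from three base models, made up for illustration.
rec_lists = [
    [("a", 5.0), ("b", 3.5)],
    [("a", 4.0), ("c", 4.5)],
    [("a", 3.0), ("d", 4.8)],
]

top_average = combine_top_k(rec_lists, k=2, method="average")  # ["d", "c"]
top_sum = combine_top_k(rec_lists, k=2, method="sum")          # ["a", "d"]
top_count = combine_top_k(rec_lists, k=2, method="count")      # ["a", "b"]
```

Under these assumed definitions, sum and count favor items that many base models agree on, while average favors items with high scores even if only one model recommends them.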

For more details about how to use the module, see the example notebook, which uses multiple ALS models for movie recommendation.

Preliminary Results

Top-k (k = 10) recommendation performance metrics on the MovieLens 100k dataset

x-axis: Number of base models (ALS), M, in the bagging model
Max: Maximum metric value among the M base models
Min: Minimum metric value among the M base models
Avg: Average metric value of the M base models
Bagging: Metric value of the ensemble of the M base models

About

Recommendation model ensemble

MIT License
