Project Description

This project analyses and preprocesses the Book-Crossing dataset collected by Cai-Nicolas Ziegler and applies machine learning to recommend books similar to one you have already read. All code is written in Python. The open-source library SciPy is used for preprocessing and Scikit-Learn is used to build the model.

The full approach to the project can be seen on Kaggle
- Kaggle Notebook : https://www.kaggle.com/mohitnirgulkar/book-recommendation-with-data-analysis

Project Contents

Exploratory Data Analysis
Different ways of building Recommendation system
Model and Flask API

Resources Used

Packages : Pandas, NumPy, Matplotlib, Seaborn, WordCloud, Scikit-Learn, etc.
Dataset : https://www.kaggle.com/mohitnirgulkar/book-recommendation-data

1. Exploratory Data Analysis

Visualising Explicit Rating Counts (for 1-10 rating value)
Visualising top 30 most read books
Visualising top 30 most read books with their average ratings
Visualising top 30 years with the most books published
Visualising top 30 authors with most books
Visualising the age distribution of the users
Extra Analysis
- Some of the plots and word clouds that aren't shown here can be found in the notebook
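
As a rough illustration of the first two analyses, the counts behind such plots could be computed as follows. This is a minimal sketch on toy data, not the notebook's code; the column names `User-ID`, `ISBN` and `Book-Rating` and the convention that a rating of 0 marks an implicit interaction are assumptions about how the ratings table is laid out.

```python
import pandas as pd

# Toy stand-in for the ratings table (column names are assumptions)
ratings = pd.DataFrame({
    "User-ID": [1, 1, 2, 2, 3, 3, 4],
    "ISBN": ["A", "B", "A", "C", "A", "B", "C"],
    "Book-Rating": [0, 8, 10, 0, 8, 7, 8],
})

# Keep explicit ratings only (1-10); 0 is assumed to mean "implicit"
explicit = ratings[ratings["Book-Rating"] > 0]

# Number of explicit ratings per rating value, ordered by rating value
rating_counts = explicit["Book-Rating"].value_counts().sort_index()

# Most read books: how many rows (interactions) each ISBN has, top 30
most_read = ratings["ISBN"].value_counts().head(30)

print(rating_counts)
print(most_read)
```

Either series can then be drawn as a bar chart, for example with `rating_counts.plot(kind="bar")`.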

2. Different ways of building Recommendation system

Popularity-based

These systems simply recommend the most popular items to users. Popularity-based systems are the simplest of all and have minimal computational requirements. However, because they do not make personalized recommendations based on a specific user's likes and behaviors, they tend to be less accurate than content-based or collaborative-filtering systems. This type of recommendation is performed in the notebook, where the output is the 10 most popular books.
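
A popularity-based recommender can be sketched in a few lines of pandas. The example below is an illustration on toy data rather than the notebook's implementation; the column names `title` and `rating`, the helper `most_popular`, and the choice to rank by number of ratings with average rating as a tie-breaker are all assumptions.

```python
import pandas as pd

# Toy ratings table (hypothetical column names)
ratings = pd.DataFrame({
    "title": ["Book A", "Book A", "Book A", "Book B", "Book B", "Book C"],
    "rating": [8, 9, 7, 10, 9, 5],
})

def most_popular(ratings, n=10, min_ratings=1):
    """Return the n titles with the most ratings (ties broken by average rating)."""
    stats = ratings.groupby("title")["rating"].agg(
        num_ratings="count", avg_rating="mean"
    )
    # Optionally ignore titles with too few ratings
    stats = stats[stats["num_ratings"] >= min_ratings]
    return stats.sort_values(
        ["num_ratings", "avg_rating"], ascending=False
    ).head(n)

top = most_popular(ratings, n=2)
print(top)
```

Every user receives the same list, which is why this approach needs no per-user computation.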
Content-based

Content-based systems depend on external information to create user and item profiles, and this information might not be easily available. They also do not take users' behavioral information into account and ignore the fact that user interests and preferences may change over time.
Collaborative Filtering
- Memory-based/ Neighborhood-based
  
  Memory-based recommendation systems can be further divided into two categories, user-based and item-based. Both are easy to implement: a similarity measure such as cosine similarity or Pearson correlation is used to find the most similar users or items in the data.
- Model-based/Matrix Factorization
  
  The model-based collaborative filtering approach employs dimensionality reduction techniques such as matrix factorization (Singular Value Decomposition (SVD), Principal Component Analysis (PCA), and latent factor models) to discover hidden concepts and their relationships with users and items.
- Hybrid Approach
  
  Memory-based and model-based collaborative filtering approaches can be combined in practice to exploit the benefits each approach provides. Content-based and collaborative filtering approaches can also be combined in various ways to achieve greater synergies between them.
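
To make the model-based idea more concrete, the sketch below applies a truncated SVD to a small user-item matrix with NumPy. It is an illustration only and is not part of this project's model; the toy matrix, the choice of `k = 2` latent factors, and the simplification of treating unrated entries as zeros are all assumptions.

```python
import numpy as np

# Toy user-item matrix (rows: users, columns: books); 0 = not rated.
# Treating missing ratings as zeros is a simplification for illustration.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

k = 2  # number of latent factors ("hidden concepts") to keep

# Full SVD, then keep only the k largest singular values
U, s, Vt = np.linalg.svd(R, full_matrices=False)
user_factors = U[:, :k] * s[:k]   # each row: one user in latent space
item_factors = Vt[:k, :].T        # each row: one book in latent space

# Rank-k reconstruction; entries that were 0 now hold predicted scores
R_hat = user_factors @ item_factors.T
print(np.round(R_hat, 2))
```

The rows of `item_factors` can also be compared with a similarity measure, which is one simple way of combining the model-based and memory-based ideas.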

3. Model and Flask API

Model :-

Scikit-Learn's Nearest Neighbors model is built under the collaborative filtering approach. SciPy is used to create a compressed sparse row (CSR) matrix from the pivot table, and this matrix is used to fit the model with the brute-force algorithm and cosine as the metric.
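
The pipeline described above can be sketched as follows. This is a minimal example on toy data under stated assumptions, not the project's actual code: the column names `user`, `title` and `rating`, the item-by-user orientation of the pivot table, and the helper `recommend` are hypothetical.

```python
import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

# Toy ratings table (hypothetical column names)
ratings = pd.DataFrame({
    "user": [1, 1, 2, 2, 3, 3, 4, 4],
    "title": ["Book A", "Book B", "Book A", "Book B",
              "Book C", "Book D", "Book C", "Book D"],
    "rating": [9, 8, 8, 9, 7, 8, 8, 7],
})

# Pivot table: one row per book, one column per user, 0 where unrated
pivot = ratings.pivot_table(index="title", columns="user", values="rating").fillna(0)

# Compressed sparse row matrix built from the pivot table
matrix = csr_matrix(pivot.values)

# Nearest Neighbors with brute-force search and cosine distance
model = NearestNeighbors(metric="cosine", algorithm="brute")
model.fit(matrix)

def recommend(title, n=1):
    """Return the n books whose rating vectors are closest to the given title."""
    row = pivot.index.get_loc(title)
    # Ask for n + 1 neighbours because the query book is its own nearest neighbour
    _, indices = model.kneighbors(matrix[row], n_neighbors=n + 1)
    return [pivot.index[i] for i in indices.flatten() if i != row][:n]

print(recommend("Book A"))
```

With the real dataset, `n` would be set to 10 to match the number of recommendations the API returns.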
Flask API :-
1. Clone the project, download Book_names_with_urlM.csv from the output section, and put it in the directory containing the model
```
  
    git clone https://github.com/raklugrin01/Book-Recommendation-with-EDA
  
```
2. Install Flask
```
  
    pip install flask
  
```
3. Run the Python file
```
  
    python api.py
  
```
Testing result :-
Given a book title as input, the API returns 10 books as recommendations.
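
For readers who want a feel for how such an endpoint might be wired up, here is a minimal Flask sketch. It does not reproduce the contents of `api.py`: the `/recommend` route, the `title` query parameter, and the `RECOMMENDATIONS` lookup table (a stand-in for the trained model) are all hypothetical.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical lookup table standing in for the trained model
RECOMMENDATIONS = {
    "Book A": ["Book B", "Book C"],
}

@app.route("/recommend")
def recommend():
    # Read the book title from the query string, e.g. /recommend?title=Book%20A
    title = request.args.get("title", "")
    if title not in RECOMMENDATIONS:
        return jsonify(error=f"unknown title: {title}"), 404
    return jsonify(title=title, recommendations=RECOMMENDATIONS[title])

# Exercise the endpoint without starting a server
client = app.test_client()
response = client.get("/recommend", query_string={"title": "Book A"})
print(response.get_json())
```

Calling `app.run()` at the end of such a file would start Flask's development server so the route can be queried from a browser or with a tool such as curl.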

References

Please do ⭐ the repository if it helped you in any way.

raklugrin01 / Book-Recommendation-with-EDA