AkashBangalkar / Amazon-Apparel-Recommendations-System

Machine Learning - Content Based Recommendation System

Amazon Apparel Recommendations System

Introduction

The rise of web services has made recommender systems an integral part of our lives. Recommender systems contribute significantly to the revenue generated by e-commerce sites such as Amazon and Flipkart. The Recommender System helps in suggesting the right content to the right user, thus helping in building a better user experience.

Recommender systems are algorithms whose ultimate goal is to suggest relevant content (movies, products) to users. They play a vital role in many industries.

We used Amazon’s Product Advertising API to obtain the data in a policy-compliant manner, and we acquired data for 183,000 products. For each product, we obtained features such as Title, Brand, Color, Image URL, and Price. For this case study, we focused primarily on Women’s Apparel data. Amazon’s Product Advertising API can be used to extract data for other products as well.

Business Objective

  • Recommend similar items (apparel) for a given product on any e-commerce website, based on text and image features.
  • Recommend similar apparel items to the user. It is estimated that 35% of Amazon’s revenue is generated through product recommendations.

Approach

  • Content Based Recommendation: As the name suggests, recommendations are based on the content of each product, that is, its Title text, Description text and Images.

Plan of Attack

We followed these steps during this case study:

  1. Data Acquisition
  2. Data Cleaning
  3. Text Processing (NLP)

We solve this problem with both text-based and image-based approaches.

1. Text Based Product Recommendation/Similarity

  • Bag of Words
    BoW is a simple frequency counter, which gives more importance to a word that occurs more often in a given document.

  • Term Frequency-Inverse Document Frequency (TF-IDF)
    TF-IDF is also a kind of frequency counter, but it gives more importance to a word that occurs often in a given document and less often across the whole corpus of documents.

  • Text Semantics Based Product Similarity
    Semantic Similarity or Semantic Textual Similarity, is a task in the area of Natural Language Processing (NLP) that scores the relationship between texts or documents using a defined metric.
    For example, we know that the words 'tiger' and 'leopard' are related, and that 'zebra' and 'stripes' are related.

    • Word2Vec - Word2Vec is a very popular technique that learns meaningful relations between words and encodes their relatedness as vector similarity.
    • Avg W2V
    • IDF Weighted W2V
  • Weighted Similarity Using Brand & Color

    • In addition to the title text, we can also use features like Brand and Color. Each product has a Brand and a Color, and for these we construct separate brand and color vectors using the One-Hot-Encoding technique.
    • Once we construct these vectors, we concatenate them with the title vector.

    Let's assume we want to prefer showing our customers products of the same brand, because some of them are probably brand conscious, or that we want to show products of the same color. All we have to do is modify the Euclidean Distance and take a Weighted Euclidean Distance instead.
    For that, we assign a weight to each vector and multiply each element of the vector by its assigned weight. After multiplying, we take the ordinary Euclidean Distance, which is the same in concept as the Weighted Euclidean Distance.

    For example:
    Weight for Title = 1
    Weight for Brand = 5
    Weight for Color = 1


    In this case, since the brand weight is greater, we end up preferring products of the same brand. By just changing the weight, we can easily give our preference for the same brand or color.
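
The weighting idea above can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions: the toy catalogue, the function names (`bag_of_words`, `one_hot`, `weighted_vector`, `rank_similar`) and the use of raw word counts for the title vector are all hypothetical choices made for this example, not the code used in the notebooks.

```python
import math

def bag_of_words(title, vocabulary):
    """Count how often each vocabulary word occurs in the title."""
    words = title.lower().split()
    return [float(words.count(word)) for word in vocabulary]

def one_hot(value, categories):
    """Return a one-hot list marking the position of value in categories."""
    return [1.0 if value == item else 0.0 for item in categories]

def weighted_vector(product, vocab, brands, colors, weights):
    """Scale each feature block by its weight, then concatenate the blocks."""
    title = [weights["title"] * x for x in bag_of_words(product["title"], vocab)]
    brand = [weights["brand"] * x for x in one_hot(product["brand"], brands)]
    color = [weights["color"] * x for x in one_hot(product["color"], colors)]
    return title + brand + color

def euclidean(u, v):
    """Ordinary Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def rank_similar(query_index, products, weights):
    """Return the indices of the other products, nearest first."""
    vocab = sorted({w for p in products for w in p["title"].lower().split()})
    brands = sorted({p["brand"] for p in products})
    colors = sorted({p["color"] for p in products})
    vectors = [weighted_vector(p, vocab, brands, colors, weights) for p in products]
    query = vectors[query_index]
    distances = [(euclidean(query, v), i)
                 for i, v in enumerate(vectors) if i != query_index]
    return [i for _, i in sorted(distances)]

# Hypothetical toy catalogue, not taken from the Amazon data.
products = [
    {"title": "striped cotton shirt", "brand": "Acme", "color": "Blue"},
    {"title": "plain linen shirt", "brand": "Acme", "color": "White"},
    {"title": "striped cotton shirt", "brand": "Zenith", "color": "Blue"},
]

brand_heavy = rank_similar(0, products, {"title": 1, "brand": 5, "color": 1})
equal = rank_similar(0, products, {"title": 1, "brand": 1, "color": 1})
print(brand_heavy, equal)
```

With a brand weight of 5, the same-brand product (index 1) ranks first even though its title differs. With all weights equal to 1, the product with the identical title but a different brand (index 2) ranks first. Note that multiplying the elements by a weight w before taking the distance scales each squared difference by w squared.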

2. Image Based Product Recommendation/Similarity

There is one more very interesting feature: the Product Image URL. For every product, we have a URL for the product image, and with the help of this URL we can download the image.

Now the question is: how do we featurize an image, i.e., how do we convert an image to an n-dimensional vector?

  • Using CNN (Deep Learning Technique)

    • We can convert an image to an n-dimensional vector using a Deep Learning technique called the Convolutional Neural Network (CNN/ConvNet).
    • Since 2012, companies such as Google, Microsoft, Baidu, Nvidia, Facebook and Amazon, as well as universities like Oxford, have trained specialized ConvNets that work well.
    • There are multiple ConvNets, such as AlexNet, VGG16, VGG19 and ResNet, which can be used to convert an image to an n-dimensional vector.
    • Designing and training good ConvNets is very expensive, as it requires a lot of hardware and a lot of expertise. If two images are very similar, then the Euclidean distance between their vectors is very small.

    As mentioned above, there are multiple ConvNet architectures. We are going to use one of the most popular ConvNets: VGG16.

    • VGG16 (also called OxfordNet) is a Convolutional Neural Network architecture named after the Visual Geometry Group from Oxford, who developed it.

    Now, to compute an n-dimensional vector from a given image, we will use two very popular libraries, Keras and TensorFlow.

    • TensorFlow is an open-source deep learning library designed and built at Google.
    • Similarly, Keras is a popular open-source library that simplifies building CNNs and other types of neural networks.
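
Once every image has been converted to a feature vector, finding visually similar products reduces to a nearest-neighbour search by Euclidean distance. The sketch below assumes the feature vectors already exist (for example, produced by a CNN such as VGG16). The product ids, the 4-dimensional toy vectors and the function name `most_similar_images` are hypothetical, and real CNN features have many more dimensions.

```python
import math

def euclidean(u, v):
    """Ordinary Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def most_similar_images(query_id, features, top_k=2):
    """Return the top_k product ids whose vectors are closest to the query."""
    query = features[query_id]
    scored = [(euclidean(query, vec), pid)
              for pid, vec in features.items() if pid != query_id]
    return [pid for _, pid in sorted(scored)[:top_k]]

# Hypothetical image feature vectors, made up for illustration only.
features = {
    "dress_a": [0.9, 0.1, 0.0, 0.3],
    "dress_b": [0.8, 0.2, 0.1, 0.3],
    "jeans_c": [0.1, 0.9, 0.7, 0.0],
    "dress_d": [0.7, 0.1, 0.2, 0.4],
}

print(most_similar_images("dress_a", features))
```

In this toy data the two other dresses are returned ahead of the jeans, because their vectors lie closer to the query vector.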

3. Building Real World Solution

  • We know multiple techniques such as BoW, TFIDF, Word2Vec, CNN, etc.
  • In reality, most companies/organizations or universities/institutes use multiple techniques. There is no single technique that always performs best.
  • Oftentimes all results are taken into consideration. We combine the results of all the algorithms and apply something called Business Rules.
    • A business rule could be: do not show products of the same brand more than twice. We could have lots of business rules like this, which are dictated by the business.
    • After applying the business rules, we arrive at a final list of recommendations, which is what we show to the customers.
    • So, in the real world, in addition to the model, we have a set of business rules that shape the quality of the results.
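
As an illustration, the example rule above (no more than two products of the same brand) can be applied as a simple filter over an already ranked list. This is a minimal sketch: the function name `apply_brand_cap`, the `max_per_brand` parameter and the sample products are assumptions made for this example.

```python
def apply_brand_cap(ranked_products, max_per_brand=2):
    """Keep the ranking order, but drop a product once its brand
    has already appeared max_per_brand times in the result."""
    seen = {}
    result = []
    for product in ranked_products:
        brand = product["brand"]
        if seen.get(brand, 0) < max_per_brand:
            result.append(product)
            seen[brand] = seen.get(brand, 0) + 1
    return result

# Hypothetical ranked output of a similarity model, best match first.
ranked = [
    {"id": "p1", "brand": "Acme"},
    {"id": "p2", "brand": "Acme"},
    {"id": "p3", "brand": "Acme"},
    {"id": "p4", "brand": "Zenith"},
]

final = apply_brand_cap(ranked)
print([p["id"] for p in final])
```

The third Acme product is dropped and the Zenith product moves up, while the relative order produced by the model is otherwise preserved.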

4. A/B Testing

  • A/B Testing is a methodology to measure how well our solution performs. A/B Testing is also called "Bucket Testing".
    • Let's assume we built two solutions by combining multiple results/algorithms. Here, a solution means an algorithm together with its business rules, which is what we deploy.

    • Now we have two solutions --> Solution 1 and Solution 2.

    • Using A/B Testing, we can evaluate which solution is better objectively rather than subjectively, since different people could have different opinions.

    • We split the site's users randomly into two groups --> A and B.

    • We show users in Group A the results of Solution 1, and users in Group B the results of Solution 2.

    • While both solutions are running, we can measure purchases and sales numerically.

    • If sales in Group A > sales in Group B, then we can say that the customers actually prefer Solution 1.

    • i.e., Solution 1 is better than Solution 2, because when tested on real users, they tend to prefer Solution 1.
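
The procedure above can be sketched as follows. This is a simplified illustration, not a production experiment framework: the function names, the fixed random seed and the sample sales figures are hypothetical. It compares average sales per user rather than raw totals, an assumption made here so that groups of slightly different sizes remain comparable.

```python
import random

def assign_groups(user_ids, seed=0):
    """Randomly assign each user to group 'A' or 'B'."""
    rng = random.Random(seed)
    return {user: rng.choice("AB") for user in user_ids}

def average_sales(groups, sales):
    """Average sales per user in each group; users without a purchase count as 0."""
    totals = {"A": 0.0, "B": 0.0}
    counts = {"A": 0, "B": 0}
    for user, group in groups.items():
        totals[group] += sales.get(user, 0.0)
        counts[group] += 1
    return {g: (totals[g] / counts[g] if counts[g] else 0.0) for g in totals}

def better_solution(groups, sales):
    """Group A saw Solution 1 and group B saw Solution 2."""
    avg = average_sales(groups, sales)
    if avg["A"] == avg["B"]:
        return "tie"
    return "Solution 1" if avg["A"] > avg["B"] else "Solution 2"

# Hypothetical assignment and sales, fixed here so the result is reproducible.
groups = {"u1": "A", "u2": "A", "u3": "B", "u4": "B"}
sales = {"u1": 30.0, "u2": 10.0, "u3": 15.0}

print(average_sales(groups, sales))
print(better_solution(groups, sales))
```

In practice, a statistical significance test would normally be applied before declaring a winner. This sketch omits that step.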
