faizanahemad / Hybrid-Weighted-Embedding-Recommender

A hybrid recommendation system that builds content embeddings and augments them with collaborative features. A weighted combination of the embeddings addresses the cold-start problem while keeping training and serving fast.

Hybrid-Weighted-Embedding-Recommendation

TODO:

  • Improve docs
  • Multi-item types
  • Users-Multiple_item_types-Other-things enabled graph
  • No users and items, just unique str ids
  • Decouple affinity vectors from rating vectors
  • Proper example of how to build and test an external dataset with ML-100K
    • Example of using content data and not using it
  • Make the system independent of content so that a recsys with no content can be used
  • Add a README section on how to reproduce
  • Implement the metric system from section 6 (evaluation of a top-K recommender) of the "Factorization Meets the Neighborhood" paper
  • Prime GCN vectors with unbiased SVD instead of word2vec
  • Positive, negative and anchor can be weighted separately
  • Validation module: estimate link-prediction accuracy by taking test links and mixing in fake links in the same proportion
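
The validation idea in the last item (mixing test links with an equal number of fake links) could be sketched roughly as below. This is only an illustration under stated assumptions, not the repository's code: the function names `sample_fake_links` and `link_prediction_accuracy`, the cosine-similarity scorer, and the default threshold of 0.0 are all hypothetical choices.

```python
import numpy as np


def sample_fake_links(true_links, n_users, n_items, rng):
    """Draw as many random (user, item) pairs as there are true links,
    skipping any pair that is a known true link."""
    known = set(map(tuple, true_links))
    if n_users * n_items - len(known) < len(true_links):
        raise ValueError("not enough non-links to sample from")
    fake = set()
    while len(fake) < len(true_links):
        pair = (int(rng.integers(n_users)), int(rng.integers(n_items)))
        if pair not in known:
            fake.add(pair)
    return sorted(fake)


def link_prediction_accuracy(user_vecs, item_vecs, test_links, fake_links,
                             threshold=0.0):
    """Predict 'link' when cosine similarity exceeds `threshold` (an assumed
    decision rule) and return the fraction of correct predictions."""
    def cosine(u, i):
        a, b = user_vecs[u], item_vecs[i]
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    pairs = list(test_links) + list(fake_links)
    labels = np.array([1] * len(test_links) + [0] * len(fake_links))
    scores = np.array([cosine(u, i) for u, i in pairs])
    preds = (scores > threshold).astype(int)
    return float((preds == labels).mean())
```

Because real and fake links are mixed in equal proportion, a random scorer should land near 0.5, which gives the accuracy a natural baseline.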

TODO:

  • Paper: figure out the sections, and find relevant papers for hints on arranging content.

  • Figure out LaTeX templates for relevant conferences.

    • AMLC
    • ICLR/AAAI/NIPS/ICML/IEEE/ACM KDD, SIGKDD, RecSys
  • ML-20M

  • ML-100K/ML-1M/ML-20M vanilla, feat, text-feat

  • Ablation Study

    • ResNet arch vs normal arch
    • Text features, other features, no features / no content
    • No noise vs Gaussian noise
    • Node2vec, triplet vectors input

Environment Setup

Add a .condarc file to your home directory with the following contents:

auto_update_conda: False
channels:
  - defaults
  - anaconda
  - conda-forge
always_yes: True
add_pip_as_python_dependency: True
use_pip: True
create_default_packages:
  - pip
  - ipython
  - jupyter
  - nb_conda
  - setuptools
  - wheel

conda update conda

conda create -n hybrid-recsys python=3.7.4

conda activate hybrid-recsys

Install fastText

wget https://github.com/facebookresearch/fastText/archive/v0.9.1.zip
unzip v0.9.1.zip
cd fastText-0.9.1 && make -j4 && pip install .

Install TensorFlow 2.0

pip install --upgrade pip
pip install tensorflow

pip install -r requirements.txt
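
After the steps above, a quick sanity check of the environment can be useful. The snippet below is a small sketch, not part of the repository: `check_packages` is a hypothetical helper, and it assumes the two packages are importable under the names `fasttext` and `tensorflow`. It only checks that the packages can be located, not which versions are installed.

```python
import importlib.util


def check_packages(names):
    """Map each top-level import name to True if it can be found
    in the current environment, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Import names are assumptions; adjust them to match requirements.txt.
    for name, found in check_packages(["fasttext", "tensorflow"]).items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```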

Experiments

  • Content Based
  • Content + Collaborative with extra features
  • Content + Collaborative with extra features with alpha tree

TODO

Innovation

  • Heterogeneous features via deep networks

  • Weighted triplet loss

  • Embedding compression

    • We train in a higher-dimensional space. After training, we use autoencoders to reduce dimensionality.
    • Since our task involves cosine distance, the autoencoder step is followed by another step that uses triplet loss with distances calculated from the initial, larger embeddings. This is similar to t-SNE.
    • The two steps can be combined into one encoder-decoder-triplet architecture in which the decoder loss and the triplet loss are weighted and added.
  • Combine the collaborative and content-based approaches by

    • building content embeddings first
    • enhancing them with collaborative relations
    • balancing between them using a weighted scheme to solve the cold-start problem
  • Multiple hybrid embeddings for sellers at different life-cycle stages (multiple alpha values)
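
Two of the ideas in the list above, the combined decoder-plus-triplet objective and the weighted balance between content and collaborative embeddings, can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions rather than the repository's implementation: all function names are hypothetical, the per-triplet `weights` argument is one possible reading of "weighted triplet loss", triplet roles are assumed to have been assigned from distances in the original larger space, and the blending rule `alpha = n / (n + k)` is an assumed weighting scheme chosen only because it falls back to content for items with no interactions.

```python
import numpy as np


def cosine_distance(a, b):
    """1 - cosine similarity, computed row-wise for 2-D arrays."""
    a = a / (np.linalg.norm(a, axis=-1, keepdims=True) + 1e-12)
    b = b / (np.linalg.norm(b, axis=-1, keepdims=True) + 1e-12)
    return 1.0 - np.sum(a * b, axis=-1)


def triplet_loss(anchor, positive, negative, margin=0.1, weights=None):
    """Hinge triplet loss on cosine distance.
    `weights` (assumed, optional) scales each triplet's contribution."""
    per_triplet = np.maximum(
        0.0,
        cosine_distance(anchor, positive)
        - cosine_distance(anchor, negative)
        + margin,
    )
    if weights is not None:
        per_triplet = per_triplet * weights
    return float(per_triplet.mean())


def compression_loss(original, reconstructed, anchor, positive, negative,
                     recon_weight=1.0, triplet_weight=1.0):
    """Weighted sum of the decoder (reconstruction) loss and the triplet loss
    on the compressed vectors."""
    recon = float(np.mean((original - reconstructed) ** 2))
    return (recon_weight * recon
            + triplet_weight * triplet_loss(anchor, positive, negative))


def blend_embeddings(content, collaborative, n_interactions, k=10.0):
    """Blend per-row embeddings with alpha = n / (n + k) (assumed rule):
    rows with no interactions keep their content embedding, and rows with
    many interactions lean towards the collaborative embedding."""
    alpha = (n_interactions / (n_interactions + k))[:, None]
    return (1.0 - alpha) * content + alpha * collaborative
```

With `k=10.0`, an item with 0 interactions is represented purely by content, while an item with 10 interactions is an even mix of the two embeddings.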

License: MIT License


Languages

Python 99.7%, Shell 0.3%