CHYangzzz / Amazon-DIN-TFrecord-estimator

Two-Tower model and DIN model (without Dice), implemented with the TensorFlow Estimator API and trained on the Amazon Electronics dataset

batch_size    Model                 max-AUC
32            Two-Tower             0.877
32            DIN (without Dice)    0.893
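
For readers unfamiliar with the metric, here is a minimal sketch of ROC AUC in plain Python. It is only an illustration of what the number means; how this repository computes AUC is not shown here, and the function name `roc_auc` is hypothetical.

```python
def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs where the positive scores higher.

    Ties count as half a win. This is the pairwise definition of ROC AUC,
    written for clarity rather than speed (it is O(P * N)).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of clicked items above non-clicked ones.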

Requirements

  • Python 3.6
  • NumPy 1.18.5
  • Pandas 1.1.3
  • TensorFlow 2.3.1

Amazon Electronics dataset download

wget -c http://snap.stanford.edu/data/amazon/productGraph/categoryFiles/reviews_Electronics_5.json.gz  
gzip -d reviews_Electronics_5.json.gz  
wget -c http://snap.stanford.edu/data/amazon/productGraph/categoryFiles/meta_Electronics.json.gz  
gzip -d meta_Electronics.json.gz
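
After unpacking, each file holds one record per line. A minimal sketch for inspecting them is below. It assumes some lines may be Python dict literals rather than strict JSON, so it falls back to `ast.literal_eval`; the helper names `parse_record` and `read_records` are hypothetical and are not part of this repository.

```python
import ast
import json


def parse_record(line):
    """Parse one line as strict JSON, falling back to a Python dict literal."""
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        # Assumption: the line is a single-quoted Python literal.
        return ast.literal_eval(line)


def read_records(path, limit=None):
    """Read up to `limit` records from a line-delimited file (all if None)."""
    records = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            if limit is not None and len(records) >= limit:
                break
            line = line.strip()
            if line:  # skip blank lines
                records.append(parse_record(line))
    return records
```

For example, `read_records("reviews_Electronics_5.json", limit=5)` would return the first five reviews as dictionaries for a quick look at the fields.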

Training and Evaluation

  • Step 1: generate the TFRecord dataset
python generate_tfrecord.py

Alternatively, generate the TFRecord files with Spark
(I used Zeppelin; the code is saved in generate_tfrecord.scala).
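
To show what a TFRecord sample can look like, here is a minimal sketch that writes two examples and reads them back with the TensorFlow 2 API. The feature names (`user_id`, `hist_item_ids`, `target_item_id`, `label`) and the helper functions are assumptions for illustration; the actual schema produced by generate_tfrecord.py may differ.

```python
import os
import tempfile

import tensorflow as tf


def make_example(user_id, hist_item_ids, target_item_id, label):
    """Pack one sample into a tf.train.Example (hypothetical feature names)."""
    def int64_feature(values):
        return tf.train.Feature(int64_list=tf.train.Int64List(value=values))

    return tf.train.Example(features=tf.train.Features(feature={
        "user_id": int64_feature([user_id]),
        "hist_item_ids": int64_feature(hist_item_ids),  # variable length
        "target_item_id": int64_feature([target_item_id]),
        "label": int64_feature([label]),
    }))


# Parsing spec that mirrors the features written above.
FEATURE_SPEC = {
    "user_id": tf.io.FixedLenFeature([], tf.int64),
    "hist_item_ids": tf.io.VarLenFeature(tf.int64),
    "target_item_id": tf.io.FixedLenFeature([], tf.int64),
    "label": tf.io.FixedLenFeature([], tf.int64),
}


def parse_example(serialized):
    """Decode one serialized Example; densify the variable-length history."""
    parsed = tf.io.parse_single_example(serialized, FEATURE_SPEC)
    parsed["hist_item_ids"] = tf.sparse.to_dense(parsed["hist_item_ids"])
    return parsed


def write_and_read_demo():
    """Write two samples to a temporary TFRecord file and read them back."""
    path = os.path.join(tempfile.mkdtemp(), "demo.tfrecord")
    with tf.io.TFRecordWriter(path) as writer:
        writer.write(make_example(7, [11, 12, 13], 14, 1).SerializeToString())
        writer.write(make_example(8, [21], 22, 0).SerializeToString())
    dataset = tf.data.TFRecordDataset(path).map(parse_example)
    return [{k: v.numpy().tolist() for k, v in row.items()} for row in dataset]
```

Storing the history as a variable-length feature keeps the file compact; padding to a common length can then be done at batching time, for instance with `padded_batch`.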

  • Step 2: training and evaluation
python main.py

Before running, confirm the TFRecord dataset path and set the parameter "data_gen_method" to "spark" or "python".

  • To switch from the Two-Tower model to the DIN model, edit model_fn in main.py.
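
The difference between the two models can be sketched in a few lines of NumPy. This is a simplified illustration, not the code in main.py: it assumes the Two-Tower user vector is a mean over history embeddings, and it replaces DIN's learned attention network with a dot product followed by a softmax. Both function names are hypothetical.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    exp = np.exp(x - np.max(x))
    return exp / exp.sum()


def two_tower_score(hist_emb, target_emb):
    """User tower ignores the candidate: mean-pool history, then dot product."""
    user_vec = hist_emb.mean(axis=0)
    return float(user_vec @ target_emb)


def din_style_score(hist_emb, target_emb):
    """Weight each history item by its relevance to the candidate item.

    Simplification: relevance is a dot product normalised with softmax;
    DIN learns these weights with a small feed-forward network instead.
    """
    weights = softmax(hist_emb @ target_emb)  # one weight per history item
    user_vec = weights @ hist_emb             # candidate-aware user vector
    return float(user_vec @ target_emb)


# Two history items, one of which matches the candidate exactly.
hist = np.array([[1.0, 0.0], [0.0, 1.0]])
target = np.array([1.0, 0.0])
print(two_tower_score(hist, target))  # mean pooling dilutes the match
print(din_style_score(hist, target))  # attention emphasises the match
```

The point of the sketch is that the Two-Tower user representation is the same for every candidate, while the DIN-style representation changes with the candidate item.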

Reference:

https://github.com/zhougr1993/DeepInterestNetwork.git
