Help needed on using the GAME API
nishamuktewar opened this issue
Hello,
I am trying to understand how to use the GAME API, especially how to include mixed effects (both random intercepts and random slopes), and thought it might be okay to ask here. Let's say I am using the MovieLens dataset and want to build a mixed model with just a random intercept per userId. How can that be achieved? I tried specifying it in the following way:
spark2-submit \
--class com.linkedin.photon.ml.cli.game.training.Driver \
--master yarn \
--deploy-mode client \
--num-executors 4 \
--driver-memory 10g \
--executor-memory 10g \
photon-all_2.11-1.0.0.jar \
--train-input-dirs "hdfs:///user/nisha/Data/photon-ml/movieLens/train/" \
--output-dir "hdfs:///user/nisha/Data/photon-ml/movieLens/output" \
--task-type "LINEAR_REGRESSION" \
--feature-name-and-term-set-path "hdfs:///user/nisha/Data/photon-ml/movieLens/featuresets/" \
--feature-shard-id-to-feature-section-keys-map "globalShard:|userShard:" \
--updating-sequence global,per-user \
--application-name "GAME model testing" \
--validate-input-dirs "hdfs:///user/nisha/Data/photon-ml/movieLens/test" \
--fixed-effect-optimization-configurations "global:10,1e-5,1,1.0,TRON,L2" \
--random-effect-optimization-configurations "per-user:10,1e-5,1,1.0,TRON,L2" \
--fixed-effect-data-configurations "global:globalShard,1" \
--random-effect-data-configurations "per-user:userId,userShard,1,10,5,0.5,index_map" \
--input-column-names "response:response|uid:userId|offset:offset|weight:weight|metadataMap:metadataMap" \
--delete-output-dir-if-exists "true" \
--num-iterations 5 \
--evaluator-type RMSE \
--summarization-output-dir "hdfs:///user/nisha/Data/photon-ml/movieLens/training-smry" \
--normalization-type NONE \
--compute-variance false
This does produce some resultant coefficients - a fixed effect intercept and intercepts by userIds.
fixed effect intercept: probably mean of the training set response variable - rating
{u'variances': None, u'means': [{u'term': u'', u'name': u'(INTERCEPT)', u'value': 3.5454240769000385}], u'modelClass': u'com.linkedin.photon.ml.supervised.regression.LinearRegressionModel', u'lossFunction': u'', u'modelId': u'fixed-effect'}
random effects for each userId:
{u'variances': None, u'means': [{u'term': u'', u'name': u'(INTERCEPT)', u'value': 0.7932438678450324}], u'modelClass': u'com.linkedin.photon.ml.supervised.regression.LinearRegressionModel', u'lossFunction': u'', u'modelId': u'273'}
{u'variances': None, u'means': [{u'term': u'', u'name': u'(INTERCEPT)', u'value': 0.10382895222067612}], u'modelClass': u'com.linkedin.photon.ml.supervised.regression.LinearRegressionModel', u'lossFunction': u'', u'modelId': u'253'}
......
So does that mean the random intercept for userId = 273 is actually 3.5454 + 0.7932 = 4.3386?
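The addition in the question can be sketched in Python using the records printed above. This assumes, as the question does, that the fixed-effect and per-user intercepts simply add up; the helper names `intercept` and `combined_intercepts` are made up for illustration and are not part of Photon ML.

```python
# Records copied from the model output above, trimmed to the fields used here.
fixed_effect = {
    "modelId": "fixed-effect",
    "means": [{"name": "(INTERCEPT)", "term": "", "value": 3.5454240769000385}],
}
random_effects = [
    {"modelId": "273",
     "means": [{"name": "(INTERCEPT)", "term": "", "value": 0.7932438678450324}]},
    {"modelId": "253",
     "means": [{"name": "(INTERCEPT)", "term": "", "value": 0.10382895222067612}]},
]


def intercept(record):
    """Return the (INTERCEPT) coefficient of one model record, or 0.0 if absent."""
    for coef in record["means"]:
        if coef["name"] == "(INTERCEPT)":
            return coef["value"]
    return 0.0


def combined_intercepts(fixed, per_entity):
    """Map each random-effect modelId to fixed intercept + that entity's intercept.

    Assumption: the two contributions are additive.
    """
    base = intercept(fixed)
    return {rec["modelId"]: base + intercept(rec) for rec in per_entity}


combined = combined_intercepts(fixed_effect, random_effects)
for user_id, value in combined.items():
    print(user_id, round(value, 4))
```

Under that assumption, the value printed for user 273 is the per-user baseline rating that the question is asking about.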
If I were using lmer from R's lme4 package, I would use something like:
userModel <- lmer(rating ~ (1|userId), data=movieLensTrain)
and it would produce results of the form:
Fixed effects:
            Estimate Std. Error t value
(Intercept)  3.66559    0.01824     201

> coef(userModel)
$userId
    (Intercept)
1      2.706271
2      3.453963
...
273    4.254384
where userId = 273's random intercept = 4.254
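For a like-for-like comparison, note that lme4's coef() reports the fixed intercept plus the per-user deviation, whereas the Photon ML output above lists the two parts separately. A small Python sketch, using only the numbers printed above (the variable names are illustrative), recovers the per-user deviation from the lmer output:

```python
# "Fixed effects" estimate from the lmer output above.
lmer_fixed_intercept = 3.66559

# Per-user values from coef(userModel)$userId above (only the rows that were shown).
lmer_coef = {"1": 2.706271, "2": 3.453963, "273": 4.254384}

# coef() reports fixed + random, so subtracting the fixed part
# leaves each user's deviation from the overall intercept.
lmer_deviation = {uid: total - lmer_fixed_intercept for uid, total in lmer_coef.items()}

for uid, dev in lmer_deviation.items():
    print(uid, round(dev, 4))
```

The deviation for user 273 is the quantity to set against the per-user (INTERCEPT) value that Photon ML reports for modelId 273.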
I understand that the numbers won't match exactly because of the different hyperparameters and underlying implementation, but I wanted to know if this is how it is done. Also, how can I add random slopes based on the userId?
Thank you for your time. Once I can figure this out I can help add some documentation on how to use this API.
Looks like you're on the right track. To add a random slope, you need to assign feature bags to the random-effect shard, e.g. userShard:genreFeatures,movieLatentFactorFeatures. This will add features from the "genreFeatures" and "movieLatentFactorFeatures" bags into the feature vectors for the per-user problem, and hence learn a slope on those features.
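As a rough sketch of what that change could look like for the command in the original post, the Python snippet below builds the value for --feature-shard-id-to-feature-section-keys-map. The separator format is inferred from the original command ("globalShard:|userShard:") and from the example in this reply. The bag names are the ones used as examples here and would have to exist in the input data, and `shard_map_value` is a made-up helper, not part of Photon ML.

```python
def shard_map_value(shard_to_bags):
    """Join {shard: [bag, ...]} into 'shard:bag1,bag2|shard2:...' form.

    Format assumed from the command in the original post.
    """
    return "|".join(
        f"{shard}:{','.join(bags)}" for shard, bags in shard_to_bags.items()
    )


shards = {
    # Left empty, as in the original command.
    "globalShard": [],
    # Example bag names from the reply; these must match bags present in the data.
    "userShard": ["genreFeatures", "movieLatentFactorFeatures"],
}

flag = "--feature-shard-id-to-feature-section-keys-map"
print(f'{flag} "{shard_map_value(shards)}"')
```

With both lists left empty, the helper reproduces the value from the original command.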
If you haven't seen it already, we have an interactive tutorial here: https://github.com/linkedin/photon-ml/wiki/Photon-ML-Tutorial
This tutorial shows how to use our new API, which should be a lot more user friendly than the command-line interface.
Thanks @joshvfleming.
Appreciate your response. So it seems my understanding of the random intercept coefficient for a userId is correct? I will try the random slope logic like you suggested and go through the tutorial. Thanks again.