eugeneyan / eugeneyan-comments



utterances-bot opened this issue · comments

System Design for Recommendations and Search

Breaking it into offline vs. online environments, and candidate retrieval vs. ranking steps.

https://eugeneyan.com/writing/system-design-for-discovery/

Thanks for a very interesting post!

One thing that got me wondering is this claim:

Ranking can be modeled as a learning-to-rank or classification task, with the latter being more commonly seen. If deep learning is applied, the final output layer is either a softmax over a catalog of items, or a sigmoid predicting the likelihood of user interaction (e.g., click, purchase) for each user-item pair.

When would you do one or the other? Are there advantages of one with respect to the other (softmax over the catalog vs. sigmoid per user-item pair)?

Thanks for the great question!

In general, a softmax over the catalog implies a fixed set of output items. Thus, whenever new items are added to the catalog, you'll have to change the output layer and retrain the model. In addition, training with a large softmax layer is time-consuming. Typically, the softmax layer is constrained to a limited set of items to speed up training, or a negative sampling approach is adopted. See TripAdvisor's and YouTube's final layer here.

On the other hand, a sigmoid over each user-item pair can work with any number of items, including new ones (provided the item embeddings are available). That said, it requires one prediction per user-item pair, which can be costly if there are many pairs. (Contrast this with the softmax approach, where a single prediction yields probabilities for all items.) Most implementations I've seen adopt this approach (see other approaches in this post).
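To make the contrast concrete, here's a minimal NumPy sketch (not from the post; the catalog size, embedding dimension, and random vectors are all hypothetical). It shows how a softmax head produces probabilities for the whole fixed catalog in one pass, while a sigmoid head scores one user-item pair at a time and can score an item added after training.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, d = 1000, 32          # hypothetical catalog size and embedding dim

user_vec = rng.normal(size=d)              # one user's representation
item_mat = rng.normal(size=(n_items, d))   # one row per catalog item

# Softmax over the catalog: a single forward pass yields a probability
# for every item, but the output dimension is fixed at n_items.
logits = item_mat @ user_vec
probs = np.exp(logits - logits.max())      # subtract max for stability
probs /= probs.sum()
assert probs.shape == (n_items,)

# Sigmoid per user-item pair: score any item (including a brand-new one)
# from its embedding, but each pair needs its own prediction.
def pair_score(user, item):
    return 1.0 / (1.0 + np.exp(-(user @ item)))

new_item = rng.normal(size=d)  # item added after training; no retrain needed
score = pair_score(user_vec, new_item)
assert 0.0 < score < 1.0
```

Note the trade-off in the code: scoring the full catalog with the sigmoid head would take `n_items` calls to `pair_score`, whereas the softmax head gets all probabilities from one matrix-vector product but cannot score `new_item` at all.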

Hi Eugene, thanks for the explanation. Just to clarify: "learning-to-rank" generally refers to a pairwise approach, and "classification" refers to a fixed list of output items, is that right?

It seems there would almost always be more merit to implementing learning-to-rank over classification, since learning-to-rank handles new items well without requiring changes to the model architecture. Yet, as you mentioned, classification is more popular.

What are the catches here that I might have missed?

It's not necessarily the case that classification can't handle new items without changing the model architecture. You could adopt an approach where the final layer is a sigmoid over a user-item pair. That way, new items can be represented via their embeddings and attributes (e.g., category, price) without changing the model architecture.
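A small sketch of that point, assuming a simple linear pair-scoring model with made-up dimensions (the feature layout and attribute vector below are illustrative, not from the post): a new item is represented by its embedding plus attribute features and scored by the frozen model, with no architecture change or retraining.

```python
import numpy as np

rng = np.random.default_rng(1)
d_emb, d_attr = 16, 4   # hypothetical embedding and attribute dims

# A frozen pair-scoring model: one weight per feature in
# [user_embedding, item_embedding, item_attributes].
w = rng.normal(size=d_emb + d_emb + d_attr)

def predict(user_emb, item_emb, item_attrs):
    """Sigmoid of a linear score over the concatenated features."""
    x = np.concatenate([user_emb, item_emb, item_attrs])
    return 1.0 / (1.0 + np.exp(-(w @ x)))

user = rng.normal(size=d_emb)

# New item added after training: represented by its embedding (e.g.,
# derived from content) plus attributes such as a category one-hot and
# a normalized price -- the model weights are untouched.
new_item_emb = rng.normal(size=d_emb)
new_item_attrs = np.array([0.0, 1.0, 0.0, 0.42])

p = predict(user, new_item_emb, new_item_attrs)
assert 0.0 < p < 1.0
```

The key design point is that the output layer's size depends only on the feature dimensions, not on the catalog size, so the catalog can grow freely.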

Thanks for the explanation. I was curious why you haven't highlighted learning-to-rank approaches much, and why not many companies have adopted LTR approaches, except Airbnb, which published a paper proposing combining a neural network with LambdaRank.

Our experience with LTR is that it requires more data to train and is generally slower. Would love to hear about your experience with this!

I haven't mentioned much about LTR here because there haven't been many industry papers sharing LTR approaches. It's also been my experience that LTR doesn't perform as well out of the box, or even after much tweaking.
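For readers unfamiliar with what the pairwise LTR being discussed looks like, here's a minimal NumPy sketch of RankNet/BPR-style pairwise training (a generic illustration under invented synthetic data, not Airbnb's method or anything from the post): a linear scorer is trained on (preferred, non-preferred) item pairs so that preferred items score higher.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 8

w = np.zeros(d)   # linear scoring function, learned from pairwise labels

def score(x):
    return x @ w

# Synthetic (preferred, non-preferred) feature pairs for the same query:
# preferred items are shifted up, non-preferred down, so a ranking exists.
pairs = [(rng.normal(size=d) + 0.5, rng.normal(size=d) - 0.5)
         for _ in range(200)]

# Pairwise loss: -log sigmoid(score(pos) - score(neg)), optimized by SGD.
lr = 0.1
for pos, neg in pairs:
    p = 1.0 / (1.0 + np.exp(-(score(pos) - score(neg))))  # P(pos beats neg)
    grad = (p - 1.0) * (pos - neg)                         # d(-log p)/dw
    w -= lr * grad

# After training, preferred items should usually outscore non-preferred ones.
correct = sum(score(a) > score(b) for a, b in pairs)
```

Note that the loss is defined only on relative orderings of pairs, which is why pairwise LTR tends to need more labeled pairs (and slower training) than a pointwise classifier trained on individual user-item interactions.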

Thanks @eugeneyan. That was a very comprehensive coverage of system design from several major tech companies.