eugeneyan / eugeneyan-comments



utterances-bot opened this issue · comments

System Design for Recommendations and Search

Breaking it into offline vs. online environments, and candidate retrieval vs. ranking steps.

https://eugeneyan.com/writing/system-design-for-discovery/

Thanks for a very interesting post!

One thing that got me wondering is this claim:

Ranking can be modeled as a learning-to-rank or classification task, with the latter being more commonly seen. If deep learning is applied, the final output layer is either a softmax over a catalog of items, or a sigmoid predicting the likelihood of user interaction (e.g., click, purchase) for each user-item pair.

When would you do one or the other? Are there advantages of one with respect to the other (softmax over the catalog vs. sigmoid per user-item pair)?

Thanks for the great question!

In general, a softmax over the catalog implies a fixed set of output items. Thus, whenever new items are added to the catalog, you'll have to change the output layer and retrain the model. In addition, training with a large softmax layer is time-consuming. Typically, the softmax layer is constrained to a limited set of items to speed up training, or a negative sampling approach is adopted. See TripAdvisor's and YouTube's final layer here.

On the other hand, a sigmoid over each user-item pair can work with any number of items, including new ones (provided the item embeddings are available). That said, it requires one prediction per user-item pair, which can be costly if there are many pairs. (Contrast this with the softmax approach, where a single prediction yields probabilities for all items.) Most implementations I've seen adopt this approach (see other approaches in this post).
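To make the contrast concrete, here's a minimal NumPy sketch (not from the post; the catalog size, embedding dimension, and random vectors are all hypothetical). It shows how a softmax head produces probabilities for the whole fixed catalog in one pass, while a sigmoid head scores one user-item pair at a time and can score an item added after training.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, d = 1000, 32          # hypothetical catalog size and embedding dim

user_vec = rng.normal(size=d)              # one user's representation
item_mat = rng.normal(size=(n_items, d))   # one row per catalog item

# Softmax over the catalog: a single forward pass yields a probability
# for every item, but the output dimension is fixed at n_items.
logits = item_mat @ user_vec
probs = np.exp(logits - logits.max())      # subtract max for stability
probs /= probs.sum()
assert probs.shape == (n_items,)

# Sigmoid per user-item pair: score any item (including a brand-new one)
# from its embedding, but each pair needs its own prediction.
def pair_score(user, item):
    return 1.0 / (1.0 + np.exp(-(user @ item)))

new_item = rng.normal(size=d)  # item added after training; no retrain needed
score = pair_score(user_vec, new_item)
assert 0.0 < score < 1.0
```

Note the trade-off in the code: scoring the full catalog with the sigmoid head would take `n_items` calls to `pair_score`, whereas the softmax head gets all probabilities from one matrix-vector product but cannot score `new_item` at all.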

Hi Eugene, thanks for the explanation. Just to clarify: "learning-to-rank" generally refers to a pairwise approach, and "classification" refers to a fixed list of output items, is that right?

It seems there would almost always be more merit to implementing learning-to-rank over classification, since learning-to-rank handles new items well without requiring changes to the model architecture. Yet, as you mentioned, classification is more popular.

What are the catches here that I might have missed?

It's not necessarily the case that classification can't handle new items without changing the model architecture. You could adopt an approach where the final layer is a sigmoid over a user-item pair. That way, new items can be represented via their embeddings and attributes (e.g., category, price) without changing the model architecture.
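A small sketch of that point, assuming a simple linear pair-scoring model with made-up dimensions (the feature layout and attribute vector below are illustrative, not from the post): a new item is represented by its embedding plus attribute features and scored by the frozen model, with no architecture change or retraining.

```python
import numpy as np

rng = np.random.default_rng(1)
d_emb, d_attr = 16, 4   # hypothetical embedding and attribute dims

# A frozen pair-scoring model: one weight per feature in
# [user_embedding, item_embedding, item_attributes].
w = rng.normal(size=d_emb + d_emb + d_attr)

def predict(user_emb, item_emb, item_attrs):
    """Sigmoid of a linear score over the concatenated features."""
    x = np.concatenate([user_emb, item_emb, item_attrs])
    return 1.0 / (1.0 + np.exp(-(w @ x)))

user = rng.normal(size=d_emb)

# New item added after training: represented by its embedding (e.g.,
# derived from content) plus attributes such as a category one-hot and
# a normalized price -- the model weights are untouched.
new_item_emb = rng.normal(size=d_emb)
new_item_attrs = np.array([0.0, 1.0, 0.0, 0.42])

p = predict(user, new_item_emb, new_item_attrs)
assert 0.0 < p < 1.0
```

The key design point is that the output layer's size depends only on the feature dimensions, not on the catalog size, so the catalog can grow freely.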

Thanks for the explanation. I was curious why you haven't highlighted learning-to-rank approaches much, and why not many companies have adopted LTR approaches, except Airbnb, which published a paper proposing combining a neural network with LambdaRank.

Our experience with LTR is that it requires more data to train and is generally slower. Would love to hear about your experience with this!

I haven't mentioned much about LTR here because there haven't been many industry papers sharing LTR approaches. It's also been my experience that LTR doesn't perform as well out of the box, or even after much tweaking.
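For readers unfamiliar with what the pairwise LTR being discussed looks like, here's a minimal NumPy sketch of RankNet/BPR-style pairwise training (a generic illustration under invented synthetic data, not Airbnb's method or anything from the post): a linear scorer is trained on (preferred, non-preferred) item pairs so that preferred items score higher.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 8

w = np.zeros(d)   # linear scoring function, learned from pairwise labels

def score(x):
    return x @ w

# Synthetic (preferred, non-preferred) feature pairs for the same query:
# preferred items are shifted up, non-preferred down, so a ranking exists.
pairs = [(rng.normal(size=d) + 0.5, rng.normal(size=d) - 0.5)
         for _ in range(200)]

# Pairwise loss: -log sigmoid(score(pos) - score(neg)), optimized by SGD.
lr = 0.1
for pos, neg in pairs:
    p = 1.0 / (1.0 + np.exp(-(score(pos) - score(neg))))  # P(pos beats neg)
    grad = (p - 1.0) * (pos - neg)                         # d(-log p)/dw
    w -= lr * grad

# After training, preferred items should usually outscore non-preferred ones.
correct = sum(score(a) > score(b) for a, b in pairs)
```

Note that the loss is defined only on relative orderings of pairs, which is why pairwise LTR tends to need more labeled pairs (and slower training) than a pointwise classifier trained on individual user-item interactions.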

Thanks @eugeneyan. That was a very comprehensive coverage of system design from several major tech companies.