deep-learning-with-pytorch / dlwpt-code

Code for the book Deep Learning with PyTorch by Eli Stevens, Luca Antiga, and Thomas Viehmann.

Home Page: https://www.manning.com/books/deep-learning-with-pytorch

Suggestion: mentioning neural nets for ordinal regression in chapter 4

rasbt opened this issue · comments

While reading chapter 4, I noticed that on pg. 83 you mention that ordinal targets are usually treated either as continuous ("metric" regression) or as nominal (conventional classification).

Just wanted to mention that there is growing interest in outfitting deep neural nets with the capability to treat such targets as genuinely ordinal, without assuming a continuous or nominal nature. Personally, I have worked on this topic here: https://github.com/Raschka-research-group/coral_pytorch. Our approach is based on recasting the problem into binary classification tasks, but other approaches exist too.
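For concreteness, the binary-task recasting can be sketched in a few lines of PyTorch. This is only an illustrative encoding of the general idea, not the coral_pytorch API itself; the function name and 0-indexed rank convention are my own:

```python
import torch

def rank_to_binary_targets(ranks, num_classes):
    """Encode 0-indexed ordinal ranks as K-1 cumulative binary targets.

    The k-th target answers "is the true rank greater than k?", so
    rank 2 with num_classes=5 becomes [1, 1, 0, 0].
    (Illustrative sketch only, not the coral_pytorch API.)
    """
    ranks = torch.as_tensor(ranks).unsqueeze(1)   # shape (N, 1)
    thresholds = torch.arange(num_classes - 1)    # shape (K-1,)
    return (ranks > thresholds).float()           # shape (N, K-1)

print(rank_to_binary_targets([0, 2, 4], num_classes=5))
# rank 0 -> [0, 0, 0, 0], rank 2 -> [1, 1, 0, 0], rank 4 -> [1, 1, 1, 1]
```

Each of the K-1 columns can then be trained with an ordinary binary loss, which is what makes the recasting attractive in practice.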

I probably wouldn't discuss ordinal regression in detail in this chapter, but adding a footnote might be nice; it could give some readers an entry point, and it could save others from some frustration or from reinventing the wheel.

Thank you for the pointer. Note that we're at the other end of the spectrum here with embeddings. If one absolutely wanted to get something ordered there, one could force the sign of the embedding and then sum, but I'm not sure that is actually of much use in practice.
In the end, something like ordinal regression might be more suitable for chapter 7, where we discuss categorical predictions. Do you know if there is a survey of methods, or an explanation of why the intuitive things like plain conditional probabilities fail? I.e., if you have output scores y (y_k, k = 1..N), one could obtain the vector of log predicted probabilities for x >= r_q (q = 1..N) as (y - log(exp(y) + 1)).cumsum(), when the binary tasks are x >= r_q | x >= r_(q-1). This is very similar to CORAL, but you get the monotonicity immediately from the quantities being probabilities, and you don't introduce "a line" onto which the predictions must map before defining the jump points for the rank.
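The construction above can be written out directly: log sigmoid(y) equals y - log(exp(y) + 1), so the cumulative sum of log-sigmoids chains the conditional probabilities into unconditional ones. A minimal sketch with made-up logits (variable names are mine, not from the book or CORAL):

```python
import torch
import torch.nn.functional as F

# y_q: logit of the conditional probability P(x >= r_q | x >= r_(q-1)).
# These values are arbitrary, for illustration only.
y = torch.tensor([2.0, 1.0, -0.5, -2.0])

# log sigmoid(y) == y - log(exp(y) + 1), so this is the
# (y - log(exp(y) + 1)).cumsum() from the comment: summing the
# conditional log-probabilities gives unconditional log P(x >= r_q).
log_p_ge = F.logsigmoid(y).cumsum(dim=0)
p_ge = log_p_ge.exp()

# Monotonicity comes for free: each chained factor is a probability <= 1,
# so P(x >= r_1) >= P(x >= r_2) >= ... without any extra constraint.
assert (p_ge[1:] <= p_ge[:-1]).all()
print(p_ge)
```

The first entry is just sigmoid(y_1), and every later entry can only shrink, which is exactly the "monotonicity for free" argument.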

> In the end, something like ordinal regression might be more suitable for chapter 7, where we discuss categorical predictions.

I probably wouldn't go into explaining ordinal regression in detail, but since that passage discusses metric regression vs. a categorical treatment for an ordinal dataset, I thought it might be worth mentioning via a footnote à la "Specialized methods for working with ordinal (ordered categorical) data exist, known as ordinal regression or ordinal classification, but they are outside the scope of this book."

I'd say it is largely due to practicality (empirical performance). We haven't considered your suggestion but started experimenting with other conditional probability models last year. So far, we haven't found anything that performed better than the Niu et al. method or CORAL.

Most methods we tried followed the principle that the previous output node was either
a) added to the current output node in some form, considering the context x >= r_q | x >= r_(q-1), or
b) multiplied, when considering the context x <= r_q | x <= r_(q-1).

Revisiting your example, (y1 - log(exp(y2) + 1)).cumsum(): do you mean, e.g., for rank 3 (where the possible ranks are 1..5), that the true label y1 is a vector [1, 1, 0, 0]? And that y2 are the logits?
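To make the question concrete with hypothetical numbers: with ranks 1..5 there are four binary tasks x >= r_q for q = 2..5, the true rank 3 yields the target vector [1, 1, 0, 0], and a rank can be read back off the cumulative probabilities by counting those above 0.5. The probability values below are invented for illustration:

```python
import torch

# True rank 3 among ranks 1..5: targets for the four binary tasks
# "x >= 2", "x >= 3", "x >= 4", "x >= 5".
targets = torch.tensor([1.0, 1.0, 0.0, 0.0])

# Hypothetical predicted probabilities P(x >= r_q), q = 2..5.
p_ge = torch.tensor([0.95, 0.70, 0.30, 0.05])

# Decode: the predicted rank is 1 plus the number of thresholds passed.
predicted_rank = 1 + int((p_ge > 0.5).sum())
print(predicted_rank)  # -> 3
```

This counting rule is the usual way rank labels are recovered from cumulative binary predictions in this family of methods.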