question: why is multi-hot encoding appropriate for sequences?
mbutterick opened this issue
In Deep Learning with Python 2e, @fchollet says:

> Multi-hot encode your lists to turn them into vectors of 0s and 1s. This would mean, for instance, turning the sequence `[8, 5]` into a 10,000-dimensional vector that would be all 0s except for indices 8 and 5, which would be 1s.
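Concretely, here is a minimal sketch of what I understand that to mean (the `multi_hot` helper name and the code itself are just my illustration, not the book's exact listing; only the 10,000-word vocabulary size comes from the quote):

```python
import numpy as np

def multi_hot(sequence, dimension=10_000):
    """Encode a sequence of word indices as a 0/1 vector of length `dimension`."""
    vector = np.zeros((dimension,))
    for index in sequence:
        vector[index] = 1.0  # set a 1 at every word index that appears
    return vector

encoded = multi_hot([8, 5])
print(encoded.shape, encoded[5], encoded[8], encoded.sum())  # (10000,) 1.0 1.0 2.0
```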
Wouldn’t this encoding essentially reduce the sequence to a set, which means losing information? For instance, how would the encoded representation of `[8, 5]` differ from `[5, 8]` or `[8, 5, 8]` (or any longer sequence of 8 and 5 elements)?
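Unless I’m missing something, order and repetition disappear entirely. Reusing the hypothetical `multi_hot` helper from the sketch above:

```python
import numpy as np

def multi_hot(sequence, dimension=10_000):
    # Same sketch as above, repeated here so the snippet runs on its own.
    vector = np.zeros((dimension,))
    vector[list(sequence)] = 1.0
    return vector

print(np.array_equal(multi_hot([8, 5]), multi_hot([5, 8])))     # True
print(np.array_equal(multi_hot([8, 5]), multi_hot([8, 5, 8])))  # True
```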
(This example arises in a tutorial about classifying movie reviews by whether they’re positive or negative. To make the example concrete, if word 5 is `pretty` and word 8 is `awful`, then there’s going to be a classification difference between `pretty awful` and `awful pretty`!)