Make preprocessor aware of downsampling
jamesmf opened this issue
Transformer layers are costly with respect to input length (self-attention scales quadratically with sequence length), which is particularly a problem for character-level models. One option is to reduce the sequence length with strided convolutions or strided pooling before the transformer layer(s), then upsample afterward.
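A minimal sketch of that downsample → transformer → upsample pattern, written in PyTorch for illustration only; the module structure and dimensions are assumptions, not this project's code:

```python
import torch
import torch.nn as nn

class DownsampledTransformer(nn.Module):
    def __init__(self, vocab_size=128, d_model=64, downsample_factor=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Strided convolution shortens the sequence by `downsample_factor`.
        self.down = nn.Conv1d(d_model, d_model, kernel_size=downsample_factor,
                              stride=downsample_factor)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        # Transposed convolution restores the original sequence length.
        self.up = nn.ConvTranspose1d(d_model, d_model, kernel_size=downsample_factor,
                                     stride=downsample_factor)

    def forward(self, token_ids):                  # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)  # (batch, d_model, seq_len)
        x = self.down(x).transpose(1, 2)           # (batch, seq_len / factor, d_model)
        x = self.encoder(x).transpose(1, 2)        # attention over the shorter sequence
        return self.up(x).transpose(1, 2)          # (batch, seq_len, d_model)

# The output only matches the input length when seq_len is a multiple of
# downsample_factor, which is the padding guarantee asked for below.
tokens = torch.randint(0, 128, (2, 32))            # 32 is a multiple of 4
print(DownsampledTransformer()(tokens).shape)      # torch.Size([2, 32, 64])
```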
To make this pattern more straightforward across multiple components, the `Preprocessor` can be made aware of the `downsample_factor` and ensure that inputs are padded to a multiple of that factor, so the upsampled output has the same shape as the padded input. As part of this implementation, the `length` argument of `string_to_array` should become optional.
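A rough sketch of what that padding logic could look like; the signature, `pad_id`, and the character-to-id mapping here are assumptions for illustration, not the existing `Preprocessor` API:

```python
import math
import numpy as np

def string_to_array(text, length=None, downsample_factor=1, pad_id=0):
    ids = [ord(c) for c in text]  # toy character-to-id mapping
    # With no explicit length, infer it from the input...
    target = length if length is not None else len(ids)
    # ...then round up to a multiple of downsample_factor so that
    # downsample-then-upsample reproduces the padded shape exactly.
    target = math.ceil(target / downsample_factor) * downsample_factor
    padded = np.full(target, pad_id, dtype=np.int64)
    padded[:min(len(ids), target)] = ids[:target]
    return padded

print(string_to_array("hello there", downsample_factor=4).shape)  # (12,)
```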