Rework the preprocessing module
eonu opened this issue · comments
The preprocessing module is quite clunky and should allow a bit more freedom, such as
- allowing any callable, instead of requiring every transformation to be a subclass of
Transform
, - not running the
is_observation_sequences
validation every time (as this is very costly).
The use of tqdm
progress bars is also questionable, and the verbose
argument clutters everything up.
Even if progress bars are used, don't always make them full-width.
Finally, the interface for the Transform
class could be cleaned up, particularly when it comes to having to define a nested function in the transform
method.
Instead of:
def transform(self, X, verbose=False):
def trim_constants(x):
return x[~np.all(x == self.constant, axis=1)]
return self._apply(trim_constants, X, verbose)
The user should just define the transform as an instance method operating on a single observation sequence:
def _transform(x):
return x[~np.all(x == self.constant, axis=1)]
then there is no need for transform()
, and __call__()
would handle the call to _apply()
, and _apply()
would have access to _transform()
directly instead of needing to pass it as an argument.
Note: Would need to think of a way to handle transformations that require all of the observation sequences rather than just one.
Also the Preprocess
class could have a more descriptive name, like Compose
(which is how torchvision
names it).
Implemented in #179.