eonu / sequentia

Scikit-Learn compatible HMM and DTW based sequence machine learning algorithms in Python.

Home Page: https://pypi.org/project/sequentia/


Rework the preprocessing module

eonu opened this issue · comments

The preprocessing module is quite clunky and should allow a bit more freedom, such as

  • allowing any callable, instead of requiring every transformation to be a subclass of Transform,
  • not running the is_observation_sequences validation every time (as this is very costly).

The use of tqdm progress bars is also questionable, and the verbose argument clutters everything up.

Even if progress bars are used, don't always make them full-width.

Finally, the interface for the Transform class could be cleaned up, particularly when it comes to having to define a nested function in the transform method.

Instead of:

def transform(self, X, verbose=False):
    def trim_constants(x):
        return x[~np.all(x == self.constant, axis=1)]
    return self._apply(trim_constants, X, verbose)

The user should just define the transform as an instance method operating on a single observation sequence:

def _transform(self, x):
    return x[~np.all(x == self.constant, axis=1)]

Then there is no need for transform(): __call__() would handle the call to _apply(), and _apply() would have access to _transform() directly instead of needing it passed as an argument.
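A minimal sketch of that interface, using hypothetical names that follow the issue's wording rather than any final implementation:

```python
import numpy as np

class Transform:
    def __call__(self, X):
        # __call__ handles the call to _apply.
        return self._apply(X)

    def _apply(self, X):
        # _apply uses _transform directly; no nested function, no
        # function argument to pass around.
        return [self._transform(x) for x in X]

    def _transform(self, x):
        raise NotImplementedError

class TrimConstants(Transform):
    def __init__(self, constant=0):
        self.constant = constant

    def _transform(self, x):
        # Operates on a single observation sequence.
        return x[~np.all(x == self.constant, axis=1)]

X = [np.array([[0.0, 0.0], [1.0, 2.0]])]
trimmed = TrimConstants(constant=0)(X)
```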

Note: Would need to think of a way to handle transformations that require all of the observation sequences rather than just one.
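One possible answer (an assumption, not a decision recorded in this issue) is a separate whole-dataset hook that _apply prefers when defined. The essential difference is that such a transformation consumes the full list of sequences at once, as in this standardization sketch:

```python
import numpy as np

def standardize_all(X):
    # Whole-dataset transformation: the mean and standard deviation are
    # computed over every sequence, so a per-sequence _transform(x)
    # cannot express it. A batch-level hook (e.g. a hypothetical
    # _transform_all) could receive the full list like this function does.
    stacked = np.vstack(X)
    mean, std = stacked.mean(axis=0), stacked.std(axis=0)
    return [(x - mean) / std for x in X]

X = [np.array([[1.0], [3.0]]), np.array([[5.0]])]
out = standardize_all(X)
```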

Also the Preprocess class could have a more descriptive name, like Compose (which is how torchvision names it).
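A Compose-style replacement for Preprocess might look like the following sketch (torchvision-inspired naming; not the merged API):

```python
import numpy as np

class Compose:
    """Chain transformations, applied in order (named after torchvision's Compose)."""

    def __init__(self, transforms):
        # Any callables operating on a single observation sequence.
        self.transforms = transforms

    def __call__(self, X):
        for transform in self.transforms:
            X = [transform(x) for x in X]
        return X

pipeline = Compose([
    lambda x: x[~np.all(x == 0, axis=1)],  # trim all-zero frames
    lambda x: x - x.mean(axis=0),          # center each sequence
])
X = [np.array([[0.0, 0.0], [1.0, 3.0], [3.0, 5.0]])]
out = pipeline(X)
```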

Implemented in #179.