Implementation of basic PySpark data preprocessing methods
xandaau opened this issue
To preprocess pandas data and speed up experiments, we have the Preprocessor class and a number of single-purpose base classes in preprocessing. These methods should also be implemented for Spark DataFrames, following the same paradigm we already use for the Designer and the Splitter.
At the moment, implementing the following methods is essential:
- Aggregation
- Outlier removal (robust)
- CUPED