classification dataset deep-learning feature-extraction feature-selection machine-learning openai-gym preprocessing preprocessor regression reinforcement-learning spectral-analysis standardization timeseries trims

Preprocessor

A simple timeseries data pre-processor.

Description

Implements modular components for dataset preprocessing: a data-trimmer, a standardizer, a feature selector and a sliding window data generator.

All modules can be used both from the command line and via class methods.

Installation

To install the package via pip, use the following command:

pip install -i https://test.pypi.org/simple/ harveybc-preprocessor

Alternatively, clone the GitHub repo and install the package manually, as described in the following steps.

GitHub Installation Steps

Clone the GitHub repo:

git clone https://github.com/harveybc/preprocessor

Change to the repo folder:

cd preprocessor

Install requirements:

pip install -r requirements.txt

Install the Python package (this also installs the console command data-trimmer):

python setup.py install

Add the repo directory to the environment variable PYTHONPATH.

(Optional) Perform tests

python setup.py test

(Optional) Generate Sphinx Documentation

python setup.py docs

Modules

All the CLI commands and class modules are installed with the preprocessor package. The following sections describe each module briefly and link to each module's basic documentation.

Detailed Sphinx documentation for all modules can be generated in HTML format with the optional documentation step of the installation process. It documents the classes and methods of all modules in the preprocessor package.

Data-Trimmer

A simple data pre-processor that removes constant-valued columns. It also removes rows at the start and the end of a dataset whose features contain consecutive zeroes.

See Data-Trimmer Readme for detailed description and usage instructions.
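
As an illustration of the trimming idea described in this section, the following is a minimal, self-contained sketch in plain Python. It is not the package's implementation: the function names `trim_constant_columns` and `trim_zero_edges` are hypothetical, and the rule that a leading or trailing row is dropped when it contains at least one zero is an assumption made for the example.

```python
def trim_constant_columns(rows):
    """Drop every column whose value is identical in all rows."""
    if not rows:
        return [], []
    n_cols = len(rows[0])
    # A column is kept only if it holds more than one distinct value.
    keep = [c for c in range(n_cols) if len({row[c] for row in rows}) > 1]
    return [[row[c] for c in keep] for row in rows], keep


def trim_zero_edges(rows):
    """Drop leading and trailing rows containing a zero (assumed rule)."""
    start, end = 0, len(rows)
    while start < end and any(v == 0 for v in rows[start]):
        start += 1
    while end > start and any(v == 0 for v in rows[end - 1]):
        end -= 1
    return rows[start:end]


data = [
    [0.0, 1.0, 5.0],
    [0.0, 1.0, 6.0],
    [2.0, 1.0, 7.0],
    [3.0, 1.0, 8.0],
    [4.0, 1.0, 0.0],
]
trimmed, kept_columns = trim_constant_columns(data)
trimmed = trim_zero_edges(trimmed)
print(kept_columns)  # indices of the columns that survived
print(trimmed)       # remaining rows after both trimming steps
```

In this sketch the constant middle column is removed first, and the zero-padded rows at both ends are removed afterwards.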

Standardizer

Standardizes a dataset and exports the standardization configuration for use on other datasets.

See Standardizer Readme for detailed description and usage instructions.
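
The idea of standardizing one dataset and reusing the exported configuration on another can be sketched as follows. This is only an illustration under stated assumptions: it uses z-score scaling with per-column mean and population standard deviation, and it serializes the configuration as JSON. The function names `fit_standardizer` and `apply_standardizer` are hypothetical, and the package's actual scaling method and configuration format may differ.

```python
import json
import statistics


def fit_standardizer(rows):
    """Compute per-column mean and population standard deviation."""
    columns = list(zip(*rows))
    return {
        "mean": [statistics.fmean(col) for col in columns],
        "std": [statistics.pstdev(col) for col in columns],
    }


def apply_standardizer(rows, config):
    """Scale rows with a previously computed configuration."""
    return [
        [
            # Constant columns (std == 0) are mapped to 0.0 to avoid dividing by zero.
            (value - mean) / std if std else 0.0
            for value, mean, std in zip(row, config["mean"], config["std"])
        ]
        for row in rows
    ]


train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
config = fit_standardizer(train)

# Serialize the configuration so it can be reused later (JSON is an assumption).
exported = json.dumps(config)

# Reuse the stored statistics on a different dataset.
other = [[2.0, 40.0]]
scaled_other = apply_standardizer(other, json.loads(exported))
print(scaled_other)
```

Applying the stored statistics, rather than recomputing them, keeps the second dataset on the same scale as the first.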

Sliding Window

Applies the sliding window technique and exports an expanded dataset with a configurable window_size.

See Sliding Window Readme for detailed description and usage instructions.
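
A minimal sketch of the sliding window technique is shown below. It assumes that each output row is the concatenation of `window_size` consecutive input rows. The function name `sliding_window` is hypothetical, and the layout of the package's exported dataset may differ.

```python
def sliding_window(rows, window_size):
    """Return one flattened row per window of `window_size` consecutive rows."""
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    return [
        # Concatenate the rows of the current window into a single wide row.
        [value for row in rows[start:start + window_size] for value in row]
        for start in range(len(rows) - window_size + 1)
    ]


series = [[1, 10], [2, 20], [3, 30], [4, 40]]
windows = sliding_window(series, window_size=3)
for window in windows:
    print(window)
```

With four input rows and a window of three, the sketch produces two output rows, each three times as wide as an input row.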

Feature Selector

Performs feature selection based on a classification or regression training signal and a threshold.

See Feature Selector Readme for detailed description and usage instructions.
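
Threshold-based feature selection against a training signal can be sketched as follows. The scoring function is an assumption: this example ranks features by the absolute Pearson correlation with the training signal, which may not be the criterion the package uses. The names `pearson` and `select_features` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(rows, target, threshold):
    """Return indices of columns whose |correlation| with target meets the threshold."""
    columns = list(zip(*rows))
    return [
        index
        for index, column in enumerate(columns)
        if abs(pearson(column, target)) >= threshold
    ]


features = [
    [1.0, 5.0, 2.0],
    [2.0, 5.0, 1.0],
    [3.0, 5.0, 4.0],
    [4.0, 5.0, 3.0],
]
target = [1.0, 2.0, 3.0, 4.0]
selected = select_features(features, target, threshold=0.9)
print(selected)  # indices of the features that pass the threshold
```

Lowering the threshold admits more weakly related features, so it acts as the knob that trades dataset size against relevance.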

Examples of usage

The following examples show both the class-method and command-line usage of one module. For examples of the other modules, please see each module's documentation.

Example: Usage via Class Methods (data_trimmer module)

from preprocessor.data_trimmer.data_trimmer import DataTrimmer

# Configure parameters (same attribute names as the command-line parameters)
class Conf:
    def __init__(self):
        self.input_file = "tests/data/test_input.csv"

conf = Conf()
# Instantiate the trimmer class, which loads the dataset
dt = DataTrimmer(conf)
# Run the module's core method
dt.core()
# Save the result to the output file
dt.store()

Example: Usage via CLI (data_trimmer module)

data_trimmer --input_file "tests/data/test_input.csv"

About

Modular components for dataset preprocessing: a data-trimmer, a standardizer, a sliding-window generator and a feature selector, all optionally usable from the command line.

MIT License

Languages

Python 95.9%, Shell 4.1%