harveybc / preprocessor

Modular components for dataset preprocessing: a data-trimmer, a standardizer, a sliding-window generator and a feature selector, all optionally usable from the command line.


Preprocessor

A simple timeseries data pre-processor.


Description

Implements modular components for dataset preprocessing: a data-trimmer, a standardizer, a feature selector and a sliding window data generator.

All modules are usable both from command line and from class methods.

Installation

To install the package via PIP, use the following command:

pip install -i https://test.pypi.org/simple/ harveybc-preprocessor

Alternatively, you can install it by cloning the GitHub repo and installing it manually with the following steps.

Github Installation Steps

  1. Clone the GitHub repo:

git clone https://github.com/harveybc/preprocessor

  2. Change to the repo folder:

cd preprocessor

  3. Install the requirements:

pip install -r requirements.txt

  4. Install the Python package (this also installs the console command data-trimmer):

python setup.py install

  5. Add the repo directory to the PYTHONPATH environment variable.
  6. (Optional) Run the tests:

python setup.py test

  7. (Optional) Generate the Sphinx documentation:

python setup.py docs

Modules

All the CLI commands and class modules are installed with the preprocessor package. The following sections briefly describe each module and point to its basic documentation.

Detailed Sphinx documentation for all modules can be generated in HTML format with the optional last step of the installation process (generating the Sphinx documentation). It documents the classes and methods of all modules in the preprocessor package.

Data-Trimmer

A simple data preprocessor that removes constant-valued columns. It also removes rows from the start and the end of a dataset in which features contain consecutive zeroes.

See Data-Trimmer Readme for detailed description and usage instructions.
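The trimming idea can be illustrated with a minimal sketch on plain Python lists. It assumes that a "trimmable" edge row is any row containing a zero in at least one column; the real module's rule may differ. The function names trim_constant_columns and trim_zero_edges are hypothetical and are not the DataTrimmer API.

```python
def trim_constant_columns(rows):
    """Drop columns whose value is identical in every row."""
    if not rows:
        return rows
    n_cols = len(rows[0])
    keep = [c for c in range(n_cols) if len({row[c] for row in rows}) > 1]
    return [[row[c] for c in keep] for row in rows]


def trim_zero_edges(rows):
    """Drop leading and trailing rows that contain a zero.

    Assumption (hypothetical rule): a row is trimmed if any feature equals 0.
    Rows in the middle of the dataset are never removed.
    """
    start, end = 0, len(rows)
    while start < end and 0 in rows[start]:
        start += 1
    while end > start and 0 in rows[end - 1]:
        end -= 1
    return rows[start:end]


data = [
    [0, 5, 1],
    [2, 5, 3],
    [4, 5, 6],
    [7, 5, 0],
]
# Constant columns are removed first, so an all-zero column cannot
# cause every row to be trimmed.
trimmed = trim_zero_edges(trim_constant_columns(data))
print(trimmed)  # [[2, 3], [4, 6]]
```

Removing constant columns before trimming the edges is a deliberate ordering choice in this sketch: a column that is zero everywhere carries no information and would otherwise mark every row as trimmable.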

Standardizer

Standardizes a dataset and exports the standardization configuration for use on other datasets.

See Standardizer Readme for detailed description and usage instructions.
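A minimal sketch of the fit/export/apply workflow described above, assuming z-score standardization (per-column mean and population standard deviation) and a JSON-serializable configuration. The function names and the {"mean": ..., "std": ...} layout are hypothetical and are not the module's actual configuration format.

```python
import json
import statistics


def fit_standardizer(rows):
    """Compute the per-column mean and population standard deviation."""
    columns = list(zip(*rows))
    return {
        "mean": [statistics.fmean(col) for col in columns],
        "std": [statistics.pstdev(col) for col in columns],
    }


def apply_standardizer(rows, config):
    """Standardize rows using a previously fitted (or exported) configuration."""
    return [
        # A zero std (constant column) maps to 0.0 to avoid division by zero.
        [(x - m) / s if s else 0.0 for x, m, s in zip(row, config["mean"], config["std"])]
        for row in rows
    ]


train = [[1.0, 10.0], [3.0, 30.0]]
config = fit_standardizer(train)
exported = json.dumps(config)   # e.g. write this string to a config file
restored = json.loads(exported) # later, reload it for another dataset
print(apply_standardizer([[2.0, 50.0]], restored))  # [[0.0, 3.0]]
```

Exporting the fitted statistics, rather than refitting on each dataset, keeps a validation or test set on the same scale as the training data.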

Sliding Window

Applies the sliding window technique and exports an expanded dataset with a configurable window_size.

See Sliding Window Readme for detailed description and usage instructions.
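The sliding window expansion can be sketched as follows, assuming each output row is the concatenation of window_size consecutive input rows (oldest first). The function name sliding_window is hypothetical, and the module's actual column ordering may differ.

```python
def sliding_window(rows, window_size):
    """Concatenate each run of `window_size` consecutive rows into one row."""
    if window_size < 1:
        raise ValueError("window_size must be >= 1")
    expanded = []
    # A dataset of n rows yields n - window_size + 1 windows (none if too short).
    for start in range(len(rows) - window_size + 1):
        window = rows[start:start + window_size]
        expanded.append([value for row in window for value in row])
    return expanded


series = [[1, 10], [2, 20], [3, 30], [4, 40]]
print(sliding_window(series, window_size=3))
# [[1, 10, 2, 20, 3, 30], [2, 20, 3, 30, 4, 40]]
```

Each expanded row has window_size times as many columns as the input, which is why the output is described as an expanded dataset.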

Feature Selector

Performs feature selection based on a classification or regression training signal and a threshold.

See Feature Selector Readme for detailed description and usage instructions.
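To illustrate threshold-based selection, here is a minimal sketch that keeps the columns whose absolute Pearson correlation with a numeric training signal meets the threshold. The correlation criterion is a swapped-in assumption, not necessarily the scoring method the Feature Selector module uses, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        return 0.0
    return cov / (std_x * std_y)


def select_features(rows, signal, threshold):
    """Return indices of columns whose |r| with the training signal >= threshold."""
    return [
        index
        for index, column in enumerate(zip(*rows))
        if abs(pearson(column, signal)) >= threshold
    ]


features = [[1, 5, 3], [2, 5, 1], [3, 5, 4], [4, 5, 2]]
signal = [2, 4, 6, 8]
print(select_features(features, signal, threshold=0.5))  # [0]
```

Here column 0 rises with the signal, column 1 is constant and column 2 is uncorrelated, so only column 0 passes. A classification signal would need to be numerically encoded before this scoring applies.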

Examples of usage

The following examples show both class-method and command-line usage for one module. For examples of the other modules, please see each module's documentation.

Example: Usage via Class Methods (data_trimmer module)

from preprocessor.data_trimmer.data_trimmer import DataTrimmer
# configure parameters (same variable names as command-line parameters)
class Conf:
    def __init__(self):
        self.input_file = "tests/data/test_input.csv"
conf = Conf()
# instantiate the trimmer class and load the dataset
dt = DataTrimmer(conf)
# perform the module's core method
dt.core()
# save the result to the output file
dt.store()

Example: Usage via CLI (data_trimmer module)

data_trimmer --input_file "tests/data/test_input.csv"


License: MIT License


Languages

Python 95.9%, Shell 4.1%