villaume / feature_generation

Prototyping Feature Generation Tools for Spark

feature_generation

Prototyping Feature Generation Tools for Spark. (lazy work in progress to make a lame scala joke)

Includes

• VIF (variance inflation factor) calculation to find and flag collinear features in a Spark DataFrame
• Top-X one-hot encoding of string array features
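The two tools listed above can be illustrated with a minimal sketch. It is not the repository's implementation: the object name `FeatureSketch`, the helpers `vif` and `topXOneHot`, the temporary column `vif_features`, and the sample data are all hypothetical. It assumes numeric feature columns for the VIF part, uses the textbook definition VIF_j = 1 / (1 - R²_j) (with R²_j taken from regressing feature j on the remaining features), and relies only on Spark ML and SQL calls available in Spark 2.1.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array_contains, col, desc, explode}

// Hypothetical helper names; none of these are taken from the repository.
object FeatureSketch {

  // VIF_j = 1 / (1 - R^2_j): regress each feature on all the others and read R^2.
  // Assumes at least two numeric feature columns without nulls.
  def vif(df: DataFrame, features: Seq[String]): Map[String, Double] =
    features.map { target =>
      val others = features.filterNot(_ == target).toArray
      val assembled = new VectorAssembler()
        .setInputCols(others)
        .setOutputCol("vif_features")
        .transform(df)
      val model = new LinearRegression()
        .setFeaturesCol("vif_features")
        .setLabelCol(target)
        .fit(assembled)
      val r2 = model.summary.r2
      target -> (if (r2 >= 1.0) Double.PositiveInfinity else 1.0 / (1.0 - r2))
    }.toMap

  // Adds one 0/1 column per value among the x most frequent values of a string array column.
  // Ties are broken alphabetically. Values containing dots or other special
  // characters would need sanitising before being used in column names.
  def topXOneHot(df: DataFrame, arrayCol: String, x: Int): DataFrame = {
    val top = df.select(explode(col(arrayCol)).as("value"))
      .where(col("value").isNotNull)
      .groupBy("value").count()
      .orderBy(desc("count"), col("value"))
      .limit(x)
      .collect()
      .map(_.getString(0))
    top.foldLeft(df) { (acc, v) =>
      acc.withColumn(s"${arrayCol}_$v", array_contains(col(arrayCol), v).cast("int"))
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("feature-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up data: x2 is roughly 2 * x1, so both should show a high VIF.
    val df = Seq(
      (1.0, 2.1, 5.0, Seq("a", "b")),
      (2.0, 3.9, 3.0, Seq("a")),
      (3.0, 6.2, 8.0, Seq("a", "c")),
      (4.0, 7.8, 1.0, Seq("b")),
      (5.0, 10.1, 4.0, Seq("a", "b"))
    ).toDF("x1", "x2", "x3", "tags")

    vif(df, Seq("x1", "x2", "x3")).foreach { case (name, v) =>
      println(f"$name%s VIF = $v%.2f")
    }
    topXOneHot(df, "tags", 2).show()

    spark.stop()
  }
}
```

A common rule of thumb flags features whose VIF exceeds roughly 5 to 10 as collinear; the threshold is a judgment call and is not specified by this README. Perfectly collinear columns can make the least-squares fit fail outright, so a production version would need to handle that case.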

Environment

Uses Spark 2.1.1 and Scala 2.11.

To-Dos

• Finish integration test (see about getting smaller docker images)
• Add random forest feature selection
• Add forward/backward inclusion
• Add Information Value / workspace_id
• Add DF evaluation (fill rates, etc.)

License: MIT License

Languages

Scala 97.5%, Shell 2.5%