lpfann / squamish

Welcome to squamish

License: Apache 2.0

This project contains a novel feature selection algorithm that classifies and selects features using a Random Forest model (implemented with LightGBM) and Boruta.

It assigns each feature of a data set to one of three classes: (1) strongly relevant features, (2) weakly relevant features, and (3) irrelevant features.
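
To make the three classes concrete, here is a minimal sketch on a toy data set. It is not the squamish algorithm and does not use its API: it applies the textbook definitions of strong and weak relevance by brute force, using a lookup-table classifier in place of the Random Forest and Boruta. The function names `best_accuracy` and `classify_features` and the toy data are assumptions made for illustration only.

```python
from collections import Counter, defaultdict
from itertools import combinations, product


def best_accuracy(rows, labels, subset):
    """Accuracy of the best lookup-table classifier that sees only `subset` columns."""
    groups = defaultdict(Counter)
    for row, y in zip(rows, labels):
        groups[tuple(row[i] for i in subset)][y] += 1
    # In each group of identical inputs, the best guess is the majority label.
    correct = sum(counts.most_common(1)[0][1] for counts in groups.values())
    return correct / len(labels)


def classify_features(rows, labels):
    """Brute-force relevance classes (hypothetical helper, exponential in #features)."""
    features = tuple(range(len(rows[0])))
    full = best_accuracy(rows, labels, features)
    result = {}
    for f in features:
        others = tuple(i for i in features if i != f)
        # Strongly relevant: dropping the feature hurts the best achievable accuracy.
        if best_accuracy(rows, labels, others) < full:
            result[f] = "strongly relevant"
            continue
        # Weakly relevant: not strong, but it helps some subset of the other features.
        helps = any(
            best_accuracy(rows, labels, s + (f,)) > best_accuracy(rows, labels, s)
            for k in range(len(others) + 1)
            for s in combinations(others, k)
        )
        result[f] = "weakly relevant" if helps else "irrelevant"
    return result


# Toy data: label = a XOR b; column 2 is an exact copy of b; column 3 is unrelated.
rows = [(a, b, b, d) for a, b, d in product((0, 1), repeat=3)]
labels = [a ^ b for a, b, _, _ in rows]

result = classify_features(rows, labels)
print(result)
# {0: 'strongly relevant', 1: 'weakly relevant', 2: 'weakly relevant', 3: 'irrelevant'}
```

In this toy example, column 0 is strongly relevant because nothing else carries its information, columns 1 and 2 are weakly relevant because each can stand in for the other, and column 3 is irrelevant. The exhaustive subset search is only feasible for a handful of features.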

A publication detailing the methods used here is a work in progress.

The name is a codename without meaning, chosen for personal reasons (Beautiful British Columbia...).

Install

We use poetry as our dependency manager and packaging tool.

poetry install

Run tests

poetry run pytest

Cite

@misc{pfannschmidt2020sequential,
    title={Sequential Feature Classification in the Context of Redundancies},
    author={Lukas Pfannschmidt and Barbara Hammer},
    year={2020},
    eprint={2004.00658},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

Preprints can be found at https://pub.uni-bielefeld.de/record/2942271 or https://arxiv.org/abs/2004.00658. The experiments for the paper are located here.
