remykarem / cattrees

Tree-based algorithms with categorical support

[WIP] Trees, averaging trees and boosted trees from scratch

1. Usage

Prepare data. Here there are 3 features: the first 2 are numerical and the last is nominal.

>>> import numpy as np
>>> X = np.array([[  1,   1,   0],
...               [101, 101,   0],
...               [103, 103,   0],
...               [  3,   3,   0],
...               [  5,   5,   0],
...               [107, 107,   0],
...               [109, 109,   0],
...               [  7,   7,   1],
...               [  8,   8,   1]])
>>> y = np.array([0, 1, 1, 0, 0, 1, 1, 2, 2])

Import the module

>>> from trees_and_forests import DecisionTreeClassifier

Initialise the classifier and fit it to the data

>>> clf = DecisionTreeClassifier()
>>> clf.fit(X, y)

Inference

>>> clf.predict(np.array([[1,1,0]]))
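
To give a feel for what "categorical support" can mean at a single tree node, here is a minimal sketch that scores one numerical split and one nominal split on the toy data above. It assumes Gini impurity as the split criterion, which this README does not confirm, and the helpers `gini` and `split_impurity` are hypothetical names for illustration only; they are not part of `trees_and_forests`. The sketch uses plain Python lists so it runs without any dependencies.

```python
from collections import Counter

# Same toy data as above: columns 0 and 1 are numerical, column 2 is nominal.
X = [[1, 1, 0], [101, 101, 0], [103, 103, 0], [3, 3, 0], [5, 5, 0],
     [107, 107, 0], [109, 109, 0], [7, 7, 1], [8, 8, 1]]
y = [0, 1, 1, 0, 0, 1, 1, 2, 2]


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(X, y, feature, test):
    """Weighted Gini impurity after partitioning rows by test(row[feature])."""
    left = [label for row, label in zip(X, y) if test(row[feature])]
    right = [label for row, label in zip(X, y) if not test(row[feature])]
    n = len(y)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


# Numerical split: a threshold test on feature 0 (the threshold 50 is arbitrary).
numerical = split_impurity(X, y, 0, lambda v: v <= 50)

# Nominal split: an equality test on feature 2, so no ordering of
# category codes is assumed.
nominal = split_impurity(X, y, 2, lambda v: v == 1)

print("impurity before splitting:", gini(y))
print("numerical split on feature 0:", numerical)
print("nominal split on feature 2:", nominal)
```

The only difference between the two cases is the test applied to a feature value: a threshold comparison for numerical features and an equality check for nominal ones. A lower weighted impurity indicates a purer partition.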

2. Would-like-to-do-but-not-sure-when's

Algorithms

  • Decision tree classifier
  • Decision tree regressor
  • Simple bagging
  • Random forest
  • Extremely randomised trees
  • AdaBoost
  • Gradient boosting

Software development

  • Unit tests
  • API design document
  • Tutorial

Optimisations

  • Cythonise/PyTorchify
  • Performance against scikit-learn

3. Related

https://scikit-learn.org/stable/modules/tree.html

4. Resources

http://www.ccs.neu.edu/home/vip/teach/MLcourse/4_boosting/slides/gradient_boosting.pdf

https://scikit-learn.org

License: MIT License

Language: Python 100.0%