Decision Tree utilizing Fuzzy Logic
The dectree tool reads the definition of a decision tree and outputs a Python function that implements it. The generated Python code is highly optimized through the use of Numba. The function can be generated for inputs/outputs whose values are either scalars or vectors, usually given as Numpy-like arrays.
The decision tree definition comprises descriptions of the inputs, the outputs, and a set of rules. The rules are if/else-statements whose conditional expressions compare one or more input variables with the individual properties defined for a given variable type. In the fuzzy logic literature, each property defines a fuzzy set, and an associated membership function determines the membership (a truth value in the range 0 to 1) of a given value of a linguistic variable with respect to that property. The if/else-bodies can contain further nested rules or output variable assignments. As with inputs, only properties can be assigned to output variables. In contrast to the properties for input variables, output properties must deliver constant truth values in the range 0 to 1.
The type for a variable x may comprise the properties HIGH, MIDDLE, and LOW, while the type for y may define FAST and SLOW.
A property's membership function maps a variable value (a floating point number of known range) to a fuzzy truth value (in the range of zero to one) which is a measure for the membership of the value to that property. Logical expressions that combine these truth values translate as follows with a and b being fuzzy truth values:
- a and b --> min(a, b)
- a or b --> max(a, b)
- not a --> 1 - a
For example, the condition x is HIGH and y is not SLOW "fuzzifies" to min(HIGH(x), 1 - SLOW(y)).
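The three operator translations above can be tried out with plain Python. The following is a minimal sketch; the helper names `fuzzy_and`, `fuzzy_or`, and `fuzzy_not` and the two truth values are made up for illustration and are not part of dectree.

```python
def fuzzy_and(a, b):
    """a and b --> min(a, b)"""
    return min(a, b)


def fuzzy_or(a, b):
    """a or b --> max(a, b)"""
    return max(a, b)


def fuzzy_not(a):
    """not a --> 1 - a"""
    return 1.0 - a


# Hypothetical truth values, standing in for HIGH(x) and SLOW(y).
high_x = 0.8
slow_y = 0.25

# x is HIGH and y is not SLOW --> min(HIGH(x), 1 - SLOW(y))
result = fuzzy_and(high_x, fuzzy_not(slow_y))
print(result)  # 0.75
```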
Available membership functions are
- ramp(x1=0, x2=1) - a ramp with positive slope for x in the range x1 to x2, 0 if x < x1, and 1 if x > x2. The inverse is inv_ramp() with the same parameters.
- triangular(x1=0, x2=0, x3=1) - a ramp with positive slope for x in the range x1 to x2, a ramp with negative slope in the range x2 to x3, 0 if x < x1 or x > x3. The inverse is inv_triangular() with the same parameters.
- trapezoid(x1=0, x2=1/3, x3=2/3, x4=1) - a ramp with positive slope for x in the range x1 to x2, 1 in the range x2 to x3, a ramp with negative slope in the range x3 to x4, 0 if x < x1 or x > x4. The inverse is inv_trapezoid() with the same parameters.
- eq(x0, dx=0), ne(x0, dx=0), lt(x0, dx=0), le(x0, dx=0), gt(x0, dx=0), ge(x0, dx=0) - comparisons of x with x0, all of which yield fuzzy values if x > x0 - dx and x < x0 + dx, except where x is exactly x0.
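To make the shapes concrete, here is a minimal scalar sketch of the ramp, triangular, and trapezoid functions. It is not the code in dectree/propfuncs.py, whose signatures may differ. It assumes that x is passed as the first argument, that parameters are given explicitly, and that the trapezoid has a plateau of 1 between x2 and x3.

```python
def ramp(x, x1, x2):
    """0 below x1, 1 above x2, linear in between."""
    if x <= x1:
        return 0.0
    if x >= x2:
        return 1.0
    return (x - x1) / (x2 - x1)


def inv_ramp(x, x1, x2):
    """Inverse of ramp(): 1 below x1, 0 above x2."""
    return 1.0 - ramp(x, x1, x2)


def triangular(x, x1, x2, x3):
    """Rises from x1 to the peak at x2, falls from x2 to x3."""
    if x < x1 or x > x3:
        return 0.0
    if x == x2:
        return 1.0
    if x < x2:
        return (x - x1) / (x2 - x1)
    return (x3 - x) / (x3 - x2)


def trapezoid(x, x1, x2, x3, x4):
    """Rises from x1 to x2, stays at 1 until x3, falls from x3 to x4."""
    if x < x1 or x > x4:
        return 0.0
    if x2 <= x <= x3:
        return 1.0
    if x < x2:
        return (x - x1) / (x2 - x1)
    return (x4 - x) / (x4 - x3)


print(ramp(0.5, 0.0, 1.0))                     # 0.5
print(triangular(0.25, 0.0, 0.5, 1.0))         # 0.5
print(trapezoid(0.875, 0.0, 0.25, 0.75, 1.0))  # 0.5
```

The explicit equality and range checks come before each division, so degenerate parameters such as x1 == x2 do not cause a division by zero.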
The following are not really membership functions as they return constant truth values:
- true() - always 1;
- false() - always 0;
- const(t) - always t.
The membership functions are defined in dectree/propfuncs.py.
The only membership functions allowed for properties assigned to output values are true(), false(), or const(t).
$ dectree -h
$ dectree examples/im_classif.yml -o . --vectorize
See also the related notebook examples/im_classif.ipynb.
Command-line tool:
- Python 3.3+
- pyyaml
To run the generated Python modules and to run the dectree unit-tests:
- numba
- numpy
$ git clone https://github.com/forman/dectree.git
$ cd dectree
$ python setup.py develop
A rule has the general form
if <CONDITION>:
    <BODY>
or
if <CONDITION>:
    <BODY_1>
else:
    <BODY_2>
where <BODY> may be another nested rule or comprise a list of one or more output variable assignments of the form
<OUTPUT> = <PROPERTY>
where <OUTPUT> is the name of any defined output and <PROPERTY> is the name of a property defined for the output type. Currently, the only membership functions supported for outputs are the ones that do not depend on the output value: true(), false(), and const(t).
The final value of an output variable in the if-body <BODY_1> of a rule is computed as the minimum of the current truth value given by <CONDITION> and the constant value returned by the membership function of the assigned output property.
Likewise, the final value of an output variable in the else-body <BODY_2> of a rule is computed as the minimum of the negation of the current truth value given by <CONDITION> and the constant value returned by the membership function of the assigned output property.
The rule's <CONDITION> is a conditional expression comprising comparisons of the form
<INPUT> is <PROPERTY>
which can be combined using the logical and, or, and not operators, which have their usual precedences. Parentheses can be used to control expression precedence. A conditional expression <CONDITION> is translated by a function translate_expr() as follows:
- <INPUT> is <PROPERTY> or also <INPUT> == <PROPERTY> translates into a function call <TYPE>_<PROPERTY>(<INPUT>) that computes the truth value of <INPUT> with respect to the given property <PROPERTY> defined for type <TYPE>.
- not <CONDITION> translates into 1.0 - translate_expr(<CONDITION>)
- <CONDITION_1> and <CONDITION_2> translates into min(translate_expr(<CONDITION_1>), translate_expr(<CONDITION_2>))
- <CONDITION_1> or <CONDITION_2> translates into max(translate_expr(<CONDITION_1>), translate_expr(<CONDITION_2>))
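The translation rules above can be sketched with Python's standard `ast` module, since the condition syntax is valid Python expression syntax. This is only an illustration and not necessarily how dectree implements translate_expr(). The `input_types` mapping (input name to type name), the helper `_translate`, and the type names `Radiance` and `Speed` are assumptions made for this example.

```python
import ast


def translate_expr(expr, input_types):
    """Translate a condition string into a Python expression string.

    input_types is a hypothetical mapping {input name: type name}.
    """
    return _translate(ast.parse(expr, mode="eval").body, input_types)


def _translate(node, input_types):
    if isinstance(node, ast.Compare):
        # <INPUT> is <PROPERTY>  or  <INPUT> == <PROPERTY>
        if len(node.ops) != 1 or not isinstance(node.ops[0], (ast.Is, ast.Eq)):
            raise ValueError("only 'is' and '==' comparisons are supported")
        left, right = node.left, node.comparators[0]
        if not (isinstance(left, ast.Name) and isinstance(right, ast.Name)):
            raise ValueError("expected <INPUT> is <PROPERTY>")
        return "{}_{}({})".format(input_types[left.id], right.id, left.id)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
        # Parenthesized so that nested negations stay correct.
        return "(1.0 - {})".format(_translate(node.operand, input_types))
    if isinstance(node, ast.BoolOp):
        func = "min" if isinstance(node.op, ast.And) else "max"
        args = ", ".join(_translate(v, input_types) for v in node.values)
        return "{}({})".format(func, args)
    raise ValueError("unsupported expression")


types = {"x": "Radiance", "y": "Speed"}
print(translate_expr("x is HIGH and not y is SLOW", types))
# min(Radiance_HIGH(x), (1.0 - Speed_SLOW(y)))
```

Relying on the Python parser means operator precedence and parentheses are handled for free; only the mapping from syntax nodes to min, max, and 1.0 - t has to be written by hand.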
A simple rule of the form
if <CONDITION>:
    <OUTPUT_1> = <VALUE_1>
else:
    <OUTPUT_2> = <VALUE_2>
will translate into
t0 = 1.0
# if <CONDITION>:
t1 = min(t0, translate_expr(<CONDITION>))
#     <OUTPUT_1> = <VALUE_1>
<OUTPUT_1> = min(t1, <VALUE_1>)
# else:
t1 = min(t0, 1.0 - t1)
#     <OUTPUT_2> = <VALUE_2>
<OUTPUT_2> = min(t1, <VALUE_2>)
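As a concrete instance of this translation, the following sketch evaluates the simple rule for given numbers. The function name `simple_rule` and its arguments are hypothetical: `cond` stands for the already computed truth value of <CONDITION>, and `value_1` and `value_2` for the constant truth values of the assigned output properties.

```python
def simple_rule(cond, value_1=1.0, value_2=1.0):
    """Evaluate the if/else rule for a given condition truth value."""
    t0 = 1.0
    # if <CONDITION>:
    t1 = min(t0, cond)
    output_1 = min(t1, value_1)
    # else:
    t1 = min(t0, 1.0 - t1)
    output_2 = min(t1, value_2)
    return output_1, output_2


print(simple_rule(0.75))            # (0.75, 0.25)
print(simple_rule(0.75, 0.5, 1.0))  # (0.5, 0.25)
```

In the second call the constant 0.5 caps the first output, even though the condition itself is true to a degree of 0.75.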
The following example has nested rules and the output <OUTPUT_2> is assigned twice.
if <COND_1>:
    if <COND_2>:
        if <COND_3>:
            <OUTPUT_1> = <VALUE_1>
            <OUTPUT_2> = <VALUE_2>
else:
    if <COND_4>:
        <OUTPUT_3> = <VALUE_3>
    else:
        if <COND_5>:
            <OUTPUT_4> = <VALUE_4>
        else:
            <OUTPUT_2> = <VALUE_2>
Multiple assignments to the same variable are interpreted as alternatives, thus the maximum value of all possible values for <OUTPUT_2> is taken (see last line). The translation is
t0 = 1.0
# if <COND_1>:
t1 = min(t0, translate_expr(<COND_1>))
#     if <COND_2>:
t2 = min(t1, translate_expr(<COND_2>))
#         if <COND_3>:
t3 = min(t2, translate_expr(<COND_3>))
#             <OUTPUT_1> = <VALUE_1>
<OUTPUT_1> = min(t3, <VALUE_1>)
#             <OUTPUT_2> = <VALUE_2>
<OUTPUT_2> = min(t3, <VALUE_2>)
# else:
t1 = min(t0, 1.0 - t1)
#     if <COND_4>:
t2 = min(t1, translate_expr(<COND_4>))
#         <OUTPUT_3> = <VALUE_3>
<OUTPUT_3> = min(t2, <VALUE_3>)
#     else:
t2 = min(t1, 1.0 - t2)
#         if <COND_5>:
t3 = min(t2, translate_expr(<COND_5>))
#             <OUTPUT_4> = <VALUE_4>
<OUTPUT_4> = min(t3, <VALUE_4>)
#         else:
t3 = min(t2, 1.0 - t3)
#             <OUTPUT_2> = <VALUE_2>
<OUTPUT_2> = max(<OUTPUT_2>, min(t3, <VALUE_2>))
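The nested example can be checked with numbers in the same way. This sketch follows the rule text, in which <OUTPUT_2> is assigned twice and the two alternatives are combined with max. The function name `nested_rule` and the arguments `c1` to `c5` (truth values of the five conditions) and `v1` to `v4` (constant output values) are hypothetical.

```python
def nested_rule(c1, c2, c3, c4, c5, v1=1.0, v2=1.0, v3=1.0, v4=1.0):
    """Evaluate the nested rule; returns (output_1, ..., output_4)."""
    t0 = 1.0
    # if <COND_1>: if <COND_2>: if <COND_3>:
    t1 = min(t0, c1)
    t2 = min(t1, c2)
    t3 = min(t2, c3)
    output_1 = min(t3, v1)
    output_2 = min(t3, v2)
    # else (of <COND_1>): if <COND_4>:
    t1 = min(t0, 1.0 - t1)
    t2 = min(t1, c4)
    output_3 = min(t2, v3)
    # else (of <COND_4>): if <COND_5>:
    t2 = min(t1, 1.0 - t2)
    t3 = min(t2, c5)
    output_4 = min(t3, v4)
    # else (of <COND_5>): second assignment to <OUTPUT_2>, combined with max
    t3 = min(t2, 1.0 - t3)
    output_2 = max(output_2, min(t3, v2))
    return output_1, output_2, output_3, output_4


print(nested_rule(0.25, 0.75, 0.5, 0.25, 0.25))
# (0.25, 0.75, 0.25, 0.25)
```

Here the first branch gives <OUTPUT_2> a value of 0.25 and the final else-branch gives 0.75, so the larger of the two alternatives is kept.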
See decision tree definition files in examples: