PSyKE (Platform for Symbolic Knowledge Extraction) is intended as a library for extracting symbolic knowledge (in the form of logic rules) out of sub-symbolic predictors.
More precisely, PSyKE offers a general-purpose API for knowledge extraction, along with a number of different algorithms implementing it, supporting both classification and regression problems. The extracted knowledge consists of a Prolog theory (i.e., a list of Horn clauses) or an OWL ontology containing SWRL rules.
PSyKE relies on 2ppy (tuProlog in Python) for logic support, which in turn is based on the 2p-Kt logic ecosystem.
PSyKE is designed around the notion of extractor.
More precisely, an Extractor is any object capable of extracting a logic Theory out of a trained sub-symbolic regressor or classifier.
Accordingly, an Extractor is composed of (i) a trained predictor (i.e., black-box used as an oracle) and (ii) a set of feature descriptors, and it provides two methods:
- extract: returns a logic theory given a dataset;
- predict: predicts a value using the extracted rules (instead of the original predictor).
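The two-method contract described above can be illustrated with a minimal, self-contained sketch. It is not PSyKE's actual Extractor class: the names ExtractorSketch, ThresholdExtractor and BlackBox are hypothetical, rules are plain strings rather than a logic theory, and samples are assumed to be dictionaries mapping feature names to values.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Sequence

Row = Dict[str, float]  # one sample: feature name -> value (assumed format)


class ExtractorSketch(ABC):
    """Hypothetical stand-in for the extractor notion; not PSyKE's class."""

    def __init__(self, predictor, features: Sequence[str]):
        self.predictor = predictor      # trained black box, queried as an oracle
        self.features = list(features)  # stand-in for the feature descriptors

    @abstractmethod
    def extract(self, dataset: List[Row]) -> List[str]:
        """Return the extracted rules (plain strings in this sketch)."""

    @abstractmethod
    def predict(self, dataset: List[Row]) -> List[str]:
        """Predict with the extracted rules instead of the predictor."""


class ThresholdExtractor(ExtractorSketch):
    """Toy extractor: a single threshold on the first feature."""

    def extract(self, dataset):
        feature = self.features[0]
        oracle = self.predictor.predict(dataset)  # labels come from the black box
        classes = sorted(set(oracle))
        best = None
        # Try every observed value as threshold and every pair of labels,
        # keeping the combination that agrees most often with the oracle.
        for t in sorted({row[feature] for row in dataset}):
            for low in classes:
                for high in classes:
                    hits = sum(
                        (low if row[feature] <= t else high) == label
                        for row, label in zip(dataset, oracle)
                    )
                    if best is None or hits > best[0]:
                        best = (hits, t, low, high)
        _, self.threshold, self.low, self.high = best
        return [
            f"{self.low} :- {feature} =< {self.threshold}.",
            f"{self.high} :- {feature} > {self.threshold}.",
        ]

    def predict(self, dataset):
        feature = self.features[0]
        return [
            self.low if row[feature] <= self.threshold else self.high
            for row in dataset
        ]


class BlackBox:
    """Hypothetical trained predictor exposing a `predict` method."""

    def predict(self, dataset):
        return ["small" if row["x"] < 2.5 else "large" for row in dataset]


data = [{"x": v} for v in (1.0, 2.0, 3.0, 4.0)]
extractor = ThresholdExtractor(BlackBox(), ["x"])
rules = extractor.extract(data)
predictions = extractor.predict([{"x": 0.5}, {"x": 3.5}])
```

The point of the sketch is only the division of labour: extract queries the predictor on a dataset and stores rules, while predict relies on those rules alone.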
Currently, the supported extraction algorithms are:
- CART, which straightforwardly extracts rules from both classification and regression decision trees;
- Classification:
  - REAL;
  - Trepan;
- Regression:
  - ITER, which builds and iteratively expands hypercubes in the input space. Each cube holds a constant value, that is, the estimated output for the samples inside the cube;
  - GridEx, an extension of the ITER algorithm that produces shorter rule lists while retaining higher fidelity w.r.t. the predictor;
  - GridREx, an extension of GridEx where the output of each hypercube is a linear combination of the input variables rather than a constant value.
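The hypercube idea shared by the regression extractors can be sketched in a few lines. What follows is neither ITER nor GridEx as implemented in PSyKE: it merely splits each input dimension into a fixed number of equal intervals and assigns to every non-empty cube the mean output of the oracle for the samples falling inside it. All function names are hypothetical, the rule rendering is informal, and samples are assumed to lie within the given bounds.

```python
from collections import defaultdict
from statistics import mean


def cell_of(point, bounds, splits):
    """Index of the grid cell (hypercube) containing `point`."""
    index = []
    for value, (low, high) in zip(point, bounds):
        step = (high - low) / splits
        # Clamp so that a value lying on the upper bound falls in the last cell.
        index.append(min(int((value - low) / step), splits - 1))
    return tuple(index)


def fit_cubes(samples, oracle, bounds, splits):
    """Assign to every non-empty cube the mean oracle output of its samples."""
    outputs = defaultdict(list)
    for point in samples:
        outputs[cell_of(point, bounds, splits)].append(oracle(point))
    return {cell: mean(values) for cell, values in outputs.items()}


def describe(cubes, bounds, splits):
    """Render each cube as a readable rule (informal syntax, not Prolog)."""
    lines = []
    for cell, value in sorted(cubes.items()):
        conditions = []
        for i, (index, (low, high)) in enumerate(zip(cell, bounds)):
            step = (high - low) / splits
            lower, upper = low + index * step, low + (index + 1) * step
            conditions.append(f"X{i} in [{lower}, {upper}]")
        lines.append(f"output = {value} if " + ", ".join(conditions))
    return lines


def black_box(point):
    """Hypothetical oracle standing in for a trained regressor."""
    return 2 * point[0] + point[1]


bounds = [(0.0, 1.0), (0.0, 1.0)]
samples = [(0.125, 0.25), (0.25, 0.25), (0.75, 0.25), (0.75, 0.75)]
cubes = fit_cubes(samples, black_box, bounds, splits=2)
for line in describe(cubes, bounds, splits=2):
    print(line)
```

Replacing the per-cube mean with a per-cube linear fit would move the sketch in the direction described for GridREx.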
Users may exploit the PEDRO algorithm, included in PSyKE, to tune the hyper-parameters of GridEx and GridREx.
We are working on extending PSyKE to encompass explainable clustering tasks, as well as on making the supported extraction algorithms more general-purpose (e.g., by adding classification support to GridEx and GridREx).
PSyKE is deployed as a library on PyPI, and it can therefore be installed as a Python package by running:
pip install psyke
The following Python packages are required:
- numpy
- pandas
- scikit-learn
- 2ppy
- skl2onnx
- onnxruntime
- parameterized
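To check which of the packages listed above are available in the current environment, a small helper based on the standard importlib.metadata module may be handy. The helper name installed_versions is hypothetical, and the distribution names are taken verbatim from the list above.

```python
from importlib.metadata import PackageNotFoundError, version

PACKAGES = [
    "psyke", "numpy", "pandas", "scikit-learn",
    "2ppy", "skl2onnx", "onnxruntime", "parameterized",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, installed in installed_versions(PACKAGES).items():
    print(f"{name}: {installed or 'not installed'}")
```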
Once installed, it is possible to create an extractor from a predictor (e.g. Neural Network, Support Vector Machine, K-Nearest Neighbor, Random Forest, etc.) and from the dataset used to train the predictor.
Note: the predictor must expose a method named predict to be properly used as an oracle.
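If a model is only available as a plain function, a thin wrapper is enough to satisfy this requirement. The sketch below is an illustration only: PredictAdapter and can_be_oracle are hypothetical names, and the list-in, list-out signature of predict is an assumption, since the data types PSyKE passes to the predictor are not stated here.

```python
class PredictAdapter:
    """Expose a plain callable through a method named `predict`."""

    def __init__(self, function):
        self._function = function

    def predict(self, samples):
        # Apply the wrapped function to each sample, one at a time.
        return [self._function(sample) for sample in samples]


def can_be_oracle(model) -> bool:
    """True if the object exposes a callable attribute named `predict`."""
    return callable(getattr(model, "predict", None))


wrapped = PredictAdapter(abs)
print(can_be_oracle(abs), can_be_oracle(wrapped))
```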
A brief example is presented in the demo.py script in the demo/ folder.
Using sklearn's Iris dataset, we train a K-Nearest Neighbors classifier to predict the correct output class.
Before training, we discretize the dataset.
After that, we create two different extractors: REAL and Trepan.
We output the extracted theory for both extractors.
REAL extracted rules:
iris(PetalLength, PetalWidth, SepalLength, SepalWidth, setosa) :- PetalWidth =< 1.0.
iris(PetalLength1, PetalWidth1, SepalLength1, SepalWidth1, versicolor) :- PetalLength1 > 4.9, SepalWidth1 in [2.9, 3.2].
iris(PetalLength2, PetalWidth2, SepalLength2, SepalWidth2, versicolor) :- PetalWidth2 > 1.6.
iris(PetalLength3, PetalWidth3, SepalLength3, SepalWidth3, virginica) :- SepalWidth3 =< 2.9.
iris(PetalLength4, PetalWidth4, SepalLength4, SepalWidth4, virginica) :- SepalLength4 in [5.4, 6.3].
iris(PetalLength5, PetalWidth5, SepalLength5, SepalWidth5, virginica) :- PetalWidth5 in [1.0, 1.6].
Trepan extracted rules:
iris(PetalLength6, PetalWidth6, SepalLength6, SepalWidth6, virginica) :- PetalLength6 > 3.0, PetalLength6 in [3.0, 4.9].
iris(PetalLength7, PetalWidth7, SepalLength7, SepalWidth7, versicolor) :- PetalLength7 > 3.0.
iris(PetalLength8, PetalWidth8, SepalLength8, SepalWidth8, setosa) :- true.
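To give an idea of how such a theory can replace the original predictor, the three Trepan rules above can be transcribed by hand into Python. This is only a sketch under two assumptions: rules are tried top to bottom and the first matching one wins, and "in [a, b]" denotes a closed interval. It is not how PSyKE evaluates the theory internally.

```python
# Each entry pairs an output class with the condition of the matching rule.
TREPAN_RULES = [
    # iris(..., virginica) :- PetalLength > 3.0, PetalLength in [3.0, 4.9].
    ("virginica", lambda petal_length: petal_length > 3.0 and 3.0 <= petal_length <= 4.9),
    # iris(..., versicolor) :- PetalLength > 3.0.
    ("versicolor", lambda petal_length: petal_length > 3.0),
    # iris(..., setosa) :- true.
    ("setosa", lambda petal_length: True),
]


def classify(petal_length: float) -> str:
    """Return the class of the first rule whose condition holds."""
    for label, condition in TREPAN_RULES:
        if condition(petal_length):
            return label
    raise ValueError("no rule matched")  # unreachable: the last rule always holds


for value in (1.4, 4.0, 5.5):
    print(value, classify(value))
```

Only the petal length appears in the rule bodies, so the other three features are omitted from the function signature.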
Working with the PSyKE codebase requires a number of tools to be installed:
- Python 3.9 (Python versions greater than 3.9.x are currently not supported)
- JDK 11+ (please ensure the JAVA_HOME environment variable is properly configured)
- Git 2.20+
To participate in the development of PSyKE, we suggest the PyCharm IDE.
- Clone this repository into a folder of your preference using git clone
- Open PyCharm
- Select Open
- Navigate your file system and find the folder where you cloned the repository
- Click Open
Contributions to this project are welcome. Just some rules:
- We use git flow, so if you write new features, please do so in a separate feature/ branch
- We recommend forking the project, developing your code, then contributing back via pull request
- Commit often
- Stay in sync with the develop (or master) branch (pull frequently if the build passes)
- Do not introduce low-quality or untested code
If you run into problems while using or developing PSyKE, you are encouraged to report them through the project's "Issues" section on GitHub.