MAIF / melusine

📧 Melusine: Use python to automatize your email processing workflow

Home Page: https://maif.github.io/melusine


Tensorflow-probability

benoitLebreton-perso opened this issue · comments

Description of Problem:
For the existing deterministic neural networks, the predict_proba method gives a basic estimate of the probability for each class.

def predict_proba(self, X, **kwargs):

With a specific type of neural network we can compute a better uncertainty estimate on the model outputs.
For users who care about uncertainty estimation (especially useful for datasets with label errors), this type of model can match the performance of deterministic neural nets while providing better uncertainty estimates.
The only drawbacks are that we need to choose a prior on the weights of the neural net, and that training requires more computation.

Overview of the Solution:
Using the tensorflow-probability package we can set up a neural network that returns a distribution over the outputs (not just a point estimate).
For each prediction, this estimated distribution gives us:

  • A point estimate (e.g. the mean of the distribution): roughly the same as the existing predict_proba method
  • An estimate of the uncertainty around this prediction (for example, a standard deviation under a Gaussian assumption)
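The two bullet points above can be sketched with plain Python: assuming we have Monte Carlo samples of a predicted class probability (e.g. from repeatedly sampling the output distribution of a tensorflow-probability model for the same email), the point estimate and the 95% Gaussian interval are simple statistics over those samples. The sample values below are made up for illustration.

```python
import statistics

# Hypothetical Monte Carlo samples of the predicted probability for one
# class, e.g. obtained by sampling a tfp model's output distribution
# several times for the same email (values are illustrative).
samples = [0.91, 0.88, 0.93, 0.90, 0.89, 0.92, 0.87, 0.90]

# Point estimate: mean of the sampled probabilities
point_estimate = statistics.mean(samples)

# Uncertainty: standard deviation turned into a 95% interval
# under a Gaussian assumption (z = 1.96), clipped to [0, 1]
std = statistics.stdev(samples)
lower = max(0.0, point_estimate - 1.96 * std)
upper = min(1.0, point_estimate + 1.96 * std)
```

The same post-processing works whatever layer produces the samples; only the sampling step is specific to tf-probability.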

Examples:
Using the Melusine tutorial, instead of just getting a point estimate from the predict or predict_proba method, we can get upper and lower bounds on the estimated probabilities.

In this example the category is "vehicle": the model finds the correct category with a high score, but it also provides an interval around this probability estimate. We can choose the confidence level of this interval (here: 95% under a Gaussian assumption). This approach is highly recommended for critical processes where uncertainty matters. It can also help us find errors linked to mislabelling, or more generally noise in the data.

Blockers:

  • Warning about the tensorflow-probability dependency. In my environment tf-probability is already installed alongside tensorflow, but we could make this dependency optional for users who don't want it in their environment.
  • The tf-probability versions of cnn_model, rnn_model, transformers_model... will look very similar to the existing architectures. To stay compatible with NeuralModel, I can simply propose new functions that look almost identical but with small modifications. If the architectures were split into macro-blocks (embedding/Conv/RNN/Transformer/Outputs), we could avoid the copy-paste I'm about to do.
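The macro-block idea in the last bullet can be sketched abstractly: one shared skeleton per architecture, with only the output block swapped between the deterministic and the tf-probability variant. The block and builder names below are illustrative placeholders (toy functions, not actual Melusine or Keras code); they only show the factoring, not real layers.

```python
# Toy stand-ins for macro-blocks; a real version would build Keras layers.
def embedding_block(layers):
    return layers + ["embedding"]

def conv_block(layers):
    return layers + ["conv"]

def dense_output_block(layers):
    return layers + ["softmax"]        # deterministic point estimate

def probabilistic_output_block(layers):
    return layers + ["distribution"]   # tfp layer returning a Distribution

def build_cnn(output_block):
    """One skeleton, two variants: the shared blocks are written once."""
    layers = embedding_block(["tokens"])
    layers = conv_block(layers)
    return output_block(layers)

deterministic = build_cnn(dense_output_block)
probabilistic = build_cnn(probabilistic_output_block)
```

With this factoring, adding the tf-probability variant of each architecture means writing only a new output block instead of duplicating the whole model function.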

Definition of Done:
Add new models in neural_architecture (like cnn_model but with tf-probability capabilities), compatible with the existing NeuralModel class:
class NeuralModel(BaseEstimator, ClassifierMixin):

The predict_proba method of this new type of model will provide a better estimate of the predicted probability, along with upper and lower bounds.
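As a rough sketch of what such a predict_proba could return, here is a minimal class with the same sklearn-style method signature. The _sample_proba stub stands in for one stochastic forward pass of a tf-probability network; the class name, the stub, and the Gaussian parameters are all illustrative assumptions, not the actual implementation.

```python
import random
import statistics

class ProbabilisticModelSketch:
    """Sketch of a NeuralModel-compatible predict_proba that also
    returns an uncertainty interval (illustrative, not real code)."""

    def __init__(self, n_samples=100, z=1.96):
        self.n_samples = n_samples
        self.z = z  # 95% interval under a Gaussian assumption

    def _sample_proba(self, x):
        # Stand-in for sampling the model's output distribution once;
        # a real model would do a stochastic forward pass here.
        return min(1.0, max(0.0, random.gauss(0.9, 0.02)))

    def predict_proba(self, X, **kwargs):
        results = []
        for x in X:
            samples = [self._sample_proba(x) for _ in range(self.n_samples)]
            mean = statistics.mean(samples)
            std = statistics.stdev(samples)
            lower = max(0.0, mean - self.z * std)
            upper = min(1.0, mean + self.z * std)
            results.append((mean, lower, upper))
        return results

bounds = ProbabilisticModelSketch().predict_proba(["some email"])
```

Returning (mean, lower, upper) per prediction is just one possible API; keeping the existing predict_proba output and exposing the bounds through a separate method would also work.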

I'm currently working on it. Happy to discuss this topic.