MAIF / melusine

📧 Melusine: Use python to automatize your email processing workflow

Home Page: https://maif.github.io/melusine


Tensorflow-probability

benoitLebreton-perso opened this issue · comments

Description of Problem:
For the existing deterministic neural networks, the predict_proba method gives a basic estimate of the probability for each class.

def predict_proba(self, X, **kwargs):

With a specific type of neural network we can compute a better uncertainty estimate on the model outputs.
For users who care about uncertainty estimation (especially useful for datasets with label errors), this type of model can match the performance of deterministic neural nets while providing better uncertainty estimates.
The only drawbacks are that we need to choose a prior on the weights of the neural net, and that training requires more computation.

Overview of the Solution:
Using the tensorflow-probability package we can set up a neural network that returns a distribution over the outputs (not just a point estimate).
For each prediction, this estimated distribution gives us:

  • A point estimate (e.g. the mean of the distribution): roughly the same as the existing predict_proba method
  • An estimate of the uncertainty around this prediction (for example, a standard deviation under a Gaussian assumption)
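The two bullet points above can be sketched with plain Python: assuming we have Monte Carlo samples of a predicted class probability (e.g. from repeatedly sampling the output distribution of a tensorflow-probability model for the same email), the point estimate and the 95% Gaussian interval are simple statistics over those samples. The sample values below are made up for illustration.

```python
import statistics

# Hypothetical Monte Carlo samples of the predicted probability for one
# class, e.g. obtained by sampling a tfp model's output distribution
# several times for the same email (values are illustrative).
samples = [0.91, 0.88, 0.93, 0.90, 0.89, 0.92, 0.87, 0.90]

# Point estimate: mean of the sampled probabilities
point_estimate = statistics.mean(samples)

# Uncertainty: standard deviation turned into a 95% interval
# under a Gaussian assumption (z = 1.96), clipped to [0, 1]
std = statistics.stdev(samples)
lower = max(0.0, point_estimate - 1.96 * std)
upper = min(1.0, point_estimate + 1.96 * std)
```

The same post-processing works whatever layer produces the samples; only the sampling step is specific to tf-probability.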

Examples:
Using the Melusine tutorial, instead of just getting a point estimate from the predict or predict_proba method, we can get upper and lower bounds on the estimated probabilities.

In this example the category is "vehicle": the model finds the correct category with a high score, but it also provides an interval around this probability estimate. We can choose the confidence level of this interval (here: 95% under a Gaussian assumption). This approach is highly recommended for critical processes where uncertainty matters. It can also help us find errors linked to mislabelling, or more generally noise in the data.

Blockers:

  • Warning about the tensorflow-probability dependency. In my environment tf-probability is already installed alongside tensorflow, but we could make this dependency optional for users who don't want it in their environment.
  • The tf-probability versions of cnn_model, rnn_model, transformers_model... will look very similar to the existing architectures. To stay compatible with NeuralModel, I can simply propose new functions that look almost identical but with small modifications. If the architectures were split into macro-blocks (embedding/Conv/RNN/Transformer/Outputs), we could avoid the copy-paste I'm about to do.
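The macro-block idea in the last bullet can be sketched abstractly: one shared skeleton per architecture, with only the output block swapped between the deterministic and the tf-probability variant. The block and builder names below are illustrative placeholders (toy functions, not actual Melusine or Keras code); they only show the factoring, not real layers.

```python
# Toy stand-ins for macro-blocks; a real version would build Keras layers.
def embedding_block(layers):
    return layers + ["embedding"]

def conv_block(layers):
    return layers + ["conv"]

def dense_output_block(layers):
    return layers + ["softmax"]        # deterministic point estimate

def probabilistic_output_block(layers):
    return layers + ["distribution"]   # tfp layer returning a Distribution

def build_cnn(output_block):
    """One skeleton, two variants: the shared blocks are written once."""
    layers = embedding_block(["tokens"])
    layers = conv_block(layers)
    return output_block(layers)

deterministic = build_cnn(dense_output_block)
probabilistic = build_cnn(probabilistic_output_block)
```

With this factoring, adding the tf-probability variant of each architecture means writing only a new output block instead of duplicating the whole model function.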

Definition of Done:
Add new models in neural_architecture (like cnn_model but with tf-probability capabilities), compatible with the existing NeuralModel class:
class NeuralModel(BaseEstimator, ClassifierMixin):

The predict_proba method of this new type of model will provide a better estimate of the predicted probability, along with upper and lower bounds.
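As a rough sketch of what such a predict_proba could return, here is a minimal class with the same sklearn-style method signature. The _sample_proba stub stands in for one stochastic forward pass of a tf-probability network; the class name, the stub, and the Gaussian parameters are all illustrative assumptions, not the actual implementation.

```python
import random
import statistics

class ProbabilisticModelSketch:
    """Sketch of a NeuralModel-compatible predict_proba that also
    returns an uncertainty interval (illustrative, not real code)."""

    def __init__(self, n_samples=100, z=1.96):
        self.n_samples = n_samples
        self.z = z  # 95% interval under a Gaussian assumption

    def _sample_proba(self, x):
        # Stand-in for sampling the model's output distribution once;
        # a real model would do a stochastic forward pass here.
        return min(1.0, max(0.0, random.gauss(0.9, 0.02)))

    def predict_proba(self, X, **kwargs):
        results = []
        for x in X:
            samples = [self._sample_proba(x) for _ in range(self.n_samples)]
            mean = statistics.mean(samples)
            std = statistics.stdev(samples)
            lower = max(0.0, mean - self.z * std)
            upper = min(1.0, mean + self.z * std)
            results.append((mean, lower, upper))
        return results

bounds = ProbabilisticModelSketch().predict_proba(["some email"])
```

Returning (mean, lower, upper) per prediction is just one possible API; keeping the existing predict_proba output and exposing the bounds through a separate method would also work.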

I'm currently working on it. Happy to discuss this topic.