Allensmile / nannyml

Detecting silent model failure. NannyML estimates performance with an algorithm called Confidence-based Performance estimation (CBPE), developed by core contributors. It is the only open-source algorithm capable of fully capturing the impact of data drift on performance.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Documentation Status PyPI - License

NannyML - OSS Python library for detecting silent ML model failure | Product Hunt

Website โ€ข Docs โ€ข Community Slack


๐Ÿ’ก What is NannyML?

NannyML is an open-source python library that allows you to estimate post-deployment model performance (without access to targets), detect data drift, and intelligently link data drift alerts back to changes in model performance. Built for data scientists, NannyML has an easy-to-use interface, interactive visualizations, is completely model-agnostic and currently supports all tabular classification use cases.

The core contributors of NannyML have researched and developed a novel algorithm for estimating model performance: confidence-based performance estimation (CBPE). The nansters also invented a new approach to detect multivariate data drift using PCA-based data reconstruction.

If you like what we are working on, be sure to become a Nanster yourself, join our community slack and support us with a GitHub star โญ.

โ˜” Why use NannyML?

NannyML closes the loop with performance monitoring and post deployment data science, empowering data scientist to quickly understand and automatically detect silent model failure. By using NannyML, data scientists can finally maintain complete visibility and trust in their deployed machine learning models. Allowing you to have the following benefits:

  • End sleepless nights caused by not knowing your model performance ๐Ÿ˜ด
  • Analyse data drift and model performance over time
  • Discover the root cause to why your models are not performing as expected
  • No alert fatigue! React only when necessary if model performance is impacted
  • Painless setup in any environment


NannyML Resources Description
โ˜Ž๏ธ NannyML 101 New to NannyML? Start here!
๐Ÿ”ฎ Performance estimation How the magic works.
๐ŸŒ Real world example Take a look at a real-world example of NannyML.
๐Ÿ”‘ Key concepts Glossary of key concepts we use.
๐Ÿ”ฌ Technical reference Monitor the performance of your ML models.
๐Ÿ”Ž Blog Thoughts on post-deployment data science from the NannyML team.
๐Ÿ“ฌ Newsletter All things post-deployment data science. Subscribe to see the latest papers and blogs.
๐Ÿ’Ž New in v0.4.0 New features, bug fixes.
๐Ÿง‘โ€๐Ÿ’ป Contribute How to contribute to the NannyML project and codebase.
Join slack Need help with your specific use case? Say hi on slack!

๐Ÿ”ฑ Features

1. Performance estimation and monitoring

When the actual outcome of your deployed prediction models is delayed, or even when post-deployment target labels are completely absent, you can use NannyML's CBPE-algorithm to estimate model performance. This algorithm requires the predicted probabilities of your machine learning model and leverages probability calibration to estimate any traditional binary classification metric (ROC_AUC, Precision, Recall, F1, etc.). Rather than estimating the performance of future model predictions, CBPE estimates the expected model performance of the predictions made at inference time.

NannyML can also track the realised performance of your machine learning model once targets are available.

2. Data drift detection

To detect multivariate feature drift NannyML uses PCA-based data reconstruction. Changes in the resulting reconstruction error are monitored over time and data drift alerts are logged when the reconstruction error in a certain period exceeds a threshold. This threshold is calculated based on the reconstruction error observed in the reference period.

NannyML utilises statistical tests to detect univariate feature drift. The Kolmogorovโ€“Smirnov test is used for continuous features and the 2-sample chi-squared test for categorical features. The results of these tests are tracked over time, properly corrected to counteract multiplicity and overlayed on the temporal feature distributions. (It is also possible to visualise the test-statistics over time, to get a notion of the drift magnitude.)

NannyML uses the same statistical tests to detected model output drift.

Target distribution drift is monitored by calculating the mean occurrence of positive events in combination with the 2-sample chi-squared test. Bear in mind that this operation requires the presence of actuals.

3. Intelligent alerting

Because NannyML can estimate performance, it is possible to weed out data drift alerts that do not impact expected performance, combatting alert fatigue. Besides linking data drift issues to drops in performance it is also possible to prioritise alerts according to other criteria using NannyML's Ranker.

๐Ÿš€ Getting started

Install NannyML

From PyPI:

pip install nannyml

Here be dragons! Use the latest development version of NannyML at your own risk:

python -m pip install git+

Quick Start

The following snippet is based on our latest release.

import pandas as pd
import nannyml as nml

# Load dummy data
reference, analysis, analysis_target = nml.load_synthetic_binary_classification_dataset()
data = pd.concat([reference, analysis], ignore_index=True)

# Extract meta data
metadata = nml.extract_metadata(data = reference, model_type='classification_binary', exclude_columns=['identifier'])
metadata.target_column_name = 'work_home_actual'

# Choose a chunker or set a chunk size
chunk_size = 5000

# Estimate model performance
estimator = nml.CBPE(model_metadata=metadata, metrics=['roc_auc'], chunk_size=chunk_size)
estimated_performance = estimator.estimate(data=data)

figure = estimated_performance.plot(metric='roc_auc', kind='performance')

# Detect multivariate feature drift
multivariate_calculator = nml.DataReconstructionDriftCalculator(model_metadata=metadata, chunk_size=chunk_size)
multivariate_results = multivariate_calculator.calculate(data=data)

figure = multivariate_results.plot(kind='drift')

# Detect univariate feature drift
univariate_calculator = nml.UnivariateStatisticalDriftCalculator(model_metadata=metadata, chunk_size=chunk_size)
univariate_results = univariate_calculator.calculate(data=data)

# Rank features based on number of alerts
ranker ='alert_count')
ranked_features = ranker.rank(univariate_results, model_metadata=metadata, only_drifting = False)

for feature in ranked_features.feature:
    figure = univariate_results.plot(kind='feature_distribution', feature_label=feature)

๐Ÿ“– Documentation

๐Ÿฆธ Contributing and Community

We want to build NannyML together with the community! The easiest to contribute at the moment is to propose new features or log bugs under issues. For more information, have a look at how to contribute.

๐Ÿ™‹ Get help

The best place to ask for help is in the community slack. Feel free to join and ask questions or raise issues. Someone will definitely respond to you.

๐Ÿฅท Stay updated

If you want to stay up to date with recent changes to the NannyML library, you can subscribe to our release notes. For thoughts on post-deployment data science from the NannyML team, feel free to visit our blog. You can also sing up for our newsletter, which brings together the best papers, articles, news, and open-source libraries highlighting the ML challenges after deployment.

๐Ÿ“„ License

NannyML is distributed under an Apache License Version 2.0. A complete version can be found here. All contributions will be distributed under this license.


Detecting silent model failure. NannyML estimates performance with an algorithm called Confidence-based Performance estimation (CBPE), developed by core contributors. It is the only open-source algorithm capable of fully capturing the impact of data drift on performance.

License:Apache License 2.0


Language:Python 96.4%Language:Jupyter Notebook 3.3%Language:Makefile 0.3%