This repository is intended to enable quick access to datasets for predictive maintenance (PM) tasks (under development). The table below summarizes the available attributes (x = available); datasets marked with * have richer attributes and may be worth checking first. Note that RUL stands for remaining useful life.
| Dataset | Timestamp | #Sensor | #Alarm | RUL | License |
|---|---|---|---|---|---|
| ALPI* | x | | 140 | | CC-BY |
| CBM | x | 15 | | 3 | Other |
| CMAPSS | x | 26 | 2-6 | x | CC0: Public Domain |
| GDD | x | 5(1) | 3 | | CC-BY-NC-SA |
| GFD | x | 4 | 2 | | CC-BY-SA |
| HydSys* | x | 17 | 2-4 | | Other |
| MAPM* | x | 4 | 5 | x | Other |
| PPD | x | x | | x | CC-BY-SA |
| UFD | | 37-52 | 4 | | Other |
- Python=3.7
- pandas=1.1.2
Please put the `datasets` directory into your workspace and import it like this:
```python
import datasets

# Dataset-specific values will be returned
datasets.ufd.load_data()

# A visualization PDF will be generated
datasets.ufd.gen_summary()
```
Each dataset class has the following functions:
load_data(index)
: Loads the dataset specified by `index`. Please see the README.md in each dataset directory for more details.

gen_summary(outdir)
: Generates a PDF file visualizing the full dataset.
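For example, the two functions can be combined as below. The `index` value and the output directory are only illustrative placeholders; the valid values are documented in each dataset's README.md, and the type of the returned object depends on the dataset.

```python
import datasets

# Load one part of the UFD dataset; 0 is only a placeholder index,
# check the dataset's README.md for the values it actually accepts.
data = datasets.ufd.load_data(index=0)
print(type(data))  # the return type is dataset-specific

# Write the full-dataset visualization PDF into an example output directory.
datasets.ufd.gen_summary(outdir="output")
```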
Run-to-failure data require:
- a time column
- an event/censoring column (categorical)
- numerical/categorical feature columns (optional)
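
As a rough sketch, a table with this layout could be built as follows; the column names are arbitrary examples, not names required by the loaders:

```python
import pandas as pd

# Illustrative run-to-failure table: only the roles of the columns matter,
# the names below are arbitrary examples.
run_to_failure = pd.DataFrame({
    "lifetime": [10, 25, 42],             # time column (e.g. cycles or hours)
    "broken": [1, 1, 0],                  # event/censoring column: 1 = failure observed, 0 = censored
    "pressure_mean": [3.1, 2.7, 3.4],     # optional numerical feature
    "machine_model": ["m1", "m2", "m1"],  # optional categorical feature
})
print(run_to_failure.dtypes)
```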
There are Jupyter notebooks for all datasets, which may help with interactive data processing and visualization.
- Wikipedia: https://en.wikipedia.org/wiki/Predictive_maintenance
- Azure AI guide for predictive maintenance solutions: https://docs.microsoft.com/en-us/azure/architecture/data-science-process/predictive-maintenance-playbook
- Open-source Python package for survival analysis modeling: https://square.github.io/pysurvival/index.html
- Types of proactive maintenance: https://solutions.borderstates.com/types-of-proactive-maintenance/
- Common license types for datasets: https://www.kaggle.com/general/116302
- ALPI: Diego Tosato, Davide Dalle Pezze, Chiara Masiero, Gian Antonio Susto, Alessandro Beghi, 2020. Alarm Logs in Packaging Industry (ALPI). https://dx.doi.org/10.21227/nfv6-k750
- CBM: Condition Based Maintenance of Naval Propulsion Plants Data Set: http://archive.ics.uci.edu/ml/datasets/condition+based+maintenance+of+naval+propulsion+plants
- CMAPSS: NASA Turbofan Jet Engine Data Set: https://www.kaggle.com/behrad3d/nasa-cmaps
- GDD: Genesis demonstrator data for machine learning: https://www.kaggle.com/inIT-OWL/genesis-demonstrator-data-for-machine-learning
- GFD: Gearbox Fault Diagnosis: https://www.kaggle.com/brjapon/gearbox-fault-diagnosis
- HydSys: Predictive Maintenance of Hydraulics System: https://archive.ics.uci.edu/ml/datasets/Condition+monitoring+of+hydraulic+systems
- MAPM: Microsoft Azure Predictive Maintenance: https://www.kaggle.com/arnabbiswas1/microsoft-azure-predictive-maintenance
- PPD: Production Plant Data for Condition Monitoring: https://www.kaggle.com/inIT-OWL/production-plant-data-for-condition-monitoring
- UFD: Ultrasonic Flowmeter Diagnostics Data Set: https://archive.ics.uci.edu/ml/datasets/Ultrasonic+flowmeter+diagnostics
- Birkl, Christoph. Oxford Battery Degradation Dataset 1. University of Oxford, 2017. https://ora.ox.ac.uk/objects/uuid:03ba4b01-cfed-46d3-9b1a-7d4a7bdf6fac
- Lu, Jiahuan; Xiong, Rui; Tian, Jinpeng; Wang, Chenxu; Hsu, Chia-Wei; Tsou, Nien-Ti; Sun, Fengchun; Li, Ju, 2021. Battery Degradation Dataset (Fixed Current Profiles & Arbitrary Uses Profiles), Mendeley Data, V2. https://data.mendeley.com/datasets/kw34hhw7xg/2
- One Year Industrial Component Degradation: https://www.kaggle.com/inIT-OWL/one-year-industrial-component-degradation
- Vega shrink-wrapper component degradation: https://www.kaggle.com/inIT-OWL/vega-shrinkwrapper-runtofailure-data
- NASA Bearing Dataset: https://www.kaggle.com/vinayak123tyagi/bearing-dataset
- CWRU Bearing Dataset: https://www.kaggle.com/brjapon/cwru-bearing-datasets
All materials except the datasets are available under the MIT license. I preserve all raw data but attach data loading and preprocessing tools to each dataset directory so that the datasets can be used quickly in Python. Each dataset must be used under its own license.