Predicting Machine Failure

Photo by Jonathan Borba on Unsplash

Project Overview

A single machine failure can bring an entire assembly line to a halt. If we can predict machine failures, we can shorten the time to recovery.

Navigating the Repository

Exploratory work: Exploratory data analysis from all team members
Images: Graphs visualizing the prediction results
Solutions.ipynb: Final notebook containing the predictive analysis and algorithms
README: Project overview, analysis, and recommendations
Presentation: Final PowerPoint presentation

Business Understanding

Problem: Machines on an automotive assembly line depend on one another to produce a complete automobile. If one machine in the line fails, it delays the subsequent steps in the assembly process or produces defective parts that increase waste. Given sensor data, is it possible to detect a machine failure?
Stakeholders: Auto manufacturers
Solution: Using supervised machine learning, we attempt to predict manufacturing machine failures. This gives lead time for maintenance or replacement of the faulty machine.

Data Understanding and Analysis

Understanding the Data Sources

Machine Failure Prediction

The data come from a CSV file on Kaggle containing machine failure records. The fields are:

machine type
air temperature
process temperature
rotational speed
torque
tool wear (minutes)
machine failure (binary)

The file also includes five binary failure-mode flags: TWF (tool wear failure), HDF (heat dissipation failure), PWF (power failure), OSF (overstrain failure), and RNF (random failure).
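The sketch below shows the field list as a table, using pandas to load and inspect it. It rests on a few assumptions that are not from the project notebook. The CSV headers are assumed to look like `Air temperature [K]`, with extra `UDI` and `Product ID` identifier columns; check these against your own copy of the file. The three data rows are made-up values, and `load_machine_data` is a hypothetical helper.

```python
import io

import pandas as pd

# Assumed header: mirrors column names commonly seen in this Kaggle dataset;
# verify against your own copy of the CSV.
# The data rows are made-up values for illustration, not real records.
CSV_TEXT = """UDI,Product ID,Type,Air temperature [K],Process temperature [K],Rotational speed [rpm],Torque [Nm],Tool wear [min],Machine failure,TWF,HDF,PWF,OSF,RNF
1,X0001,M,298.0,308.5,1500,40.0,10,0,0,0,0,0,0
2,X0002,L,299.0,309.0,1380,55.0,120,1,0,1,0,0,0
3,X0003,H,300.0,310.0,1600,35.0,200,0,0,0,0,0,0
"""

TARGET = "Machine failure"
FAILURE_MODES = ["TWF", "HDF", "PWF", "OSF", "RNF"]


def load_machine_data(source) -> pd.DataFrame:
    """Hypothetical helper: read the CSV and drop identifier columns."""
    df = pd.read_csv(source)
    return df.drop(columns=["UDI", "Product ID"])


df = load_machine_data(io.StringIO(CSV_TEXT))
# Features: machine type plus the five continuous sensor readings.
features = df.drop(columns=[TARGET] + FAILURE_MODES)
target = df[TARGET]
print(features.shape)  # (3, 6)
print(target.value_counts().to_dict())  # {0: 2, 1: 1}
```

On the real file, `target.value_counts()` is a quick way to see the class imbalance mentioned in the cleaning step.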

Data Analysis Process

Collection: We downloaded the dataset as a CSV file from Kaggle.
Cleaning: The initial dataset was imbalanced and contained multiple failure types. It also mixed binary, categorical, and continuous data on vastly different scales. We used resampling, scaling, binary transformation, and consolidation techniques to clean the data before processing.
Processing: We processed the data with built-in Python functions and several specialized libraries, most heavily scikit-learn (sklearn).
Analysis: We used visualization packages such as Matplotlib and Seaborn to plot our models' performance metrics.
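The cleaning and processing steps above could be combined roughly as follows. This is a scikit-learn sketch built on several interpretations that the original text does not spell out:

- "Consolidation" is read as merging the failure-mode flags into one binary label.
- "Binary transformation" is read as one-hot encoding of the machine type.
- "Resampling" is read as random oversampling of the failure class.

The helpers `consolidate`, `upsample_failures`, and `build_model` are hypothetical names, and the data are synthetic. Resampling and scaling happen after the train/test split. This keeps duplicated failure rows and test-set statistics out of training.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.utils import resample

TARGET = "Machine failure"
FAILURE_MODES = ["TWF", "HDF", "PWF", "OSF", "RNF"]
CONTINUOUS = ["Air temperature [K]", "Process temperature [K]",
              "Rotational speed [rpm]", "Torque [Nm]", "Tool wear [min]"]


def consolidate(df: pd.DataFrame) -> pd.DataFrame:
    """Assumed meaning of consolidation: the label is 1 if the main label or
    any failure-mode flag is set; the individual mode flags are dropped."""
    out = df.copy()
    out[TARGET] = out[[TARGET] + FAILURE_MODES].max(axis=1).astype(int)
    return out.drop(columns=FAILURE_MODES)


def upsample_failures(X: pd.DataFrame, y: pd.Series, seed: int = 0):
    """Random oversampling: duplicate failure rows until both classes are
    equal in size. Meant for the training split only."""
    train = X.assign(**{TARGET: y})
    majority = train[train[TARGET] == 0]
    minority = train[train[TARGET] == 1]
    minority_up = resample(minority, replace=True,
                           n_samples=len(majority), random_state=seed)
    balanced = pd.concat([majority, minority_up], ignore_index=True)
    return balanced.drop(columns=TARGET), balanced[TARGET]


def build_model(seed: int = 0) -> Pipeline:
    """One-hot encode machine type and standardize sensor readings (both
    fitted on training data only), then classify with a random forest."""
    preprocess = ColumnTransformer([
        ("type", OneHotEncoder(handle_unknown="ignore"), ["Type"]),
        ("sensors", StandardScaler(), CONTINUOUS),
    ])
    return Pipeline([("preprocess", preprocess),
                     ("model", RandomForestClassifier(random_state=seed))])


# Made-up, imbalanced toy data standing in for the Kaggle CSV.
rng = np.random.default_rng(0)
n = 600
df = pd.DataFrame({"Type": rng.choice(["L", "M", "H"], size=n)})
for col, (mean, sd) in zip(CONTINUOUS, [(300, 2), (310, 1.5), (1500, 150), (40, 10), (100, 60)]):
    df[col] = rng.normal(mean, sd, n)
for mode in FAILURE_MODES:
    df[mode] = 0
df["HDF"] = (df["Torque [Nm]"] > 55).astype(int)  # toy rule: high torque fails
df[TARGET] = df["HDF"]
df.loc[:4, "RNF"] = 1  # mode flag set without the main label

data = consolidate(df)
X, y = data.drop(columns=TARGET), data[TARGET]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)  # keep failure share equal
X_bal, y_bal = upsample_failures(X_train, y_train)

model = build_model().fit(X_bal, y_bal)
test_accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"balanced training rows: {len(y_bal)}, test accuracy: {test_accuracy:.3f}")
```

Oversampling only the training split keeps the test accuracy an honest estimate. Balancing before the split would put copies of the same failure rows on both sides of it.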

Model Evaluation

Untuned model accuracy

Tuned model accuracy
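An untuned versus tuned comparison like the one above can be sketched with scikit-learn's `GridSearchCV` and a Matplotlib bar chart. The search space, the synthetic data from `make_classification`, and the plot labels are assumptions for illustration. They are not the project's actual grid, data, or results.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend: render without a display
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, imbalanced stand-in for the sensor data (not the Kaggle file).
X, y = make_classification(n_samples=1000, n_features=5, n_informative=3,
                           weights=[0.9], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1)

# Untuned baseline: scikit-learn's default hyperparameters.
untuned = RandomForestClassifier(random_state=1).fit(X_train, y_train)

# Hypothetical search space; not the grid used in the project.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 8],
    "min_samples_leaf": [1, 3],
}
search = GridSearchCV(RandomForestClassifier(random_state=1), param_grid,
                      scoring="accuracy", cv=5)
search.fit(X_train, y_train)  # refits the best combination on all of X_train
tuned = search.best_estimator_

accuracy = {
    "Untuned": accuracy_score(y_test, untuned.predict(X_test)),
    "Tuned": accuracy_score(y_test, tuned.predict(X_test)),
}

# Bar chart of held-out accuracy for the two models.
fig, ax = plt.subplots(figsize=(5, 4))
bars = ax.bar(list(accuracy), list(accuracy.values()), color=["grey", "steelblue"])
ax.bar_label(bars, fmt="%.3f")  # ax.bar_label needs Matplotlib 3.4 or newer
ax.set_ylim(0, 1)
ax.set_ylabel("Test accuracy")
ax.set_title("Random forest: untuned vs tuned")
fig.tight_layout()
print("best parameters:", search.best_params_)
```

Tuning maximizes cross-validated accuracy on the training split, so the tuned model is not guaranteed to score higher on the test split. Failures are rare, so accuracy can look high even for a model that never predicts a failure. Passing `scoring="recall"` to `GridSearchCV` tunes for catching failures instead. To keep the chart as an image, call `fig.savefig(...)`.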

Conclusion

We recommend Random Forest as the top model for predicting machine failure. Deploying it can help manufacturers:

Decrease downtime
Enhance operational performance
Reduce maintenance costs
Increase product quality
Protect brand reputation

dseo23 / capstone_project_2