In this project, neural networks are used to help a foundation decide where to invest. Using machine learning and neural networks, we create a binary classifier that predicts whether applicants will be successful if funded by the foundation (Alphabet Soup).
The data comes from a .csv file containing records for more than 34k organizations that have received funding in the past.
The project follows these steps:
- Preprocessing the data for a neural network model
- Compiling, training, and evaluating the model
- Optimizing the model
Data Preprocessing
- Variables considered as targets: Column IS_SUCCESSFUL, which contains binary data indicating whether or not the charity donation was used effectively.
- Variables considered as features: Columns APPLICATION_TYPE, AFFILIATION, CLASSIFICATION, USE_CASE, ORGANIZATION, STATUS, INCOME_AMT, SPECIAL_CONSIDERATIONS, ASK_AMT.
- Variables that should be removed: Columns EIN and NAME, which contain only identification information and are neither targets nor features.
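The column roles listed above can be sketched in code as follows. This is a minimal illustration, not the project's actual notebook: the tiny DataFrame and its values are made up, only a subset of the feature columns is shown, and the train/test split and the use of StandardScaler are assumptions about a typical preprocessing pipeline.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Made-up stand-in for the real CSV (illustrative values only).
df = pd.DataFrame({
    "EIN": [1, 2, 3, 4, 5, 6, 7, 8],
    "NAME": ["a", "b", "c", "d", "e", "f", "g", "h"],
    "APPLICATION_TYPE": ["T3", "T3", "T5", "T3", "T5", "T3", "T5", "T3"],
    "ASK_AMT": [5000, 108590, 5000, 6692, 142590, 5000, 31452, 7508],
    "IS_SUCCESSFUL": [1, 1, 0, 1, 1, 0, 1, 0],
})

# Remove the identification columns.
df = df.drop(columns=["EIN", "NAME"])

# One-hot encode the categorical feature(s) so every input is numeric.
encoded = pd.get_dummies(df, columns=["APPLICATION_TYPE"], dtype=int)

# Separate the target from the features.
y = encoded["IS_SUCCESSFUL"].to_numpy()
X = encoded.drop(columns=["IS_SUCCESSFUL"]).to_numpy()

# Assumption: default 75/25 split, then scaling fitted on the training set only.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

Fitting the scaler on the training rows only keeps information from the test rows out of the training step.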
Compiling, Training, and Evaluating the Model
Number of Neurons, Layers and Activation Functions selected for the model:
Hidden Layers:
- #1: 80 neurons.
- #2: 30 neurons.
Input Data:
- 43 features
- 25k samples
Output Layer:
- 1 neuron.
Activation Functions and Compilation Settings:
- ReLU (hidden layers): chosen to speed up the training process.
- Sigmoid (output layer): suited to binary classification.
- adam: optimizer used when compiling the model.
- binary_crossentropy: loss function used when compiling the model.
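The architecture described above could be written with the Keras Sequential API roughly as follows. This is a sketch under the assumption that TensorFlow's bundled Keras is used; the variable names and the accuracy metric are illustrative choices, and the number of training epochs is not stated in this report, so no training call is shown.

```python
import tensorflow as tf

n_features = 43  # number of input features listed above

model = tf.keras.Sequential([
    tf.keras.Input(shape=(n_features,)),
    tf.keras.layers.Dense(80, activation="relu"),    # hidden layer #1
    tf.keras.layers.Dense(30, activation="relu"),    # hidden layer #2
    tf.keras.layers.Dense(1, activation="sigmoid"),  # output layer
])

# Compile with the optimizer and loss function listed above.
model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=["accuracy"],
)

# Trainable parameters: (43*80 + 80) + (80*30 + 30) + (30*1 + 1) = 5981
n_params = model.count_params()
```

Training would then call `model.fit` on the scaled training features and the IS_SUCCESSFUL labels.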
Target Model Performance
The goal was to meet or exceed 75% model accuracy, but this was not achieved: the model reached only about 72%.
Steps taken to increase model performance:
- Bucketing the ASK_AMT feature into value intervals.
- Increasing the number of neurons in a single hidden layer, then using a model with 3 hidden layers.
- Trying the tanh activation function.
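As an illustration of the bucketing step, ASK_AMT values can be grouped into intervals with `pandas.cut`. The interval edges, the labels, and the sample amounts below are hypothetical; the report does not state which intervals were actually used.

```python
import pandas as pd

# Made-up sample of requested amounts.
ask_amt = pd.Series([5000, 6692, 31452, 108590, 142590, 5000000], name="ASK_AMT")

# Hypothetical interval edges and labels (not taken from the project).
bins = [0, 10_000, 100_000, 1_000_000, float("inf")]
labels = ["<=10k", "10k-100k", "100k-1M", ">1M"]

# Each amount is assigned to the right-closed interval it falls into.
ask_bucket = pd.cut(ask_amt, bins=bins, labels=labels)

# The buckets can then be one-hot encoded like any other categorical feature.
bucket_dummies = pd.get_dummies(ask_bucket, prefix="ASK_AMT", dtype=int)
```

Replacing a wide-ranging numeric column with a few categories is one way to reduce the influence of extreme amounts, though here it did not lift accuracy past the target.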
The result was not satisfactory, because the 75% target accuracy set for this project was not reached. In the future, a Random Forest classifier could be tested to try to achieve better results, since it is also well suited to binary classification.
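A possible starting point for the Random Forest experiment is sketched below with scikit-learn. It runs on synthetic data generated by `make_classification`, not on the charity dataset, and the hyperparameters are library defaults or arbitrary choices, so its score says nothing about how the approach would perform on this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in with 43 features and a binary target (assumption).
X, y = make_classification(n_samples=500, n_features=43, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit the forest and measure accuracy on the held-out rows.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)
test_accuracy = rf.score(X_test, y_test)
```

Tree-based models do not require feature scaling, so the scaled and unscaled features could both be tried.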