Neural Network And Geoscience
The objective of this study is to use Neural Network`s with PyTorch to classify over 12 classes a rock type.
Types of Rock
Based on column FORCE_2020_LITHOFACIES_LITHOLOGY
• 30000: Sandstone
• 65030: Sandstone/Shale
• 65000: Shale
• 80000: Marl
• 74000: Dolomite
• 70000: Limestone
• 70032: Chalk
• 88000: Halite
• 86000: Anhydrite
• 99000: Tuff
• 90000: Coal
• 93000: Basement
How to Execute the project
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
Study
The study is dividided in:
- Analyse data
- Baseline With Decision Tree Classifier
- Different Neural Network Architectures test
- Generate predictions of a hidden file
Analyse Data and Baseline With Decision Tree Classifier
The analysis and Decision Tree can be found on Base_Analyze.ipynb
. This file load data and analyse based on Decision Tree algorithm with unbalanced and balanced Data.
In Unbalanced data, the result was 92.35%
and with balanced data with SMOTE (Sinthetic Minotiry Oversampling Technique) was 98.88%
.
Different Neural Network Architectures test
The table below describe the tests made with Neural Network. The objective was to start with just few data and add more or change model according to the results.
Number | Model Name | Max Num Samples Per Class | SMOTE? | Num Epochs | Train Acc | Train Loss | Val Acc | Val Loss | Test Precision | Test Recall | Test F1-Score |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | single_layer | 10000 | False | 100 | 77.390% | 0.65312 | 74.709% | 0.68731 | 0.79 | 0.75 | 0.76 |
2 | multi_layer | 10000 | False | 100 | 72.729% | 0.84311 | 72.398% | 0.75672 | 0.76 | 0.72 | 0.73 |
3 | multi_layer | 50000 | False | 1000 | 59.464% | 1.14423 | 64.466% | 1.02543 | 0.67 | 0.64 | 0.63 |
4 | multi_layer_relu | 10000 | False | 100 | 88.023% | 0.39305 | 87.399% | 0.40415 | 0.88 | 0.87 | 0.87 |
5 | multi_layer_relu_batch_norm | 10000 | False | 100 | 92.029% | 0.21409 | 91.749% | 0.19964 | 0.92 | 0.92 | 0.92 |
6 | multi_layer_relu_batch_norm | 10000 | True | 100 | 93.801% | 0.17439 | 94.958% | 0.13612 | 0.95 | 0.95 | 0.95 |
7 | multi_layer_3_relu_batch_norm_dropout | 20000 | True | 300 | 69.476% | 0.75197 | 72.979% | 0.60935 | 0.74 | 0.73 | 0.72 |
Table Definition
-
Number: Test Number
-
Model Name: Name of the model
-
Max Num Samples Per Class: The maximum number of classes per class for all, train/valid/test
-
SMOTE? Boolean that indicates if the test used augmented data or not
-
Num Epochs: Number of epochs
-
Train Acc: Last epoch train accuracy
-
Train Loss: Last epoch train loss
-
Val Acc: Last epoch valid accuracy
-
Val Loss: Last epoch valid loss
-
Test Precision: Weighted test precision
-
Test Recall: Weighted test recall
-
Test F1-Score: Weighted test F1-Score
Important: Some of models were exectued more than once and some values may vary a little.
Generate predictions of a hidden file
As the result above, we save 2 models in data/model
folder:
- Test 5 with 92% of test precision (unbalanced data)
- Test 6 with 95% of test precision (balanced data)
The inferences were executed in Inference_Model.ipynb
file. The result generates are in data/output
folder.
Conclusion
In this homework I could learn more about PyTorch, unbalanced and balanced data and multiple kind of test. Unfortunatelly, for a higher results, more data should be added in some classes to balance and with more analysis on each variable, some of them could be removed (or outliers and others).