This is my final project for the Master 2 course Fouille de Données et Aide à la Décision (Data Mining and Decision Support) at Université Paris Diderot.
I use a Kaggle dataset of annotated bee images to train several models that classify a beehive as healthy or unhealthy.
The beehive disease annotations were collapsed into two labels: healthy (kept as in the original dataset), meaning the beehive has no infection, and unhealthy, assigned whenever the original dataset mentioned a disease.
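The label collapse described above is a one-line rule. A minimal sketch, assuming the original annotation is a free-text string in which healthy hives are marked exactly `healthy`; the function name `binarize_health` and the example disease strings are illustrative assumptions, not taken from the project:

```python
# Hypothetical helper: collapse the original multi-class health annotation
# into the two labels used in this project. The exact wording of the
# original annotations is an assumption; only "healthy" is treated specially.
def binarize_health(original_label: str) -> str:
    """Return "healthy" if the original annotation is healthy, else "unhealthy"."""
    return "healthy" if original_label.strip().lower() == "healthy" else "unhealthy"


if __name__ == "__main__":
    # Example annotations (assumed wording, for illustration only).
    examples = ["healthy", "Varroa, Small Hive Beetles", "ant problems", "missing queen"]
    for label in examples:
        print(f"{label!r:32} -> {binarize_health(label)}")
```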
Five images were removed because their quality was too low for SIFT to extract any descriptors.
- Bag of Visual Words
- Mini-Batch K-Means
- K-Nearest Neighbors
- Bernoulli Naive Bayes
- SVM
- Convolutional Neural Network
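The list above names the building blocks but not how they fit together. In a common Bag of Visual Words setup, local descriptors (here SIFT) are clustered with Mini-Batch K-Means into a codebook of "visual words", each image becomes a histogram of its descriptors' nearest visual words, and classifiers such as KNN, SVM or Bernoulli Naive Bayes are trained on those histograms. The pure-Python sketch below assumes that setup; it is not necessarily how this project implemented it (a library such as scikit-learn may have been used instead). All function names are hypothetical, the toy vectors are 2-dimensional whereas SIFT descriptors have 128 dimensions, and `binarize` is one assumed way to feed Bernoulli Naive Bayes, which models binary features:

```python
import random
from typing import List, Optional, Sequence

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centers: Sequence[Sequence[float]], x: Sequence[float]) -> int:
    """Index of the center closest to x (the visual word x is assigned to)."""
    return min(range(len(centers)), key=lambda j: squared_distance(centers[j], x))


def mini_batch_kmeans(
    data: List[Vector], k: int, batch_size: int, iterations: int, seed: int = 0
) -> List[Vector]:
    """Build a codebook of k visual words with mini-batch k-means.

    Each center moves toward the batch points assigned to it with a
    per-center learning rate of 1 / (number of points it has absorbed).
    """
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(data, k)]
    counts = [0] * k
    for _ in range(iterations):
        batch = [rng.choice(data) for _ in range(batch_size)]
        # Assign the whole batch first, then update the centers.
        assignments = [nearest(centers, x) for x in batch]
        for x, j in zip(batch, assignments):
            counts[j] += 1
            eta = 1.0 / counts[j]
            centers[j] = [(1 - eta) * c + eta * xi for c, xi in zip(centers[j], x)]
    return centers


def bovw_histogram(
    descriptors: Optional[List[Vector]], codebook: List[Vector]
) -> Optional[Vector]:
    """Normalized histogram of visual words, or None if the image has no descriptors."""
    if not descriptors:
        return None  # nothing to encode: such images are dropped
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest(codebook, d)] += 1
    return [h / len(descriptors) for h in hist]


def binarize(hist: Vector) -> List[int]:
    """Presence/absence of each visual word (an assumed input for Bernoulli Naive Bayes)."""
    return [1 if h > 0 else 0 for h in hist]
```

For example, `bovw_histogram([[0.1, 0.2], [9.9, 10.1]], [[0.0, 0.0], [10.0, 10.0]])` gives `[0.5, 0.5]`. Returning `None` for an empty descriptor list mirrors the five images dropped because SIFT found nothing to describe.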
- 4133 training images
- 1034 test images
- Stratified splits: both sets keep the same label ratio, 65% healthy and 35% unhealthy
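These sizes correspond to roughly an 80/20 split (1034 / 5167 ≈ 20%). A stratified split shuffles and cuts each label separately, so both sides keep the 65/35 ratio. A minimal sketch with a hypothetical helper `stratified_split`:

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[str], test_fraction: float, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Split indices so each label keeps (roughly) the same share in train and test."""
    by_label: Dict[str, List[int]] = defaultdict(list)
    for i, y in enumerate(labels):
        by_label[y].append(i)
    rng = random.Random(seed)
    train: List[int] = []
    test: List[int] = []
    for indices in by_label.values():
        rng.shuffle(indices)  # shuffle within the label, then cut
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)
```

With 65 healthy and 35 unhealthy toy labels and `test_fraction=0.2`, the test side gets 13 healthy and 7 unhealthy indices, preserving the 65/35 ratio.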
All hyperparameters were chosen on validation data: 10-fold cross-validation for the Bag of Visual Words models, and a validation set holding 10% of the training images for the CNN.
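Hyperparameter selection by k-fold cross-validation can be sketched as follows: each candidate value is trained on k-1 folds and scored on the held-out fold, and the candidate with the best mean score wins. The helpers and the `score` callback are hypothetical; in practice the callback would, say, fit a KNN with a given number of neighbors on the training indices and return its validation accuracy. The folds are stratified like the train/test split, and the sketch assumes the indices are already shuffled:

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def stratified_k_folds(labels: Sequence[str], k: int) -> List[List[int]]:
    """Deal each label's indices round-robin into k folds (indices assumed pre-shuffled)."""
    by_label: Dict[str, List[int]] = defaultdict(list)
    for i, y in enumerate(labels):
        by_label[y].append(i)
    folds: List[List[int]] = [[] for _ in range(k)]
    position = 0  # keeps cycling across labels so fold sizes stay balanced
    for indices in by_label.values():
        for i in indices:
            folds[position % k].append(i)
            position += 1
    return folds


def select_by_cv(
    candidates: Sequence[T],
    labels: Sequence[str],
    k: int,
    score: Callable[[T, List[int], List[int]], float],
) -> Tuple[T, float]:
    """Return the candidate with the highest mean validation score over k folds."""
    folds = stratified_k_folds(labels, k)
    all_indices = set(range(len(labels)))
    best, best_mean = candidates[0], float("-inf")
    for candidate in candidates:
        fold_scores = []
        for val_idx in folds:
            train_idx = sorted(all_indices - set(val_idx))
            fold_scores.append(score(candidate, train_idx, val_idx))
        mean = sum(fold_scores) / k
        if mean > best_mean:
            best, best_mean = candidate, mean
    return best, best_mean
```

A call such as `select_by_cv([1, 3, 5, 7], labels, 10, knn_accuracy)`, where `knn_accuracy` is a hypothetical scoring function, would return the number of neighbors with the best mean accuracy over the 10 folds.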
Metric | Healthy | Unhealthy |
---|---|---|
Precision | 84% | 72% |
Recall | 86% | 69% |
Metric | Healthy | Unhealthy |
---|---|---|
Precision | 93% | 87% |
Recall | 93% | 88% |
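The tables report precision and recall per class, treating each class in turn as the positive one: precision is TP / (TP + FP) and recall is TP / (TP + FN). A short sketch of that computation on toy predictions (not this project's outputs), with a hypothetical helper name:

```python
from typing import Dict, Sequence


def per_class_precision_recall(
    y_true: Sequence[str], y_pred: Sequence[str], classes: Sequence[str]
) -> Dict[str, Dict[str, float]]:
    """Precision and recall for each class, treating it as the positive class."""
    report: Dict[str, Dict[str, float]] = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        predicted = sum(p == c for p in y_pred)  # TP + FP
        actual = sum(t == c for t in y_true)     # TP + FN
        report[c] = {
            "precision": tp / predicted if predicted else 0.0,
            "recall": tp / actual if actual else 0.0,
        }
    return report


if __name__ == "__main__":
    y_true = ["healthy"] * 4 + ["unhealthy"] * 2
    y_pred = ["healthy", "healthy", "healthy", "unhealthy", "unhealthy", "healthy"]
    print(per_class_precision_recall(y_true, y_pred, ["healthy", "unhealthy"]))
```

On these toy labels, healthy gets precision and recall of 0.75 (3 of 4), and unhealthy gets 0.5 for both (1 of 2).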