lannguyen0910 / cassava_leaf_disease

Kaggle Cassava Leaf Disease Classification

🌿 Cassava Leaf Disease Classification

Kaggle Competition: https://www.kaggle.com/c/cassava-leaf-disease-classification

Dataset

  • Train set: ~26,000 images (the 21,367 images from the 2020 contest were merged with 500 images from the 2019 contest).
  • Test set: ~15,000 images.
  • Public test: 31% of the test set.
  • Private test: 69% of the test set.

The dataset is imbalanced with 5 labels:

  1. Cassava Bacterial Blight (CBB)
  2. Cassava Brown Streak Disease (CBSD)
  3. Cassava Green Mottle (CGM)
  4. Cassava Mosaic Disease (CMD)
  5. Healthy
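Because the classes are imbalanced, it is worth checking the label distribution before training. A minimal sketch, assuming the Kaggle train.csv layout (image_id and label columns) and that the integer labels 0 to 4 follow the order of the list above, as in the competition's label_num_to_disease_map.json. The helper name label_distribution is hypothetical, not part of this repo:

```python
import csv
from collections import Counter

# Assumed mapping from integer label to disease name (same order as the list above)
LABEL_NAMES = {
    0: "Cassava Bacterial Blight (CBB)",
    1: "Cassava Brown Streak Disease (CBSD)",
    2: "Cassava Green Mottle (CGM)",
    3: "Cassava Mosaic Disease (CMD)",
    4: "Healthy",
}


def label_distribution(csv_path):
    """Return {label: (count, fraction)} read from a CSV that has a 'label' column."""
    with open(csv_path, newline="") as f:
        counts = Counter(int(row["label"]) for row in csv.DictReader(f))
    total = sum(counts.values())
    # Sorted by label id so the output lines up with LABEL_NAMES
    return {label: (n, n / total) for label, n in sorted(counts.items())}
```

For example, `for label, (n, frac) in label_distribution("train.csv").items(): print(LABEL_NAMES[label], n, f"{frac:.1%}")` prints one line per class.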
[Figure: original data images, and the same images after albumentations transforms]

Requirements

Python >= 3.8. Run this command to install all the dependencies:

pip install -r requirements.txt

Directory Structure

  this repo
  ├─── train_images              # Dataset folder
  │     └─── ***.png
  ├─── test_images
  │     └─── ***.png
  │
  ├─── configs                   # Config folder
  │     ├─── train.yaml
  │     ├─── test.yaml
  │     └─── config.py
  │
  ├─── csv                       # Labels folder
  │     └─── folds
  │           ├─── fold_train.csv
  │           └─── fold_val.csv
  │
  ├─── loggers                   # Experiments folder
  │     └─── runs
  │           ├─── loss_fold
  │           └─── acc_fold
  │
  ├─── weights                   # Model weights folder
  │     ├─── model_name1.pth
  │     └─── ...
  │
  ├─── train.py
  └─── test.py

Edit YAML

Full explanation of each YAML file

Training Steps

  1. Download and unzip the dataset from https://www.kaggle.com/c/cassava-leaf-disease-classification/data
  2. Run this command to split train.csv using KFold. A folder named 'csv' will be created containing the k-fold splits:
     python utils/tools/split_kfold.py --csv=train.csv --seed=42 --out=csv --n_splits=5 --shuffle=True
  3. Launch TensorBoard from a terminal or Colaboratory to monitor training (change the log directory if needed):
     tensorboard --logdir='./loggers/runs'
  4. Run the training command, replacing the placeholder arguments (config name, weight path to resume from, etc.) to match your setup:
     python train.py --config=config_name --resume=weight_path --print_per_iters=100 --gradcam_visualization
  5. The model weights will be saved automatically in the 'weights' folder.
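The fold split in step 2 can be sketched without extra dependencies. This is not the repository's split_kfold.py: it is a hypothetical stratified variant (the To-do list below mentions Stratified KFold splitting) that shuffles each class with a fixed seed and deals its rows round-robin into n_splits folds, so every fold keeps roughly the same class proportions. The function names and the output file names (fold{k}_train.csv, fold{k}_val.csv) are assumptions for illustration:

```python
import csv
import os
import random
from collections import defaultdict


def stratified_folds(rows, n_splits=5, seed=42, label_key="label"):
    """Return a fold index per row so that class proportions are similar across folds."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for i, row in enumerate(rows):
        by_label[row[label_key]].append(i)
    fold_of = [0] * len(rows)
    offset = 0
    for label in sorted(by_label):
        idx = by_label[label]
        rng.shuffle(idx)  # seeded shuffle within the class
        for j, i in enumerate(idx):
            # Continue the round-robin where the previous class stopped,
            # so overall fold sizes differ by at most one row
            fold_of[i] = (offset + j) % n_splits
        offset += len(idx)
    return fold_of


def write_folds(csv_path, out_dir="csv/folds", n_splits=5, seed=42):
    """Hypothetical writer: one fold{k}_train.csv / fold{k}_val.csv pair per fold."""
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = list(reader)
    fold_of = stratified_folds(rows, n_splits, seed)
    os.makedirs(out_dir, exist_ok=True)
    for k in range(n_splits):
        val = [r for r, fold in zip(rows, fold_of) if fold == k]
        train = [r for r, fold in zip(rows, fold_of) if fold != k]
        for name, subset in (("train", train), ("val", val)):
            path = os.path.join(out_dir, f"fold{k}_{name}.csv")
            with open(path, "w", newline="") as f:
                writer = csv.DictWriter(f, fieldnames=fieldnames)
                writer.writeheader()
                writer.writerows(subset)
```

Dealing rows round-robin per class (instead of slicing each class into contiguous chunks) keeps both the per-class counts and the total fold sizes balanced, which matters for the rare CBB and CGM classes.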

Inference

Run this command to generate predictions and the submission file (adjust the settings inside the test config first):

python test.py --config=test
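The To-do list below includes inference with an ensemble of models and TTA. One common way to combine them is to average the softmax probabilities of every model and every TTA view, then take the argmax. The sketch below assumes each model/view already produced one logit vector per image; it illustrates the averaging step only and is not the logic inside test.py. The function names are hypothetical; the image_id,label header follows the competition's submission format:

```python
import csv
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(logits_per_view):
    """Average softmax probabilities over all (model, TTA view) pairs; return (label, mean_probs)."""
    probs = [softmax(v) for v in logits_per_view]
    n_classes = len(probs[0])
    mean = [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]
    label = max(range(n_classes), key=mean.__getitem__)
    return label, mean


def write_submission(path, predictions):
    """predictions: dict image_id -> list of logit vectors (one per model/TTA view)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["image_id", "label"])
        for image_id, views in predictions.items():
            writer.writerow([image_id, ensemble_predict(views)[0]])
```

Averaging probabilities rather than raw logits keeps models with differently scaled outputs from dominating the vote.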

Result

I have trained EfficientNet-B6, EfficientNet-B1 and ViT models. Here are the results:

  1. EfficientNet-B6 did not perform well: its accuracy stayed between 0.7 and 0.8 before Early Stopping ended training.
  2. EfficientNet-B1 and ViT performed well enough, each reaching an accuracy of about 0.87x.
  3. Some GradCAM visualizations for Mosaic Disease:

[Figure: GradCAM visualizations for Mosaic Disease on images 6103, 218377, 336550 and 469487, each original image paired with its GradCAM output]
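Result 1 refers to Early Stopping. A minimal, framework-agnostic sketch of the usual patience rule, assuming the validation score is checked once per epoch; the class and its patience, min_delta and mode parameters are hypothetical, and the repository's own implementation and defaults may differ:

```python
class EarlyStopping:
    """Signal a stop when the score has not improved by more than min_delta for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0.0, mode="max"):
        self.patience = patience
        self.min_delta = min_delta
        self.mode = mode  # "max" for accuracy, "min" for loss
        self.best = None
        self.num_bad_epochs = 0

    def _improved(self, score):
        if self.best is None:
            return True
        if self.mode == "max":
            return score > self.best + self.min_delta
        return score < self.best - self.min_delta

    def step(self, score):
        """Record one epoch's score; return True if training should stop."""
        if self._improved(score):
            self.best = score
            self.num_bad_epochs = 0
        else:
            self.num_bad_epochs += 1
        return self.num_bad_epochs >= self.patience
```

In a training loop this would be called as `if stopper.step(val_acc): break` after each validation pass, typically alongside saving the best weights.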

To-do list:

  • Multi-GPU support (nn.DataParallel)
  • GradCAM visualization
  • Gradient Accumulation
  • Mixed precision
  • Stratified KFold splitting
  • Inference with Ensemble Model technique and TTA
  • Metrics: Accuracy, Balanced Accuracy, F1-Score
  • Losses: Focal Loss, SmoothCrossEntropy Loss
  • Optimizers: AdamW, SGD, Adas, SAM (not debugged yet)
  • Schedulers: ReduceLROnPlateau, CosineAnnealingWarmRestarts
  • Usable models: ViT, EfficientNet, ResNeXt, DenseNet
  • Early Stopping during training
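Of the metrics listed above, balanced accuracy and F1 are the informative ones for an imbalanced dataset, because plain accuracy can be dominated by the largest class. A dependency-free sketch of balanced accuracy (mean per-class recall) and macro-averaged F1; it is an illustration and may differ in edge-case handling from the repository's code or from scikit-learn:

```python
from collections import Counter


def balanced_accuracy(y_true, y_pred):
    """Mean recall over the classes present in y_true."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / support[c] for c in support) / len(support)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over classes seen in y_true or y_pred."""
    classes = set(y_true) | set(y_pred)
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        # F1 = 2TP / (2TP + FP + FN); the denominator is positive because
        # every class in `classes` occurs in y_true or y_pred
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)
```

For example, a model that always predicts class 3 on `y_true = [3, 3, 3, 3, 0]` scores 0.8 accuracy but only 0.5 balanced accuracy and about 0.44 macro F1.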

