carvana-challenge

My solution for the Carvana Image Masking Challenge on Kaggle: https://www.kaggle.com/c/carvana-image-masking-challenge

Out of 737 teams, our solution for the Carvana Image Masking Challenge on Kaggle ranked 9th (top 1.2%) on the Public Leaderboard and 31st (top 4.2%) on the Private Leaderboard. It was made by Chia-Hao Hsieh and Shao-Wen Lai.

Problem

The task: remove the studio background from car images.

In this competition, you’re challenged to develop an algorithm that automatically removes the photo studio background.

Train Data

The training set contains 5088 pairs of training images and mask labels.

Test Data

There are 100064 test images.

Evaluation

This competition is evaluated on the mean Dice coefficient. The Dice coefficient measures the pixel-wise agreement between a predicted segmentation and its corresponding ground truth. The formula is: 2 * |X ∩ Y| / (|X| + |Y|), where X is the predicted set of pixels and Y is the ground truth. The Dice coefficient is defined to be 1 when both X and Y are empty. The leaderboard score is the mean of the Dice coefficients over all images in the test set.
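For reference, the metric can be sketched in a few lines of NumPy (`pred` and `truth` here are hypothetical binary mask arrays, not part of this repo's code):

```python
import numpy as np

def dice_coefficient(pred, truth):
    """Dice coefficient between two binary masks.

    Defined as 1.0 when both masks are empty, matching the
    competition's convention.
    """
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    total = pred.sum() + truth.sum()
    if total == 0:
        return 1.0
    intersection = np.logical_and(pred, truth).sum()
    return 2.0 * intersection / total

# Example: two 4x4 masks whose foregrounds overlap on 2 pixels
a = np.zeros((4, 4), dtype=np.uint8)
b = np.zeros((4, 4), dtype=np.uint8)
a[0, :3] = 1          # |X| = 3
b[0, 1:] = 1          # |Y| = 3
print(dice_coefficient(a, b))  # 2*2 / (3+3) ≈ 0.667
```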

Solution Overview

Our solution is an ensemble of five modified U-Net models that take 1280x1280 image patches as input, combined with test-time augmentation. We trained with a combined loss of soft Dice loss and binary cross-entropy (BCE) loss. During training, we applied data augmentations including flipping, shifting, scaling, HSV color augmentation, and fancy PCA.
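A combined soft Dice + BCE objective can be sketched in plain NumPy as below; the 50/50 weighting and the smoothing term are illustrative assumptions, not necessarily the exact values we used:

```python
import numpy as np

def soft_dice_loss(probs, truth, smooth=1.0):
    """Soft Dice loss on predicted probabilities (not thresholded masks)."""
    intersection = (probs * truth).sum()
    return 1.0 - (2.0 * intersection + smooth) / (probs.sum() + truth.sum() + smooth)

def bce_loss(probs, truth, eps=1e-7):
    """Pixel-wise binary cross entropy, clipped for numerical stability."""
    probs = np.clip(probs, eps, 1.0 - eps)
    return -np.mean(truth * np.log(probs) + (1.0 - truth) * np.log(1.0 - probs))

def combined_loss(probs, truth, dice_weight=0.5):
    """Weighted sum of soft Dice and BCE (the weight here is illustrative)."""
    return dice_weight * soft_dice_loss(probs, truth) \
        + (1.0 - dice_weight) * bce_loss(probs, truth)

probs = np.array([0.9, 0.1, 0.8, 0.2])  # predicted foreground probabilities
truth = np.array([1.0, 0.0, 1.0, 0.0])  # ground-truth mask
print(combined_loss(probs, truth))
```

Soft Dice directly optimizes the overlap that the competition metric measures, while BCE gives smooth per-pixel gradients early in training; mixing the two is a common choice for binary segmentation.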

Training a single model takes about 60-80 hours on a single P5000 GPU machine; testing takes about 6-8 hours.

Our best performing single model

(U-Net architecture diagram and block diagram)

Result

Our best ensemble scored a mean Dice coefficient of 0.997191 on the Private Leaderboard and 0.996899 on the Public Leaderboard.

Here are some results by our best single model:

(Example segmentation results)

Requirements

  • python 3.6
  • numpy
  • pytorch
  • pandas
  • pyyaml
  • crayon
  • scikit-image
  • pydensecrf
    • pip install cython and then pip install pydensecrf

Usage

Train/Test

  1. Extract data downloaded from Kaggle to ./data:

    data
    ├── metadata.csv
    ├── sample_submission.csv
    ├── test_hq
    │   ├── 0004d4463b50_01.jpg
    │   ├── 0004d4463b50_02.jpg
            ...
    │   └── 846faa0eb79f_04.jpg
    ├── train_hq
    │   ├── 00087a6bd4dc_01.jpg
    │   ├── 00087a6bd4dc_02.jpg
            ...
    │   └── fff9b3a5373f_16.jpg
    ├── train_masks
    │   ├── 00087a6bd4dc_01_mask.gif
    │   ├── 00087a6bd4dc_02_mask.gif
            ...
    │   └── fff9b3a5373f_16_mask.gif
    └── train_masks.csv
    

    Images are all of size 1918 x 1280

  2. Before training, start crayon by running docker run -d -p 8888:8888 -p 8889:8889 --name crayon alband/crayon

  3. Run python train.py

  4. Run python test.py <experiment_name>

    For example, run python test.py PeterUnet3_dropout

    ⚠️ Before you run test.py the first time, make sure you have at least 250GB free disk space to save prediction results.

  5. [Optional] Run python run_ensemble.py --pred_dirs <exp_output_dir_1> <exp_output_dir_2> ... <exp_output_dir_n>

    For example, run python run_ensemble.py --pred_dirs 0921-05:59:53 0921-06:00:00 0921-06:00:05 to ensemble three predictions

  6. Run python run_rle.py <exp_output_dir> to generate submission at ./output/<exp_output_dir>/submission.csv

  7. [Optional] Run python run_rle_ensemble.py --pred_dirs <exp_output_dir_1> <exp_output_dir_2> ... <exp_output_dir_n> to ensemble run-length encoded submission.csv files.

    For example, run python run_rle_ensemble.py --pred_dirs 0923-05:59:53 0921-06:00:00 to ensemble two predictions
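The run-length encoding behind the submission files can be sketched as follows. Kaggle's mask RLE uses 1-indexed pixel positions in column-major (Fortran) order, written as space-separated "start length" pairs; the function name below is ours, not necessarily the one in run_rle.py:

```python
import numpy as np

def rle_encode(mask):
    """Run-length encode a binary mask in the Kaggle submission format.

    Pixels are numbered from 1, top to bottom then left to right
    (column-major order); output is 'start length start length ...'.
    """
    pixels = mask.flatten(order='F')        # column-major, as Kaggle expects
    pixels = np.concatenate([[0], pixels, [0]])
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1
    runs[1::2] -= runs[::2]                 # convert run ends to run lengths
    return ' '.join(str(x) for x in runs)

mask = np.array([[0, 1],
                 [1, 1]], dtype=np.uint8)
print(rle_encode(mask))  # '2 3': one run starting at pixel 2, length 3
```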

Other Scripts

  • To find numbers that are divisible by 2^n, run python scripts/divisble.py <start_number> <end_number>

    For instance, python scripts/divisble.py 900 1300
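Such sizes matter because each of a U-Net's n pooling steps halves the spatial resolution, so input dimensions should be divisible by 2^n. A minimal sketch of the check (the function name and the default n=5 are our assumptions, not taken from the script):

```python
def divisible_by_pow2(start, end, n=5):
    """Return the numbers in [start, end] divisible by 2**n.

    n=5 suits a U-Net with five 2x2 pooling steps.
    """
    step = 2 ** n
    return [x for x in range(start, end + 1) if x % step == 0]

# For instance, the 1280 used for our input patches is divisible by 2**5
print(divisible_by_pow2(900, 1300))
```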

To-dos

  • load data
  • train/val split
  • prepare small dataset
  • implement run length encoding
  • add experiment config loading
  • try a simple UNet
  • visualize groundtruth and prediction
  • add DICE validation
  • add data augmentation: random horizontal flipping
  • add data augmentation: padding
  • try the original UNet
  • train/predict-by-tile
  • add DICE score during training
  • add validation loss and DICE score during training
  • add optimizer and loss to experiment setting in .yml
  • try modified UNet with UpSampling layers
  • improve tile.py: make it able to cut image into halves
  • complete util/tile.py: stitch_predictions()
  • complete util/submit.py
  • add data augmentation: random shift
  • add boundary weighted loss
  • experimenting with UNet parameters and architectures/modules
  • add CRF
    • it didn't help

License: MIT License