A convolutional neural network (CNN) model that determines the country a photo was taken in.
The country-classifier
directory contains 4 files:
config.py
train.py
test.py
predict.py
To train a model, set the configuration variables in config.py
and run
train.py
. The training run data and the best model are saved in the
provided directory (runs
by default).
The trained model can be evaluated on the test set by providing the path to the
saved state_dict.pt
file in config.py
and running test.py
.
Run predict.py
after setting the relevant variables in config.py
to make
predictions with the model. The model outputs the top k
(5 by default) most
likely countries for the given image, as well as a
choropleth map of the
predicted probabilities for each country. The shapefiles used to generate the
map, from Natural Earth,
are located in the shapefiles
directory.
The Country211 dataset from OpenAI was used to train the model. The dataset contains 63,300 images from 211 countries, with 150 train images, 50 validation images, and 100 test images for each country. The provided splits were used.
A pretrained EfficientNet model was used as a base for transfer learning. Training a model from scratch was deemed infeasible, especially considering the small size of the dataset. 8 variations of EfficientNet models are available, increasing in size from EfficientNet-B0 to EfficientNet-B7. The largest model that could be trained without running out of memory on the available RTX 3080 10 GB GPU was EfficientNet-B2, so this was chosen for training.
Several transfer learning approaches were tried, including:
- Training all layers with a uniform learning rate
- Training all layers, but using different learning rates for the
features
andclassifier
parts of the model - Training only the
classifier
part of the model and freezing the other layers
The first approach produced the best results.
Early stopping was used to prevent overfitting, and the train set was augmented with horizontal flips. Additional augmentations were also tried but did not seem to improve the model's performance.
The state_dict.pt
file for the final model can be found in the models
directory.
The model achieved the following performance on the test set:
- Accuracy: 13.7%
- Top-5 accuracy: 29.6%
Unsurprisingly, the model performs best on pictures taken outside and those containing distinct natural features, compared to indoor photos or close-up images of objects. A curated set of images from the test set and the model's predictions for them are shown below.
Photo | Top-5 Predictions |
---|---|
1. Sweden: 4.7% 2. Slovenia: 4.7% 3. Slovakia: 4.3% 4. Latvia: 4.3% 5. Ukraine: 4.1% |
Prediction map |
---|
Photo | Top-5 Predictions |
---|---|
1. Kenya: 20.6% 2. Botswana: 18.9% 3. Tanzania: 18.0% 4. Eswatini: 8.9% 5. South Africa: 6.5% |
Prediction map |
---|
Photo | Top-5 Predictions |
---|---|
1. Hong Kong: 20.9% 2. Macao: 17.5% 3. Singapore: 9.9% 4. South Korea: 4.1% 5. Japan: 4.0% |
Prediction map |
---|
Photo | Top-5 Predictions |
---|---|
1. Antarctica: 52.4% 2. Greenland: 28.9% 3. Svalbard: 4.8% 4. Norway: 2.3% 5. Andorra: 1.6% |
Prediction map |
---|
The greatest limitation of the model seems to be the training data, both in terms of size and quality. The Country211 dataset only contains 150 train set images for each country, and many of the images fall into the following categories:
- Indoor photo not representative of the country
- Close-up photo of a ubiquitous object
- Digitally created (not a photograph)
- Incorrectly labeled (taken in a different country)
These images hinder the model's training, so the preparation of any new dataset should be done with care so that these are excluded. A model trained on such a dataset, preferably with a large number of images, will be able to achieve improved performance.