KDercksen / hunter2_dbi

Dog Breed Identification

Data exploration

KDercksen opened this issue

To improve on our simple initial approach, we need a deeper understanding of the data. We should look for interesting properties of the dataset: we know it is fairly balanced, which is good, but there may be breeds that the model has trouble with. We should identify those breeds and work out how to fix the problem.

Suggested techniques: pandas is great for exploring the data at both a low and a high level. Use a notebook (there is already one in the repository) to do some plotting, inspect the confusion matrix, and so on.
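As a starting point for the notebook, here is a minimal sketch of the two checks mentioned above: class balance and a confusion matrix, both done with plain pandas. The column names (`id`, `breed`) and the tiny in-memory tables are assumptions made for illustration; the real label file and the model's predictions would be loaded in their place.

```python
import pandas as pd

# Hypothetical stand-in for the label file; real column names may differ.
labels = pd.DataFrame({
    "id": ["a", "b", "c", "d", "e", "f"],
    "breed": ["beagle", "beagle", "pug", "pug", "pug", "boxer"],
})

# Number of images per breed, most frequent first.
breed_counts = labels["breed"].value_counts()

# Ratio between the largest and smallest class; close to 1 means balanced.
imbalance_ratio = breed_counts.max() / breed_counts.min()

# Hypothetical true and predicted breeds for a validation set.
y_true = pd.Series(["beagle", "beagle", "pug", "pug", "pug", "boxer"], name="true")
y_pred = pd.Series(["beagle", "pug", "pug", "pug", "boxer", "boxer"], name="pred")

# Confusion matrix: rows are true breeds, columns are predicted breeds.
confusion = pd.crosstab(y_true, y_pred)

print(breed_counts)
print(imbalance_ratio)
print(confusion)
```

In a notebook, `breed_counts.plot.bar()` would give a quick visual of the class distribution (this needs matplotlib installed), and the off-diagonal cells of `confusion` show which breeds get mistaken for which.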

Additional discussion can go below in the comments!

Find the classes with bad predictions, see what makes them special, and think of ways to address that.
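One possible way to find those classes, sketched with pandas: compute the accuracy per true breed and list the most frequent (true, predicted) error pairs. The `results` table and its column names are hypothetical; it stands in for the model's predictions on a held-out validation set.

```python
import pandas as pd

# Hypothetical validation results: true breed and predicted breed per image.
results = pd.DataFrame({
    "true": ["beagle", "beagle", "beagle", "pug", "pug", "boxer", "boxer", "boxer"],
    "pred": ["beagle", "boxer", "boxer", "pug", "pug", "boxer", "boxer", "beagle"],
})

# Mark each prediction as correct or not.
results["correct"] = results["true"] == results["pred"]

# Fraction of correct predictions per true breed, worst breeds first.
per_breed_accuracy = results.groupby("true")["correct"].mean().sort_values()

# Count each (true, predicted) pair among the mistakes, most common first.
errors = results[~results["correct"]]
confused_pairs = errors.groupby(["true", "pred"]).size().sort_values(ascending=False)

print(per_breed_accuracy)
print(confused_pairs)
```

The breeds at the top of `per_breed_accuracy` are the ones to look at first, and `confused_pairs` hints at why: two breeds that are often swapped may simply look alike, which suggests inspecting sample images of both side by side.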