omkaracharya / Yelp-Restaurant-Photo-Classification

Kaggle Competition


CSC591 Capstone Project - Yelp Restaurant Photo Classification

Team Members:

Description:

In this project, we build a model that automatically tags restaurants with multiple labels using a dataset of user-submitted photos. Currently, restaurant labels are manually selected by Yelp users when they submit a review. Selecting labels is optional, which leaves some restaurants uncategorized or only partially categorized. In an age of food selfies and photo-centric social storytelling, it may come as no surprise that Yelp users upload an enormous number of photos every day alongside their written reviews.
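Because each restaurant can carry several labels at once, this is a multi-label problem, and a common way to prepare the targets is to turn each business's label list into a fixed-length 0/1 vector. The sketch below assumes that labels are stored as space-separated integer ids (our understanding of the Kaggle `train.csv` layout) and that there are nine label ids, 0 to 8. The business ids and label strings are made up for illustration.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical rows: business id -> space-separated label ids ("" = no labels).
raw_labels = {
    "1000": "1 2 3 4 7",
    "1001": "0 1 6 8",
    "1002": "",
}


def parse_labels(label_string):
    """Turn '1 2 3' into {1, 2, 3}; an empty string becomes an empty set."""
    return {int(token) for token in label_string.split()}


# Assumption: nine label ids, 0..8. Fixing `classes` keeps the column order stable.
mlb = MultiLabelBinarizer(classes=list(range(9)))
business_ids = list(raw_labels)
Y = mlb.fit_transform([parse_labels(raw_labels[b]) for b in business_ids])

for business_id, row in zip(business_ids, Y):
    print(business_id, row.tolist())
# 1000 [0, 1, 1, 1, 1, 0, 0, 1, 0]
# 1001 [1, 1, 0, 0, 0, 0, 1, 0, 1]
# 1002 [0, 0, 0, 0, 0, 0, 0, 0, 0]
```

Each column of `Y` can then be treated as its own binary classification target.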

You must have the following installed:

  • Python
  • NumPy - For handling the datasets (pip install numpy)
  • Pandas - For handling the datasets (pip install pandas)
  • scikit-learn - To use classification algorithms such as SVM (pip install -U scikit-learn)

The following dependencies are required only if you want to extract the image and business features from scratch. We have already done that for you: you just need to download the feature files from the links provided in the table below. Make sure you put these files in the "features" directory.

  • h5py - To store the features extracted by the CNN (pip install h5py)
  • Caffe - To extract features from the images (refer to the link)
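Because the extracted features are large (18.2 GB for the test set), it helps to write them to HDF5 batch by batch instead of holding everything in memory. The sketch below shows one way to do that with h5py resizable datasets. The dataset names `photo_id` and `feature`, the helper functions `append_batch` and `load_features`, and the 8-dimensional toy vectors are hypothetical; they are not necessarily the layout that extract_image_features_train.py produces, and real CNN feature vectors are much wider.

```python
import os
import tempfile

import h5py
import numpy as np


def append_batch(path, photo_ids, features):
    """Append one batch of (photo id, feature vector) pairs to an HDF5 file.

    The datasets are created on the first call with an unlimited first
    dimension (maxshape None), so later batches can be added by resizing.
    """
    ids = np.asarray(photo_ids, dtype=np.int64)
    feats = np.asarray(features, dtype=np.float32)
    with h5py.File(path, "a") as f:  # "a": read/write, create the file if missing
        if "feature" not in f:
            f.create_dataset("photo_id", data=ids, maxshape=(None,))
            f.create_dataset("feature", data=feats, maxshape=(None, feats.shape[1]))
            return
        id_ds, feat_ds = f["photo_id"], f["feature"]
        start = id_ds.shape[0]
        end = start + len(ids)
        id_ds.resize((end,))
        feat_ds.resize((end, feat_ds.shape[1]))
        id_ds[start:end] = ids
        feat_ds[start:end] = feats


def load_features(path):
    """Read every photo id and feature vector back into memory."""
    with h5py.File(path, "r") as f:
        return f["photo_id"][:], f["feature"][:]


# Toy usage: two batches of 8-dimensional vectors written to a temporary file.
rng = np.random.default_rng(0)
path = os.path.join(tempfile.mkdtemp(), "features_demo.h5")
append_batch(path, [101, 102], rng.random((2, 8)))
append_batch(path, [103], rng.random((1, 8)))
ids, feats = load_features(path)
print(ids, feats.shape)  # [101 102 103] (3, 8)
```

Storing the features as float32 halves the file size compared with float64, which matters at this scale.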

Folder description:

  • code/ - contains programs to extract features and perform the final classification.
  • data/ - contains the training and test images plus metadata from the Yelp dataset (we have already extracted and stored the features for ease of project execution).
  • features/ - contains the features extracted from the images and restaurants (for ease of project execution).
  • models/ - contains the trained SVM model, which can be used for future predictions without retraining (it is generated automatically when classify.py is run for the first time; for ease of project execution, we have included this model as well).

Dataset:

Again, if you choose to extract the image and business features from scratch, you will need this dataset. It is available here, along with a description of the dataset. Download it and extract the files/folders into the "data" directory.

For ease of project execution, we have already extracted the features and stored them in the following files:

  • train_features.h5 (3.59 GB) - Format: [PhotoId, ImageFeatures]. ImageNet features of the training dataset. Generated with: python extract_image_features_train.py
  • test_features.h5 (18.2 GB) - Format: [PhotoId, ImageFeatures]. ImageNet features of the test dataset. Generated with: python extract_image_features_test.py
  • train_business_features.csv (91.7 MB) - Format: [BusinessId, BusinessFeatures, ClassLabels]. Features extracted for the businesses in the training dataset, computed from train_features.h5. Generated with: python extract_business_features_train.py
  • test_business_features.csv (460 MB) - Format: [BusinessId, BusinessFeatures]. Features extracted for the businesses in the test dataset, computed from test_features.h5. Generated with: python extract_business_features_test.py
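The business features are built from the per-photo features, but the exact aggregation is not described here. One common approach, shown only as an assumption, is mean pooling: average the feature vectors of all photos that belong to a business. In the sketch, `photo_to_biz` (columns `photo_id`, `business_id`) stands in for Yelp's photo-to-business mapping file, and `business_features` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def business_features(photo_ids, photo_features, photo_to_biz):
    """Mean-pool photo feature vectors into one vector per business.

    photo_to_biz: DataFrame with columns 'photo_id' and 'business_id';
    a photo may appear under more than one business.
    """
    # One row per photo, indexed by photo id, one column per feature dimension.
    feats = pd.DataFrame(photo_features, index=photo_ids)
    # Attach each photo's feature vector to every (photo, business) pair.
    joined = photo_to_biz.join(feats, on="photo_id", how="inner")
    # Average over each business's photos; the result is indexed by business_id.
    return joined.drop(columns="photo_id").groupby("business_id").mean()


# Toy usage: four photos with 2-dimensional features, split across two businesses.
photo_ids = np.array([1, 2, 3, 4])
photo_features = np.array([[1.0, 0.0], [3.0, 2.0], [5.0, 5.0], [0.0, 4.0]])
photo_to_biz = pd.DataFrame({"photo_id": [1, 2, 3, 4],
                             "business_id": ["a", "a", "b", "b"]})
biz = business_features(photo_ids, photo_features, photo_to_biz)
print(biz)
```

Mean pooling gives every business a vector of the same length regardless of how many photos it has, which is what a standard classifier such as an SVM expects.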

To run the final classification:

$ cd code
$ python classify.py
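classify.py trains an SVM and saves it in models/ so that later runs can skip retraining. The sketch below illustrates that train-or-load pattern for a multi-label target under assumptions of our own: one binary `SVC` per label via scikit-learn's `OneVsRestClassifier`, `joblib` for persistence, and a made-up model filename. The helper `train_or_load`, the kernel, and the `C` value are placeholders, not the project's actual settings.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC


def train_or_load(X, Y, model_path):
    """Load a cached multi-label SVM if one exists; otherwise train and save it."""
    if os.path.exists(model_path):
        return joblib.load(model_path)
    # One binary SVM per label column; Y is an (n_businesses, n_labels) 0/1 matrix.
    model = OneVsRestClassifier(SVC(kernel="rbf", C=1.0))
    model.fit(X, Y)
    directory = os.path.dirname(model_path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    joblib.dump(model, model_path)
    return model


# Toy usage with synthetic data: label j is "on" when feature j is positive.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
Y = (X[:, :3] > 0).astype(int)
model_path = os.path.join(tempfile.mkdtemp(), "models", "svm_demo.pkl")
model = train_or_load(X, Y, model_path)   # first call: trains and saves
cached = train_or_load(X, Y, model_path)  # second call: loads from disk
print(cached.predict(X[:2]))
```

Deleting the cached file forces a retrain, which is worth doing whenever the business features are regenerated.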
