adrz / movie-posters-convnet

Unsupervised clustering of movie posters with features extracted from Convolutional Neural Network

Demo

Overview

Unsupervised clustering of movie posters using features extracted from a convolutional neural network. The visualization uses Flask for the backend and D3.js for the frontend.

This project is divided into 3 main scripts:

  • get_posters.py
    • retrieve the posters from impawards.com.
    • create a thumbnail of each poster for the visualization.
  • get_features_from_cnn.py
    • extract the output of the last convolutional layer of a pre-trained ConvNet (VGG-16 or ResNet50).
  • get_data_visu.py
    • reduce the dimensionality of the features with UMAP for data visualization.
    • compute the cosine similarity and extract the 6 "closest" images for each poster.
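
The last step above can be sketched in a few lines. This is a minimal, dependency-free illustration, not the code in get_data_visu.py: the function names and the toy 2-D vectors are assumptions made for the example, and a real run would more likely use numpy or scikit-learn on the ConvNet features.

```python
import heapq
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


def closest_posters(features, k=6):
    """Map each poster index to the indices of its k most similar posters."""
    neighbours = {}
    for i, u in enumerate(features):
        scored = (
            (cosine_similarity(u, v), j)
            for j, v in enumerate(features)
            if j != i  # a poster is not its own neighbour
        )
        best = heapq.nlargest(k, scored)  # highest similarity first
        neighbours[i] = [j for _, j in best]
    return neighbours


# Toy 2-D "features" standing in for real ConvNet activations (hypothetical data).
toy_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_neighbours = closest_posters(toy_features, k=1)
```

With the toy data, posters 0 and 1 are each other's nearest neighbour, as are posters 2 and 3. Because `heapq.nlargest` keeps only k candidates per poster, the full similarity matrix is never stored.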

To see a description of each script's parameters:

  • python src/get_XXX.py --help
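
For readers unfamiliar with how such a command-line interface is usually built, here is a hedged sketch using argparse from the standard library. Only the `-c` flag and `--help` appear in this README; the long option name `--config`, the help text, and the function name `build_parser` are assumptions, and the actual scripts may be organized differently.

```python
import argparse


def build_parser():
    """Build a parser resembling the documented `-c <file>` interface."""
    parser = argparse.ArgumentParser(
        description="Hypothetical stand-in for one of the src/get_*.py scripts."
    )
    parser.add_argument(
        "-c", "--config",  # "--config" is an assumed long name
        help="path to the configuration file",
    )
    return parser


# Parse the same arguments that the Usage section passes on the command line.
args = build_parser().parse_args(["-c", "config/development.conf"])
```

argparse generates the `--help` output automatically from the `help=` strings, which is why a script written this way needs no extra code to describe its parameters.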

Requirements

System

  • Linux/Unix/OSX (wget is required)
  • Python 3.3+
  • ImageMagick
  • PostgreSQL

Python packages

  • BeautifulSoup 4.4
  • Tensorflow
  • Keras
  • Pandas
  • requests
  • scikit-learn (sklearn)
  • numpy
  • Pillow (PIL)
  • flask

Warnings

Extracting the features from the ConvNet is slow if you do not have a GPU. Computing the similarity between every pair of posters requires O(n^2) memory, which amounts to around 32 GB of RAM.
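
To see where a figure of that size can come from, the sketch below estimates the memory of a dense n x n similarity matrix. The poster count of 90,000 and the 4-byte (float32) value size are hypothetical inputs chosen for illustration, not numbers taken from this project.

```python
def similarity_matrix_bytes(n_posters, bytes_per_value=4):
    """Memory needed for a dense n x n similarity matrix, in bytes."""
    return n_posters * n_posters * bytes_per_value


def to_gigabytes(n_bytes):
    """Convert bytes to decimal gigabytes (1 GB = 10**9 bytes)."""
    return n_bytes / 10**9


# 90_000 is a hypothetical poster count, not a figure from the project.
estimate_gb = to_gigabytes(similarity_matrix_bytes(90_000))
```

Under these assumptions the matrix alone takes about 32.4 GB. Memory grows quadratically, so doubling the number of posters quadruples the requirement; keeping only the k best neighbours per poster instead of the full matrix would reduce it to O(n k).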

Installation

Clone the repository and install the dependencies in a virtual environment:

$ git clone https://github.com/adrz/movie-posters-convnet.git
$ cd movie-posters-convnet/
$ virtualenv -p python3 env
$ source env/bin/activate
$ pip install -r requirements-gpu.txt

Create the PostgreSQL database (assuming PostgreSQL is already installed):

$ psql -U postgres -c "CREATE USER movieposters;"
$ psql -U postgres -c "CREATE DATABASE movieposters;"
$ psql -U postgres -c "ALTER USER movieposters WITH ENCRYPTED PASSWORD 'yourpassword';"
$ psql -U postgres -c "GRANT ALL PRIVILEGES ON DATABASE movieposters TO movieposters;"

Usage

Computation

After installation, run the three scripts in order. They will:

  • download posters from 1920 to 2016
  • compute the ConvNet features
  • compute the data-visualization features

$ python src/get_posters.py -c config/development.conf
$ python src/get_features_from_cnn.py -c config/development.conf
$ python src/get_data_visu.py -c config/development.conf

Then grab a coffee...

Visualization

$ source env/bin/activate
$ export configapi=./config/development.conf
$ python app.py

Then open index.html in your favorite browser:

$ chromium 127.0.0.1:5000/index.html

or

$ chromium 127.0.0.1:5000/index_complete.html

Results

Cherry-picked examples from the 200 closest pairs of posters (by cosine distance):

License

This project is licensed under the MIT License; see the LICENSE.md file for details.

Acknowledgments
