zekssmy / lookatme

Coding da Vinci

Our Project

This project was implemented as part of the Coding da Vinci hackathon and uses this data set from the Augustinermuseum. Our project can be split into three use cases:

  1. Style Transfer
  2. Similarity
  3. Exploration

Getting Started

Architecture

Our project consists of a Python backend (styletransfer, similarity, videos, utils) and a JavaScript React/Bootstrap frontend (sdm_matcher). The two communicate over HTTP: the backend exposes its endpoints with Flask, and the frontend calls them via AJAX requests.
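To illustrate the connection, here is a minimal sketch of such a Flask endpoint; the route name, payload fields, and the run_style_transfer stub are hypothetical, not the project's actual API:

    from flask import Flask, request, jsonify

    app = Flask(__name__)

    def run_style_transfer(content, style):
        # Placeholder for the actual backend entry point
        return f"outputs/{content}_styled.jpg"

    @app.route("/styletransfer", methods=["POST"])
    def styletransfer():
        payload = request.get_json()
        result_path = run_style_transfer(payload["content"], payload["style"])
        return jsonify({"result": result_path})

    if __name__ == "__main__":
        # Reachable locally as 127.0.0.1:5000 through the tunnel (see Prerequisites)
        app.run(port=5000)

The frontend would then issue an AJAX POST request to this route and render the returned path.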

Prerequisites

  1. Install the environment with conda
  2. Install the Python Flask server on a remote machine with a GPU
  3. Start app.py on the remote server
  4. Start a tunnel from the server to your local device so the API is reachable at 127.0.0.1:5000
  5. Install React and npm on the local device
  6. Install all dependencies in sdm_matcher using npm
  7. Start the local development server

Style Transfer

Description

This is a modification of Neural Neighbor Style Transfer. The modifications enable FP16 iterations on the GPU, which reduces execution time by a factor of 2 while the quality remains almost the same.
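As an illustration of the idea, FP16 iterations of this kind are typically wrapped with torch.cuda.amp; the sketch below shows that pattern under this assumption, with nn_style_loss standing in for the actual Neural Neighbor loss rather than reproducing the project's exact modification:

    import torch
    import torch.nn.functional as F

    def nn_style_loss(pastiche, style):
        # Placeholder: the real loss matches deep features to nearest neighbours
        return F.mse_loss(pastiche, style)

    def stylize(content, style, steps=200, lr=2e-3):
        """content/style: float image tensors already on the GPU."""
        pastiche = content.clone().requires_grad_(True)
        opt = torch.optim.Adam([pastiche], lr=lr)
        scaler = torch.cuda.amp.GradScaler()  # keeps FP16 gradients numerically stable
        for _ in range(steps):
            opt.zero_grad()
            with torch.cuda.amp.autocast():    # forward pass runs in FP16
                loss = nn_style_loss(pastiche, style)
            scaler.scale(loss).backward()      # backward on the scaled loss
            scaler.step(opt)
            scaler.update()
        return pastiche.detach()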

Use

The execute_one interface provides the method execute_one(...) for applying one style image to one content image. The content and style images are expected under /styletransfer/NeuralNeighborStyleTransfer/inputs.
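A hypothetical call might look as follows; since the signature is only given as execute_one(...) above, the import path and argument names here are guesses, not the real interface:

    # Both files are expected under /styletransfer/NeuralNeighborStyleTransfer/inputs
    from styletransfer import execute_one  # assumed import path

    result = execute_one(content="portrait.jpg", style="style.jpg")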

Similarity

We adapted an unsupervised learning method to cluster the portraits. The portraits were clustered into eight groups, and both the trained clustering model and the cluster assignments were stored in pickle files. When a user uploads a new picture through the GUI, we predict which cluster is most similar to the new picture and return the portraits in that cluster.

The details are as follows (a minimal sketch appears after the list):

  1. Extract the features of each portrait.
  2. Apply the k-means algorithm to the feature matrix to form a robust clustering model.
  3. Use the model's predict function to decide which cluster a new picture belongs to.
  4. Return all the portraits in this cluster.
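A minimal sketch of steps 1–2 under stated assumptions: the feature extractor is left as a placeholder, scikit-learn's KMeans is assumed, and the pickle file name is illustrative:

    import pickle
    from sklearn.cluster import KMeans

    def cluster_portraits(features, n_clusters=8):
        """features: (n_portraits, n_features) matrix, one row per portrait."""
        model = KMeans(n_clusters=n_clusters, random_state=0).fit(features)
        with open("kmeans_model.pkl", "wb") as f:  # hypothetical file name
            pickle.dump(model, f)
        return model.labels_  # cluster index of every portrait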

How to use it?

  1. Install the packages listed in requirements.txt and import all the necessary packages.
  2. Run the cluster function to train a model.
  3. Run the prediction function to get the most similar portraits (see the sketch after this list).
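A sketch of the prediction step, assuming the model was pickled by the training step above; extract_features is the same placeholder feature extractor used for training:

    import pickle

    def most_similar_portraits(image_path, portraits, extract_features):
        with open("kmeans_model.pkl", "rb") as f:
            model = pickle.load(f)
        features = extract_features(image_path).reshape(1, -1)
        cluster = int(model.predict(features)[0])
        # Return every portrait whose training assignment matches the prediction
        return [p for p, label in zip(portraits, model.labels_) if label == cluster]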

Exploration

We created .mp4 videos with lip-sync in which the portraits introduce themselves. All files relevant to this use case, including requirements.txt, can be found in /videos.

To realise this we had to go through the following steps (a sketch of steps 2 and 3 follows the list):

  1. Extract the needed information for every portrait from the JSON file provided in the data set. (The extraction code can be found in text_to_speech.py.)
  2. Convert the resulting text to audio. For the female voice in German we used the gtts library here. For the male voice in German we used the ibm-watson library here; the available voice models can be found here.
  3. Convert the .jpg images to .mp4 videos. A face has to be visible in every frame, or the model (see next step) will throw an error.
  4. Feed each audio and video pair to the Wav2Lip model; the output is an .mp4 video with lip-sync.
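A minimal sketch of steps 2 and 3, assuming gTTS for the audio and OpenCV for the video; file names and the duration are illustrative, and the Wav2Lip invocation itself is omitted (see its repository):

    import cv2
    from gtts import gTTS

    def text_to_german_audio(text, out_path="speech.mp3"):
        # Female German voice via gTTS (step 2)
        gTTS(text=text, lang="de").save(out_path)

    def still_image_to_video(image_path, out_path="portrait.mp4", seconds=10, fps=25):
        # Repeat the portrait in every frame so a face is visible throughout,
        # as Wav2Lip requires (step 3)
        frame = cv2.imread(image_path)
        h, w = frame.shape[:2]
        writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
        for _ in range(seconds * fps):
            writer.write(frame)
        writer.release()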

All video results can be found here.
