Million Songs Analysis

Problem description

Given a song, the goal is to find other songs that are similar to it.

The definition of ‘similarity’ is left to the user, depending on what they are looking for (the same artist, or similar attributes such as hotness or danceability).
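As a rough illustration of a user-chosen similarity, the sketch below ranks songs by Euclidean distance over whichever numeric attributes the user picks. The song records, the field names and the helpers `distance` and `most_similar` are made up for this example and are not taken from the notebooks; attributes on very different scales would need normalizing first.

```python
import math


def distance(a, b, attributes):
    """Euclidean distance between two songs over the chosen numeric attributes."""
    return math.sqrt(sum((a[name] - b[name]) ** 2 for name in attributes))


def most_similar(query, songs, attributes, k=3):
    """Return the k songs closest to `query`, excluding the query itself."""
    candidates = [s for s in songs if s is not query]
    return sorted(candidates, key=lambda s: distance(query, s, attributes))[:k]


# Hypothetical records; real values would come from the data set.
songs = [
    {"title": "A", "artist": "X", "hotness": 0.80, "danceability": 0.60},
    {"title": "B", "artist": "Y", "hotness": 0.78, "danceability": 0.62},
    {"title": "C", "artist": "X", "hotness": 0.20, "danceability": 0.90},
]

# The user decides which attributes define "similar".
closest = most_similar(songs[0], songs, ["hotness", "danceability"], k=1)
print(closest[0]["title"])
```

Passing a different attribute list changes what "similar" means without touching the ranking code.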

Some further requirements must also be taken into account:

The user should be able to narrow the search by giving more specific information
The response time should be short enough to feel interactive
The data should be analyzed in a scalable way
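One way to picture the first requirement is a filter that keeps only the songs matching every criterion the user supplies. This is a minimal sketch with invented records and a hypothetical helper `narrow`; it is written as a generator so songs are processed one at a time instead of being loaded all at once, which is only a small step towards scalability.

```python
def narrow(songs, **criteria):
    """Lazily yield the songs whose fields equal every given criterion."""
    for song in songs:
        if all(song.get(field) == value for field, value in criteria.items()):
            yield song


# Hypothetical records; the field names are assumptions for illustration.
catalogue = [
    {"title": "A", "artist": "X", "year": 2001},
    {"title": "B", "artist": "Y", "year": 2001},
    {"title": "C", "artist": "X", "year": 1995},
]

# Each extra criterion narrows the result further.
by_artist = [s["title"] for s in narrow(catalogue, artist="X")]
by_artist_and_year = [s["title"] for s in narrow(catalogue, artist="X", year=2001)]
print(by_artist, by_artist_and_year)
```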

Data set

The data set used is the 'Million Song Dataset'. It contains metadata for around one million songs and has a total size of about 300 GB.

It is stored in Hierarchical Data Format version 5 (HDF5), a file format designed for large, multidimensional and hierarchically organized data.

The data set does not contain any actual audio. Instead, it consists of song features and metadata collected and computed by The Echo Nest. Each HDF5 file stores the information for one song, as shown in the picture below.
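To give an idea of how one such file can be read, here is a minimal sketch using the third-party `h5py` package. It assumes the usual Million Song Dataset layout, where `/metadata/songs` and `/analysis/songs` each hold a one-row compound table; the helper `read_song`, the selection of fields and the example path are illustrative and not taken from the notebooks.

```python
def read_song(path):
    """Read a few fields from one per-song HDF5 file into a plain dict."""
    import h5py  # third-party; imported here so it is only needed when reading

    with h5py.File(path, "r") as f:
        # Each table has a single row in a per-song file (assumed layout).
        meta = f["metadata/songs"][0]
        analysis = f["analysis/songs"][0]
        return {
            # Strings are stored as bytes and need decoding.
            "title": meta["title"].decode("utf-8"),
            "artist": meta["artist_name"].decode("utf-8"),
            "hotness": float(meta["song_hotttnesss"]),
            "danceability": float(analysis["danceability"]),
            "tempo": float(analysis["tempo"]),
        }


# Usage with a hypothetical path:
# song = read_song("data/example_track.h5")
# print(song["artist"], "-", song["title"])
```

Converting each file to a small dict like this keeps the rest of the analysis independent of the HDF5 format.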

