Download your Last.fm history through lastfm-to-csv and place it in the /data
folder.
Create a Musicbrainz API account and application, then create a file named KEYS.py
file with the contents:
###############################################################################
# KEYS.py
###############################################################################
(MB_NM, MB_V, MB_URL) = (<MUSICBRAINZ_APP_NAME>, "0.1", <USER_HOMEPAGE>)
(MB_USR, MB_PSW) = (<USERNAME>, <PASSWORD>)
GEO_USR = 'test'
To help cleanup the original last.fm dataset, these scripts also require the creation of a BANS.py
file with contents similar to the following:
###############################################################################
# BANS.py
###############################################################################
# The following set of names will be excluded from parsing and analysis
# (this happens before musicbrainz data)
BAN = set([
'Nature Publishing Group', 'Douglas Adams',
"The Skeptics' Guide to the Universe", 'chipdelmal'
])
# The following dictionary will replace the artists name on the sets with the
# name defined by the entry key
SWP_PRE = {
'Courteeners': {'The Courteeners', 'Courteeners'},
'Belle & Sebastian': {'Belle & Sebastian', 'Belle Sebastian'},
'The Smashing Pumpkins': {'Smashing Pumpkins'},
}
These pipelines are easiest to use by following the folder structure:
./data
- Place the original CSV containing Last.fm's data here
./cache
- Networks and artists counts will be exported here
./img
- Images and plots will be exported here
To clean the original dataset from Last.fm, download artists' information from Musicbrainz, and amend the artists in Last.fm data with Musicbrainz tags; run:
./GenerateDatasets.sh 'chipmaligno' '.'
Which launches the following scripts in order:
python Clean_Lastfm.py 'chipmaligno' './data'
python Download_Musicbrainz.py 'chipmaligno' './data'
python Fix_Dataframe.py 'chipmaligno' './data'
Generating the transitions matrices can be done with the following command:
./GenerateMatrices.sh 'chipmaligno' '.' 100 5
Which launches the following script ranging from 1 to 5 in the window size:
python Compute_Transitions.py 'chipmaligno' './data' './cache' 100 1
...
python Compute_Transitions.py 'chipmaligno' './data' './cache' 100 5
Clean_Lastfm.py
: Takes the raw last.fm data from the csv, removes duplicates, removes banned artists, and converts into a timezoneDownload_Musicbrainz.py
: Downloads artists' data from musibrainz into a CSV file.Fix_Dataframe.py
: Filters a dataframe between dates, checks for inflated counts, and amends artists names to match musicbrainz tags.Compute_Transitions.py
: Calculates transitions matrix (both in frequency and probability) for top N artists through a weighted window. Please have a look at my blogpost.Plot_Chord.py
: Exports a chord diagram with a given number of artists and a window range (as exported byCompute_Transitions.py
).Markov_Playlist.py
: Generates a playlist purely based on a Markov random walker process based on the artists probability matrix and song frequencies.
auxiliary.py
: Non-specific functions needed by routines.musicbrainz.py
: Functions needed to scrape artists' data.network.py
: Functions needed to do network and transitions matrices transformations.
CONSTANTS.py
: General constants common for all the scripts.BANS.py
: Dictionaries and sets that contain bans and fixes for artists' names.KEYS.py
: Musicbrainz app keys and login information.