- Set up a virtual env (optional)
- Install packages:
$ pip install -r requirements.txt
- Add your credentials to the config. You can add multiple credential pairs to the `SPOTIFY_CREDENTIALS` list, if so inclined.
# /config.py
SPOTIFY_CLIENT_ID="YOUR_ID_HERE"
SPOTIFY_CLIENT_SECRET="YOUR_SECRET_HERE"
- Run the script:
$ python main.py
From what I found, there were three endpoints that could be used to find new artists for which you didn't already have the ID.
- the search endpoint (`/search`), max artists per request: 50
- the related artists endpoint (`/artists/{id}/related`), max artists per request: 20
- the recommendations endpoint (`/recommendations`), max artists per request: 100 (but the endpoint returns tracks)
The related artists endpoint is best for systematic exploration of Spotify artists, as you can easily conduct a graph search over them. However, this endpoint returns the fewest artists of the three, and in practice the number of unseen artists it returns quickly declines if the search is seeded with just a single artist. This is most likely because if two artists are both related to a third artist, there is a high likelihood that those two artists are also related to each other.
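A minimal sketch of such a graph search, assuming a hypothetical `get_related` callable that wraps the related artists endpoint and returns artist IDs. It is a plain breadth-first search, not necessarily what `main.py` does, and it records how many unseen artists each request yields so the decline described above can be observed.

```python
from collections import deque
from typing import Callable, Iterable


def crawl_related(
    seed_id: str,
    get_related: Callable[[str], Iterable[str]],  # hypothetical wrapper around the endpoint
    max_requests: int = 1000,
) -> tuple[set[str], list[int]]:
    """Breadth-first search over the related-artists graph.

    Returns the set of artist IDs seen and, for each request made,
    the number of previously unseen artists it returned.
    """
    seen = {seed_id}
    queue = deque([seed_id])
    new_per_request: list[int] = []

    while queue and len(new_per_request) < max_requests:
        artist_id = queue.popleft()
        new_count = 0
        for related_id in get_related(artist_id):
            if related_id not in seen:
                seen.add(related_id)
                queue.append(related_id)
                new_count += 1
        new_per_request.append(new_count)

    return seen, new_per_request


# Toy graph standing in for the API: a, b and c are all related to each other.
toy_graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
seen, new_per_request = crawl_related("a", lambda i: toy_graph.get(i, []))
print(sorted(seen), new_per_request)
```

In the toy graph the request for `b` returns nothing new because `a`, `b` and `c` are mutually related, which is the overlap effect in miniature.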
The search endpoint returns more artists per request, but is not conducive to systematic exploration of artists, as it's not trivial to come up with good queries that will return unseen artists at a good rate.
Similarly, the recommendations endpoint retrieves new artists in a more random manner. But its API, accepting lists of seed artists and genres, allows for a simpler enumeration/exploration of possible inputs, at least compared to the search endpoint, which can receive any string as its search parameter.
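To illustrate why the recommendations inputs are easier to enumerate, here is a rough sketch that generates request parameters from known artist IDs and genres. The parameter names `seed_artists`, `seed_genres` and `limit`, and the cap of 5 seeds per request, are assumptions about the endpoint; the function name `seed_params` is made up for this example.

```python
from itertools import combinations
from typing import Iterator, Sequence

MAX_SEEDS = 5  # assumption: total number of seeds the endpoint accepts per request


def seed_params(
    artist_ids: Sequence[str],
    genres: Sequence[str],
    n_artists: int = 2,
    n_genres: int = 1,
) -> Iterator[dict]:
    """Yield one parameter dict per combination of seed artists and genres."""
    if n_artists + n_genres > MAX_SEEDS:
        raise ValueError("too many seeds for a single request")
    for artist_combo in combinations(artist_ids, n_artists):
        for genre_combo in combinations(genres, n_genres):
            yield {
                "seed_artists": ",".join(artist_combo),
                "seed_genres": ",".join(genre_combo),
                "limit": 100,  # max per request, as listed above
            }


for params in seed_params(["id1", "id2", "id3"], ["rock", "jazz"]):
    print(params)
```

The input space is finite and ordered here, unlike free-text search queries, so the collector can walk through it without guessing.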
So, in my attempt to retrieve all artists as fast as possible, I implemented a heuristic that makes use of all three endpoints. First, I use the search endpoint and query for artists using every letter of the alphabet. I then use those results to spawn related and recommendation requests. The 'collector' starts off fast, reaching speeds of 4000-5000 artists discovered per minute. This rate, however, slowly declines; I've yet to run it long enough to see where it converges. I'd assume this is due to the related endpoint returning too many of the same artists after a while, and it's possible that the recommendation requests reinforce this. They may, however, allow for more variance, since the genres and other seed artist IDs are varied among these requests.
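The overall flow can be sketched roughly as follows. This is a simplified, single-threaded illustration rather than the actual collector: `search`, `related` and `recommend` are hypothetical callables standing in for the three endpoints, each returning artist IDs, and the way seeds are picked for recommendation requests is an assumption.

```python
import string
from collections import deque
from itertools import islice
from typing import Callable, Iterable


def collect(
    search: Callable[[str], Iterable[str]],           # hypothetical: query -> artist IDs
    related: Callable[[str], Iterable[str]],          # hypothetical: artist ID -> artist IDs
    recommend: Callable[[list[str]], Iterable[str]],  # hypothetical: seed IDs -> artist IDs
    max_requests: int = 10_000,
    seeds_per_recommendation: int = 2,
) -> set[str]:
    """Seed with one search per letter, then expand via related and recommendation requests."""
    seen: set[str] = set()
    frontier: deque[str] = deque()

    def record(ids: Iterable[str]) -> None:
        # Keep only unseen artists and queue them for later expansion.
        for artist_id in ids:
            if artist_id not in seen:
                seen.add(artist_id)
                frontier.append(artist_id)

    requests = 0
    for letter in string.ascii_lowercase:
        record(search(letter))
        requests += 1

    while frontier and requests < max_requests:
        artist_id = frontier.popleft()
        record(related(artist_id))
        # Seed a recommendation request with this artist plus the next queued ones.
        seeds = [artist_id] + list(islice(frontier, seeds_per_recommendation - 1))
        record(recommend(seeds))
        requests += 2

    return seen
```

Passing the endpoints in as callables keeps the sketch independent of any particular HTTP client and makes it easy to try against a small fake graph before spending API quota.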
heuristic improvements...