MrLaki5 / Music-scraper-clusterizer

Clustering of music data with a built-in web crawler and scraper

Music-scraper-clusterizer:

Description:

  • Project done for the faculty course Hidden Knowledge.
  • The project covers:
    1. Web scraping of the Discogs site for data about albums, songs, and artists from Yugoslavia and Serbia.
    2. Analysis of the scraped data through plotting and querying.
    3. Unsupervised clustering of the scraped data.
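
The clustering step can be pictured with a minimal sketch. This README does not say which algorithm or features the project uses, so everything below is an assumption made for illustration: plain k-means is swapped in as a common unsupervised method, and the (release year, track count) vectors are hypothetical. It uses only the standard library and runs on Python 3.7.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means on a list of equal-length numeric tuples.

    Returns (centroids, labels); labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    # Start from k input points chosen at random (seeded for repeatability).
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels


# Hypothetical feature vectors: (release year, track count) per album.
albums = [(1975, 8), (1978, 10), (1981, 9), (2005, 14), (2008, 12), (2011, 15)]
centroids, labels = kmeans(albums, k=2)
print(labels)
```

In this toy input the release year dominates the distance because the features are not scaled; real data would likely need normalization before clustering.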

How to run:

  • Install Python 3.7
  • Install Docker
  • Install the required modules with: pip install -r requirements.txt
  • Run the database in Docker: docker-compose -f docker-compose.yml up -d
  • Change directory from the repository root to /src
  • Run the project with: python main.py

Results:

Scraping:

  • Scraping time: 72h
  • Albums scraped: 65573
  • Artists scraped: 62025
  • Songs scraped: 435107

Plotting:

  • Genre count, top 6

  • Song count, grouped by song length

  • Album count, grouped by decade

  • Album count, grouped by whether the album name is written in Cyrillic

  • Album count, grouped by number of genres per album
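
The groupings behind the plots above can be sketched with the standard library alone. This is an illustration, not the project's code: the records, field names, and helper functions are hypothetical, and treating a name as Cyrillic when it contains at least one character from the basic Cyrillic Unicode block is an assumption.

```python
from collections import Counter


def decade_of(year):
    """Return the first year of the decade, e.g. 1987 -> 1980."""
    return (year // 10) * 10


def is_cyrillic(text):
    """True if any character lies in the basic Cyrillic block (U+0400 to U+04FF)."""
    return any("\u0400" <= ch <= "\u04ff" for ch in text)


# Hypothetical records; the real project reads scraped rows from its database.
albums = [
    {"name": "Прва плоча", "year": 1979, "genres": ["Rock"]},
    {"name": "Druga ploca", "year": 1984, "genres": ["Rock", "Pop"]},
    {"name": "Трећа плоча", "year": 1988, "genres": ["Folk"]},
]
song_seconds = [185, 210, 245, 62]  # hypothetical song lengths in seconds

# Top genres: count every genre tag across all albums, keep the six most common.
genre_totals = Counter(g for a in albums for g in a["genres"])
top_genres = genre_totals.most_common(6)

# Song count grouped by length, bucketed by whole minutes.
by_minutes = Counter(sec // 60 for sec in song_seconds)

# Album counts grouped by decade, by script of the name, and by genre count.
by_decade = Counter(decade_of(a["year"]) for a in albums)
by_script = Counter(is_cyrillic(a["name"]) for a in albums)
by_genre_count = Counter(len(a["genres"]) for a in albums)

print(top_genres, by_minutes, by_decade, by_script, by_genre_count)
```

Each `Counter` maps a group key to a count, which is the shape a bar chart needs.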

License: MIT License

