mdeff / fma

FMA: A Dataset For Music Analysis

Home Page: https://arxiv.org/abs/1612.01840

Only 7989 track data in tracks.csv for subset small?

SylCard opened this issue

Hi, I'm trying to use the FMA dataset for CNN training.

I'm currently trying to retrieve the metadata (track_id and genre_top) for the 8000 tracks of the fma_small subset, but 11 rows appear to be missing. Perhaps my CSV file is corrupt, or there is an error somewhere.

Appreciate your help!

There is probably an error in the way you manipulate the data. You can check that your files are not corrupt with sha1sum -c checksums (run it in the directory where you decompressed fma_metadata.zip).
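If sha1sum is not available (for example on Windows), the same check can be sketched in Python with the standard library. This is a minimal sketch, not part of the FMA code: the helper names sha1_of and verify are hypothetical, and it assumes the checksums file uses the usual sha1sum layout (one "digest  filename" pair per line) with file names relative to the directory that holds the checksums file.

```python
import hashlib
from pathlib import Path


def sha1_of(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(checksum_file):
    """Return the names of listed files that are missing or do not match."""
    # Assumption: file names are relative to the checksum file's directory.
    base = Path(checksum_file).parent
    bad = []
    for line in Path(checksum_file).read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # sha1sum marks binary mode with a leading '*'
        target = base / name
        if not target.is_file() or sha1_of(target) != expected.lower():
            bad.append(name)
    return bad
```

Calling verify("checksums") from the directory where fma_metadata.zip was decompressed should return an empty list when every listed file is intact.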

If you're using Python, the following code will select the 8000 tracks from the small dataset:

>>> import utils
>>> tracks = utils.load('tracks.csv')
>>> small = tracks[tracks['set', 'subset'] <= 'small']
>>> small.shape
(8000, 56)
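
The <= comparison above relies on the subset column being an ordered categorical (small < medium < large), which, as far as I can tell, is what utils.load sets up. The sketch below reproduces that behaviour on a toy Series; the values are made up and only stand in for the ('set', 'subset') column. It also shows what could go wrong if the CSV is read directly and the column stays a plain string, since the comparison then becomes alphabetical.

```python
import pandas as pd

# Toy stand-in for the ('set', 'subset') column; real values come from tracks.csv.
subset = pd.Series(
    ["small", "medium", "large", "small", "medium"],
    dtype=pd.CategoricalDtype(["small", "medium", "large"], ordered=True),
)

small = subset[subset <= "small"]    # only the 'small' rows
medium = subset[subset <= "medium"]  # 'small' and 'medium' rows

# With plain strings the comparison is alphabetical, and both 'large' and
# 'medium' sort before 'small', so every row matches.
plain = subset.astype(str)
all_rows = plain[plain <= "small"]

print(len(small), len(medium), len(all_rows))  # prints: 2 4 5
```

If a row count looks wrong after loading tracks.csv by hand, checking the dtype of the subset column is a reasonable first step.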

Thank you! Also, when I query certain rows in tracks.csv for their genre_top, it returns none. Is there any way to extract the correct genre for these tracks?

Some tracks have no genre_top because they belong to multiple root genres. You'll find more details in the paper.
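
For single-label training, one option is simply to separate the tracks that have a top genre from those that do not. The following is a minimal sketch on a toy frame with made-up rows; it assumes the two-level column layout of tracks.csv, with the top genre stored under ('track', 'genre_top') and missing values represented as NaN.

```python
import numpy as np
import pandas as pd

# Toy frame mimicking the two-level columns of tracks.csv (values are made up).
tracks = pd.DataFrame(
    {
        ("track", "genre_top"): ["Rock", np.nan, "Hip-Hop"],
        ("track", "title"): ["a", "b", "c"],
    },
    index=pd.Index([2, 5, 10], name="track_id"),
)

has_top = tracks["track", "genre_top"].notna()
labeled = tracks[has_top]     # usable as single-label examples
unlabeled = tracks[~has_top]  # tracks spanning several root genres

print(len(labeled), len(unlabeled))  # prints: 2 1
```

Whether to drop the unlabeled tracks or treat them as multi-label examples depends on the task.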