Only 7,989 tracks in tracks.csv for the small subset?

Question


SylCard opened this issue 7 years ago

Hi, I'm trying to use the FMA dataset for CNN training.

I'm trying to retrieve the metadata (track_id and genre_top) of the 8,000 tracks in the fma_small subset, but I only get 7,989 rows, so 11 seem to be missing. Perhaps my CSV file is corrupt, or there is an error in it.

Appreciate your help!

Michaël Defferrard · Answer 1 · Thu Jan 18 2018 00:15:48 GMT+0800 (China Standard Time)

There is probably an error in the way you manipulate the data. You can check that your files are not corrupt by running sha1sum -c checksums in the directory where you decompressed fma_metadata.zip.
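If sha1sum is not available (e.g. on Windows), the same check can be sketched in Python. This is a minimal sketch, not part of the FMA tools: it assumes only that the checksums file uses the standard "<sha1>  <filename>" line format that sha1sum -c reads, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha1_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-1 digest of a file, read in chunks to bound memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(directory: str, listing: str = "checksums") -> dict:
    """Check every '<sha1>  <filename>' line of `listing`, like `sha1sum -c`.

    Returns a mapping filename -> True (digest matches) or False
    (digest differs or the file is missing).
    """
    root = Path(directory)
    results = {}
    for line in (root / listing).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, name = line.split(maxsplit=1)
        # sha1sum marks binary-mode entries with a leading '*'
        name = name.lstrip("*")
        target = root / name
        results[name] = target.is_file() and sha1_of(target) == expected.lower()
    return results
```

For example, calling verify_checksums on the (hypothetical) directory where fma_metadata.zip was extracted should map every listed file to True; any False entry points at a corrupt or missing file.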

If you're using Python, the following code selects the 8000 tracks of the small subset:

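>>> import utils  # utils.py from the FMA repository
>>> tracks = utils.load('tracks.csv')
>>> # ('set', 'subset') is an ordered categorical (small < medium < large),
>>> # so <= 'small' keeps exactly the small subset
>>> small = tracks[tracks['set', 'subset'] <= 'small']
>>> small.shape
(8000, 56)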
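Without the repository's utils module, a similar selection can be sketched with plain pandas. This sketch assumes, as far as I know mirroring what utils.load does, that tracks.csv has a two-level column header (e.g. ('set', 'subset') and ('track', 'genre_top')) with track_id as the first column; the function name load_small_genres is hypothetical. Reading the file with a single header row, or splitting it line by line, is one plausible way to end up with a wrong row count.

```python
import pandas as pd


def load_small_genres(path: str) -> pd.Series:
    """Return genre_top for the tracks of the small subset, indexed by track_id.

    Assumes a two-level column header with track_id as the first column
    (an assumption about the tracks.csv layout).
    """
    # header=[0, 1] builds MultiIndex columns such as ('set', 'subset')
    tracks = pd.read_csv(path, index_col=0, header=[0, 1])
    # Plain string equality instead of the ordered categorical built by
    # utils.load; 'small' is the lowest level, so the selection is the same.
    small = tracks[tracks[("set", "subset")] == "small"]
    return small[("track", "genre_top")]
```

On an intact tracks.csv, len(load_small_genres(path)) should then be 8000, matching the shape shown above.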

Silverstre · Answer 2 · Wed Jan 31 2018 23:47:45 GMT+0800 (China Standard Time)

Thank you! Also, when I query certain rows in tracks.csv for their genre_top, it returns None. Is there any way to extract the correct genre for these tracks?

Michaël Defferrard · Answer 3 · Wed Feb 21 2018 06:10:09 GMT+0800 (China Standard Time)

Some tracks have no genre_top because they belong to multiple root genres, so no single top-level genre can be assigned to them. You'll find more details in the paper.
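The rule described above can be illustrated with a small sketch. It assumes you already have, for each genre id, the id of its root genre (for instance built from the genre hierarchy shipped with the metadata; how to build it is left as an assumption here), and the function names and genre ids below are hypothetical. For tracks without a genre_top, root_genres still lists the candidate root genres, so you can choose one yourself or drop those tracks.

```python
from typing import Iterable, Mapping, Optional, Set


def root_genres(genre_ids: Iterable[int], top_level: Mapping[int, int]) -> Set[int]:
    """Map each genre of a track to its root genre and collect the distinct roots."""
    return {top_level[g] for g in genre_ids}


def unique_root_genre(genre_ids: Iterable[int],
                      top_level: Mapping[int, int]) -> Optional[int]:
    """Return the track's root genre if it is unique, otherwise None."""
    roots = root_genres(genre_ids, top_level)
    # Exactly one root: that is the top genre. Zero or several: undefined.
    return next(iter(roots)) if len(roots) == 1 else None


# Hypothetical hierarchy: 1 and 3 are roots, 2 is a sub-genre of 1.
top_level = {1: 1, 2: 1, 3: 3}
print(unique_root_genre([1, 2], top_level))  # both map to root 1
print(unique_root_genre([2, 3], top_level))  # two roots, so no top genre
```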