mdeff / fma

FMA: A Dataset For Music Analysis

Home Page: https://arxiv.org/abs/1612.01840

Errors in the FMA_large.zip and FMA_full.zip

nicolaus625 opened this issue · comments

There are some errors in the FMA_large.zip and FMA_full.zip.
I tried multiple download methods (wget and curl) from multiple sources (the zstd files are unavailable; I used the one in the GitHub repo and the Kaggle version mentioned in other issues), and multiple decompression tools (unzip, 7zip, tar, bzip, etc.) on multiple Linux machines. In every case, many files are distorted, such as:

/fma_large/000/000148.mp3
/fma_large/000/000149.mp3
/fma_large/000/000150.mp3
/fma_large/000/000151.mp3
/fma_large/000/000152.mp3
/fma_large/001/001000.mp3
/fma_large/001/001001.mp3
/fma_large/002/002076.mp3
/fma_large/002/002077.mp3
/fma_large/002/002078.mp3
/fma_large/002/002079.mp3
/fma_large/002/002080.mp3
/fma_large/002/002081.mp3
/fma_large/002/002082.mp3

I believe the Kaggle version, uploaded 9 months ago from GitHub, is a good demonstration of this noise. Could you please zip fma_large and fma_full again and release them?

What do you mean by errors?

Have you checked the wiki page that reports known issues?

You can check the integrity of the downloaded .mp3 files with sha1sum -c checksums.
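
For anyone without sha1sum at hand, the same check can be sketched in Python. This is only a rough equivalent, assuming the checksums file uses the usual sha1sum layout (one "<hex digest>  <relative path>" entry per line); the function names are made up for illustration. Note that a matching checksum only tells you the file is the one that was published, not that the audio sounds clean.

```python
import hashlib
from pathlib import Path


def sha1_of(path, chunk_size=1 << 20):
    """Compute the SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(checksums_file, root="."):
    """Return the listed paths that are missing or whose SHA-1 differs.

    Assumes sha1sum-style lines: "<hex digest>  <relative path>".
    """
    failures = []
    for line in Path(checksums_file).read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # sha1sum marks binary mode with a leading '*'
        target = Path(root) / name
        if not target.is_file() or sha1_of(target) != expected.lower():
            failures.append(name)
    return failures
```

An empty list from verify("checksums", root=".") would correspond to sha1sum reporting OK for every file.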

Maybe what you think to be noise is what the artist intended to produce. For example, you can listen to the original of 000148.mp3 at https://freemusicarchive.org/music/Contradiction/Contradiction and understand why it sounds like it does:

If one ever played around with a microphone in front of a set of speakers, you know it can create feedback. If you are not afraid and keep on holding the microphone in front of the speakers, you know you can sing through it, or scream or shout.

Note also that this track's genre is "Experimental → Avant-Garde". You might want to filter by genre (or other tags) to exclude tracks.

Have you checked the wiki page that reports known issues?

Yes. I believe most of the audio files I found are not among the erroneous files listed on the wiki page.

You can check the integrity of the downloaded .mp3 files with sha1sum -c checksums.

Already done. It shows OK to me.

Note also that this track's genre is "Experimental → Avant-Garde". You might want to filter by genre (or other tags) to exclude tracks.

That makes sense. How do you know the recording belongs to "Avant-Garde"? There is not much information in tracks.csv.

Best regards

That makes sense. How do you know the recording belongs to "Avant-Garde"? There is not much information in tracks.csv.

Great! Genre information is found in tracks.csv and genres.csv. Please check out the usage.ipynb notebook.
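
As a rough illustration of filtering by genre, here is a minimal sketch that collects a genre and everything nested below it, then drops tracks tagged with any of those genres. It assumes genres.csv has genre_id, parent, and title columns, with parent 0 for top-level genres. The sample rows, their ids, the hierarchy, and the track-to-genre mapping are invented for the example, and the helper names are hypothetical; loading the real per-track genre lists from tracks.csv is left out.

```python
import csv
import io


def genre_with_descendants(rows, title):
    """Return the ids of the genre named `title` and of every genre below it."""
    children = {}
    roots = []
    for row in rows:
        gid, parent = int(row["genre_id"]), int(row["parent"])
        children.setdefault(parent, []).append(gid)
        if row["title"] == title:
            roots.append(gid)
    selected, stack = set(), list(roots)
    while stack:  # depth-first walk down the genre tree
        gid = stack.pop()
        if gid not in selected:
            selected.add(gid)
            stack.extend(children.get(gid, []))
    return selected


def keep_tracks(track_genres, excluded):
    """Keep the track ids whose genre ids do not overlap `excluded`."""
    return [tid for tid, gids in track_genres.items() if not excluded & set(gids)]


# Invented sample data; real ids and hierarchy come from genres.csv.
SAMPLE_GENRES = """genre_id,parent,title
1,0,Experimental
2,1,Avant-Garde
3,1,Noise
4,0,Rock
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_GENRES)))
excluded = genre_with_descendants(rows, "Experimental")

# Hypothetical mapping from track id to genre ids.
track_genres = {148: [2], 1000: [3], 5: [4]}
kept = keep_tracks(track_genres, excluded)
```

With the sample data above, excluded holds the three Experimental-family ids and kept contains only track 5.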

Thank you. Another concern is that some audio, such as /fma_large/001/001000.mp3
(https://freemusicarchive.org/music/Kevin_Shields/The_Death_of_Patience/), contains too much noise. It seems that the audio on the website is broken from the 13-second mark onward. Is this also a genre issue?

Well, the genre of these tracks is literally "Noise".

lol, thank you very much.