mdeff / fma

FMA: A Dataset For Music Analysis

Home Page: https://arxiv.org/abs/1612.01840

Corrupted Files?

albert239825 opened this issue · comments

Hello, I was trying to convert the small dataset to .wav using pydub, and some files gave me errors when I tried to load them. I tried them with librosa and they also failed. The files are listed below:

fma_small/099/099134.mp3
fma_small/108/108925.mp3
fma_small/133/133297.mp3

Please let me know if I did something wrong or if you are also getting the error. Thanks.

That's a known issue (#41). Those 3 files have no audio at all (due to erroneous metadata). There are 6 files with less than 30s of audio in the small subset.
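
For anyone hitting the same thing, here is a minimal sketch of skipping those three files before conversion. It assumes the directory layout visible in the paths above (a three-digit folder followed by a six-digit file name); the names `KNOWN_BAD_SMALL`, `track_path`, and `loadable_tracks` are made up for illustration and are not part of the FMA code.

```python
from pathlib import Path

# Track IDs reported in this thread as having no audio in fma_small.
KNOWN_BAD_SMALL = {99134, 108925, 133297}


def track_path(root, track_id):
    """Build <root>/<first 3 digits>/<6-digit id>.mp3 (layout assumed from the thread)."""
    name = f"{track_id:06d}"
    return Path(root) / name[:3] / f"{name}.mp3"


def loadable_tracks(root, track_ids, skip=KNOWN_BAD_SMALL):
    """Return paths for all track IDs that are not in the skip set."""
    return [track_path(root, t) for t in track_ids if t not in skip]
```

The returned paths can then be passed to whichever loader you use (pydub, librosa, ...), so the three empty files never reach the decoder.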

Any idea how we could make the list of known issues more visible?

So sorry, I didn't see that. Maybe you could put a disclaimer in the README saying that certain files will cause errors when loaded. I'm just starting out in machine learning, so I might not be the best person to ask about this. Thanks for compiling an overall amazing dataset.

Thanks for the kind words. There's a link under "History", but it might not be visible enough.

I've added a wiki page and a hopefully more visible link in the README.

I've found files in the fma_large and fma_full partitions that are malformed by using the following query:

find fma/data/fma_large/ -iname '*.mp3' -type f -size -4097c

This searches for any files that are 4 KiB (4096 bytes) or smaller. I deleted these files and extracted them again from the fma_large/fma_full archives, but they are still malformed: soxi is unable to read any data from them. I've manually added their song IDs to my script's errata.

fma_large/001/001486.mp3
fma_large/002/002624.mp3
fma_large/003/003284.mp3
fma_large/005/005574.mp3
fma_large/008/008669.mp3
fma_large/010/010116.mp3
fma_large/011/011583.mp3
fma_large/012/012838.mp3
fma_large/013/013529.mp3
fma_large/014/014116.mp3
fma_large/014/014180.mp3
fma_large/020/020814.mp3
fma_large/022/022554.mp3
fma_large/023/023429.mp3
fma_large/023/023430.mp3
fma_large/025/025173.mp3
fma_large/025/025174.mp3
fma_large/025/025175.mp3
fma_large/025/025176.mp3
fma_large/025/025180.mp3
fma_large/029/029345.mp3
fma_large/029/029346.mp3
fma_large/029/029352.mp3
fma_large/029/029356.mp3
fma_large/033/033411.mp3
fma_large/033/033413.mp3
fma_large/033/033414.mp3
fma_large/033/033417.mp3
fma_large/033/033418.mp3
fma_large/033/033419.mp3
fma_large/033/033425.mp3
fma_large/035/035725.mp3
fma_large/039/039363.mp3
fma_large/041/041745.mp3
fma_large/042/042986.mp3
fma_large/043/043753.mp3
fma_large/050/050594.mp3
fma_large/050/050782.mp3
fma_large/053/053668.mp3
fma_large/054/054569.mp3
fma_large/054/054582.mp3
fma_large/061/061480.mp3
fma_large/061/061822.mp3
fma_large/063/063422.mp3
fma_large/063/063997.mp3
fma_large/065/065753.mp3
fma_large/072/072656.mp3
fma_large/072/072980.mp3
fma_large/073/073510.mp3
fma_large/080/080237.mp3
fma_large/080/080391.mp3
fma_large/080/080553.mp3
fma_large/082/082699.mp3
fma_large/084/084503.mp3
fma_large/084/084504.mp3
fma_large/084/084522.mp3
fma_large/084/084524.mp3
fma_large/086/086656.mp3
fma_large/086/086659.mp3
fma_large/086/086661.mp3
fma_large/086/086664.mp3
fma_large/087/087057.mp3
fma_large/090/090244.mp3
fma_large/090/090245.mp3
fma_large/090/090247.mp3
fma_large/090/090248.mp3
fma_large/090/090250.mp3
fma_large/090/090252.mp3
fma_large/090/090253.mp3
fma_large/090/090442.mp3
fma_large/090/090445.mp3
fma_large/091/091206.mp3
fma_large/092/092479.mp3
fma_large/094/094052.mp3
fma_large/094/094234.mp3
fma_large/095/095253.mp3
fma_large/096/096203.mp3
fma_large/096/096207.mp3
fma_large/096/096210.mp3
fma_large/098/098105.mp3
fma_large/098/098558.mp3
fma_large/098/098559.mp3
fma_large/098/098560.mp3
fma_large/098/098562.mp3
fma_large/098/098571.mp3
fma_large/099/099134.mp3
fma_large/101/101265.mp3
fma_large/101/101272.mp3
fma_large/101/101275.mp3
fma_large/102/102241.mp3
fma_large/102/102243.mp3
fma_large/102/102247.mp3
fma_large/102/102249.mp3
fma_large/102/102289.mp3
fma_large/105/105247.mp3
fma_large/106/106409.mp3
fma_large/106/106412.mp3
fma_large/106/106415.mp3
fma_large/106/106628.mp3
fma_large/108/108920.mp3
fma_large/108/108925.mp3
fma_large/109/109266.mp3
fma_large/110/110236.mp3
fma_large/115/115610.mp3
fma_large/117/117441.mp3
fma_large/126/126981.mp3
fma_large/127/127336.mp3
fma_large/127/127928.mp3
fma_large/129/129207.mp3
fma_large/129/129800.mp3
fma_large/130/130328.mp3
fma_large/130/130748.mp3
fma_large/130/130751.mp3
fma_large/131/131545.mp3
fma_large/133/133297.mp3
fma_large/133/133641.mp3
fma_large/133/133647.mp3
fma_large/134/134887.mp3
fma_large/140/140449.mp3
fma_large/140/140450.mp3
fma_large/140/140451.mp3
fma_large/140/140452.mp3
fma_large/140/140453.mp3
fma_large/140/140454.mp3
fma_large/140/140455.mp3
fma_large/140/140456.mp3
fma_large/140/140457.mp3
fma_large/140/140458.mp3
fma_large/140/140459.mp3
fma_large/140/140460.mp3
fma_large/140/140461.mp3
fma_large/140/140462.mp3
fma_large/140/140463.mp3
fma_large/140/140464.mp3
fma_large/140/140465.mp3
fma_large/140/140466.mp3
fma_large/140/140467.mp3
fma_large/140/140468.mp3
fma_large/140/140469.mp3
fma_large/140/140470.mp3
fma_large/140/140471.mp3
fma_large/140/140472.mp3
fma_large/142/142614.mp3
fma_large/143/143992.mp3
fma_large/144/144518.mp3
fma_large/144/144619.mp3
fma_large/145/145056.mp3
fma_large/146/146056.mp3
fma_large/147/147419.mp3
fma_large/147/147424.mp3
fma_large/148/148786.mp3
fma_large/148/148787.mp3
fma_large/148/148788.mp3
fma_large/148/148789.mp3
fma_large/148/148790.mp3
fma_large/148/148791.mp3
fma_large/148/148792.mp3
fma_large/148/148793.mp3
fma_large/148/148794.mp3
fma_large/148/148795.mp3
fma_large/151/151920.mp3
fma_large/155/155051.mp3
fma_full/080/080237.mp3
fma_full/145/145056.mp3
fma_full/015/015608.mp3
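
The size-based search above could also be done from Python, which makes it easy to turn the result into an errata set. This is only a sketch under the same assumption as the find command (anything of 4096 bytes or less is treated as suspect); `find_small_mp3s` and `errata_ids` are hypothetical helper names, and a small file size is a heuristic, not proof that a file is malformed.

```python
from pathlib import Path


def find_small_mp3s(root, max_bytes=4096):
    """Return sorted paths of .mp3 files under root whose size is at most max_bytes."""
    root = Path(root)
    return sorted(
        p
        for p in root.rglob("*")
        if p.is_file()
        and p.suffix.lower() == ".mp3"  # case-insensitive, like find -iname
        and p.stat().st_size <= max_bytes
    )


def errata_ids(paths):
    """Extract integer track IDs from file names such as 001486.mp3."""
    return {int(Path(p).stem) for p in paths}
```

For example, `errata_ids(find_small_mp3s("fma/data/fma_large"))` would give a set of track IDs to exclude before loading.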
commented

I'm noticing that several files seem to cause segfaults in AudioLoader. They come up as warnings on single frames (I think), for example "[ WARNING ] AudioLoader: invalid frame, skipping it: Invalid data found when processing input", but will eventually crash.

Is there a way around this?