mdeff / fma

FMA: A Dataset For Music Analysis

Home Page: https://arxiv.org/abs/1612.01840

host on cloud computing provider

danmackinlay opened this issue

A suggestion: I notice there are a few open issues about an outdated data version, so I presume the current hosting makes the data inconvenient to update. As such, it might be worth hosting the data somewhere else.

According to the FAQ, Microsoft Research Open Data will host data sets up to 250 GB. Amazon and probably Google offer similar schemes.

Amazon's AWS also hosts data sets and has a formal submission procedure for new data sets.

Or maybe on https://zenodo.org/ ?
It is a Swiss (CERN-based) repository for scientific data sets: it assigns a DOI, lets you link existing publications to the upload, and has no space limit. (The default limit is 50 GB, but you can contact them by email and they will lift it for a given upload.)

And Zenodo has a simple, usable API.
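To illustrate the point about the API, here is a minimal sketch of querying Zenodo's public record search from Python. It assumes the `https://zenodo.org/api/records` endpoint with its `q` and `size` query parameters; the helper names `build_search_url` and `search_records` and the example query string are hypothetical and not part of this repository.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: Zenodo's public record search.
ZENODO_RECORDS_URL = "https://zenodo.org/api/records"


def build_search_url(query, size=5):
    """Build a search URL for the Zenodo records endpoint (hypothetical helper)."""
    return ZENODO_RECORDS_URL + "?" + urlencode({"q": query, "size": size})


def search_records(query, size=5):
    """Fetch matching records and return their titles.

    Performs a network request; the response is assumed to be JSON with
    the results under hits -> hits, each carrying metadata -> title.
    """
    with urlopen(build_search_url(query, size)) as response:
        payload = json.load(response)
    return [hit["metadata"]["title"] for hit in payload["hits"]["hits"]]
```

Calling `search_records("music dataset")` would return the titles of the first few matching records. Uploading a data set goes through a separate deposit endpoint and requires a personal access token, which this sketch does not cover.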

Thanks for the suggestions! AWS and Microsoft are potential providers. I like Zenodo, but when I contacted them in May 2017 about hosting the FMA, they answered: "Unfortunately the data sizes you mentioned are above of what we can accept." Another option is torrents (#32), though I don't know how convenient that is in general, or how to ensure that at least one peer is always up.

The current hosting is not inconvenient to update, but I think we should strive to update as infrequently as possible. One problem is that published results are only comparable when they are computed on the same data, so every update makes comparisons more difficult.

I've documented the known issues in the README and in meta-issue #41. Hope that helps for the time being.