Downloading dataset folder structure with python interface.
abcsds opened this issue
Hi Richard,
I'm working with restricted disk space, so I'd like to download and process files one by one with the Python interface. For that I need a list of all files in a dataset first (or the dataset folder structure, filenames, etc.), but I can't find a public method to do so. In the `_download` sub-module I found `_iterate_filenames`, but it seems to work on the already-downloaded dataset. I need to know which files I want to include in the `download` function call before I call it. Maybe I'm missing something.
I can infer the folder structure down to the subject level from `participants.tsv`, then iteratively download each subject's `scans.tsv`, which should list every data file. Do I have it right?
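To make the idea concrete, here is a minimal sketch of that inference. It assumes BIDS-style tables: a `participant_id` column in `participants.tsv` and a `filename` column in `scans.tsv`, with paths relative to the folder holding the `scans.tsv`. The helper names (`read_tsv_column`, `subject_dirs`, `scan_paths`) are hypothetical and not part of the package, and the TSV contents are made-up sample data.

```python
import csv
import io
from pathlib import PurePosixPath


def read_tsv_column(tsv_text, column):
    """Return all values of one named column from tab-separated text."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row[column] for row in reader]


def subject_dirs(participants_tsv_text):
    """Subject folder names, assuming they equal the participant_id values."""
    return read_tsv_column(participants_tsv_text, "participant_id")


def scan_paths(scans_tsv_text, scans_dir):
    """Dataset-relative paths of the files listed in one scans.tsv.

    scans_dir is the folder that contains the scans.tsv (e.g. "sub-01").
    """
    names = read_tsv_column(scans_tsv_text, "filename")
    return [str(PurePosixPath(scans_dir) / name) for name in names]


# Made-up sample tables standing in for files fetched from the dataset.
participants = "participant_id\tage\nsub-01\t25\nsub-02\t31\n"
scans = "filename\tacq_time\nfunc/sub-01_task-rest_bold.nii.gz\t2020-01-01T12:00:00\n"

print(subject_dirs(participants))
print(scan_paths(scans, "sub-01"))
```

One caveat with this route: `scans.tsv` is optional in BIDS and lists data files only, so sidecar files, or datasets that ship without it, would be missed.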
I'll be glad to implement something in a PR, but I'd like to hear what you think first.
Greetings!
Alberto
Hello, this is currently not supported.
In the `download()` function, however, we do create a list `filenames` which, I believe, contains all the files in the dataset.
If you could refactor the `download()` function so that the `filenames` generation moves to a new, separate function that `download()` calls, you could then re-use it in a new `get_filenames()` (or something like that) function, which doesn't actually perform a download but simply returns the filenames.
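A rough sketch of what that split could look like. This is not the package's actual code: `_list_remote_files`, the `fetch` callback, the glob-style `include` matching, and the in-memory `FAKE_REMOTE_FILES` listing are all assumptions made so the example runs on its own. Only the overall shape (one function builds the list, `download()` re-uses it) comes from the suggestion above.

```python
from fnmatch import fnmatchcase

# Stand-in for the server-side file listing (made-up paths, illustration only).
FAKE_REMOTE_FILES = [
    "participants.tsv",
    "sub-01/sub-01_scans.tsv",
    "sub-01/anat/sub-01_T1w.nii.gz",
    "sub-02/anat/sub-02_T1w.nii.gz",
]


def _list_remote_files(dataset):
    """Hypothetical helper: the real code would query the server here."""
    return list(FAKE_REMOTE_FILES)


def get_filenames(dataset, include=None):
    """Return the dataset's filenames without downloading anything."""
    filenames = _list_remote_files(dataset)
    if include:
        # Keep only files matching at least one glob-style pattern.
        filenames = [
            name for name in filenames
            if any(fnmatchcase(name, pattern) for pattern in include)
        ]
    return filenames


def download(dataset, include=None, fetch=print):
    """Fetch the selected files, re-using get_filenames() for the listing."""
    filenames = get_filenames(dataset, include=include)
    for name in filenames:
        fetch(name)  # placeholder for the actual transfer
    return filenames


# List first, then fetch one file at a time ("dsXXXXXX" is a placeholder ID).
for name in get_filenames("dsXXXXXX", include=["sub-01/*"]):
    download("dsXXXXXX", include=[name])
```

With the listing available up front, the one-file-at-a-time loop at the end is what the restricted-disk use case needs: download a file, process it, delete it, move on.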
cc @larsoner