Downloading dataset folder structure with Python interface.

Question


abcsds opened this issue a year ago

Hi Richard,

I'm working with restricted disk space, so I'd like to download and process files one by one with the Python interface. For that I first need a list of all files in a dataset (or the dataset's folder structure, filenames, etc.), but I can't find a public method that provides it. In the _download submodule I found _iterate_filenames, but it seems to work on an already-downloaded dataset. I need to know which files to include in the download() call before I make it. Maybe I'm missing something.
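A minimal sketch of that one-file-at-a-time workflow, assuming a complete list of dataset-relative filenames is already available (which is exactly the missing piece). The fetch and process callables are hypothetical placeholders: fetch could, for example, call openneuro-py's download() with include restricted to that single path, but that wiring is an assumption and is not shown here.

```python
import tempfile
from collections.abc import Callable, Iterable
from pathlib import Path


def process_one_by_one(
    filenames: Iterable[str],
    fetch: Callable[[str, Path], Path],
    process: Callable[[Path], object],
) -> list[object]:
    """Fetch, process and delete each file so only one is on disk at a time.

    `fetch` and `process` are hypothetical callables supplied by the caller:
    `fetch(filename, dest_dir)` must store the file in `dest_dir` and return
    its local path; `process(path)` returns whatever result you want to keep.
    """
    results: list[object] = []
    for filename in filenames:
        # The temporary directory (and the fetched file inside it) is deleted
        # when the with-block exits, so disk usage stays at a single file.
        with tempfile.TemporaryDirectory() as tmp:
            local_path = fetch(filename, Path(tmp))
            results.append(process(local_path))
    return results
```

Only the processing results are kept; the raw files never accumulate on disk.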
I could infer the folder structure down to the subject level from participants.tsv, then iteratively download each scans.tsv file, which should list every data file. Do I have that right?
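The approach of reading participants.tsv and then each subject's scans.tsv could be sketched as follows. This assumes a BIDS dataset without sessions, where each subject has sub-<label>/sub-<label>_scans.tsv and its filename column is relative to the subject folder. The helper names are hypothetical, and the TSV contents are passed in as strings so the sketch runs offline. Note that scans.tsv is optional in BIDS and does not list sidecar files such as .json, so a list built this way may be incomplete.

```python
import csv
import io


def subject_ids(participants_tsv: str) -> list[str]:
    """Return the participant_id column of a BIDS participants.tsv."""
    reader = csv.DictReader(io.StringIO(participants_tsv), delimiter="\t")
    return [row["participant_id"] for row in reader]


def scans_tsv_path(subject: str) -> str:
    """Path of a subject's scans.tsv, assuming the dataset has no sessions."""
    return f"{subject}/{subject}_scans.tsv"


def data_files(subject: str, scans_tsv: str) -> list[str]:
    """Dataset-relative paths of the files listed in one subject's scans.tsv.

    BIDS stores `filename` relative to the subject folder, so we prefix it.
    """
    reader = csv.DictReader(io.StringIO(scans_tsv), delimiter="\t")
    return [f"{subject}/{row['filename']}" for row in reader]
```

For each ID returned by subject_ids(), one would download only scans_tsv_path(subject), pass its text to data_files(), and repeat.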

I'd be glad to implement this in a PR, but I'd like to hear what you think first.

Greetings!
Alberto

Richard Höchenberger · Answer · Tue Dec 12 2023 18:14:33 GMT+0800

Hello, this is currently not supported.

However, the download() function does create a list, filenames, which I believe contains all the files in the dataset.

If you could refactor download() so that the filenames generation moves into a new, separate function that download() calls, you could then also reuse it in a new get_filenames() function (or something like that), which doesn't perform a download but simply returns the filenames.
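A minimal sketch of that refactor, with deliberately simplified, hypothetical signatures: the real download() takes a dataset ID, a target directory and more, and builds its list from OpenNeuro's metadata. Here a list_dir callable stands in for that metadata query (it can be mocked with a dict) so the pattern runs offline. One private helper, _collect_filenames (a hypothetical name), collects the filenames; download() iterates over them and get_filenames() just returns them.

```python
from collections.abc import Callable, Iterable

# Hypothetical stand-in for the remote metadata query: given a directory path,
# return (path, is_directory) pairs for its direct children.
ListDir = Callable[[str], Iterable[tuple[str, bool]]]


def _collect_filenames(list_dir: ListDir, root: str = "") -> list[str]:
    """Walk the remote tree and return every file path, downloading nothing."""
    filenames: list[str] = []
    pending = [root]
    while pending:
        directory = pending.pop()
        for path, is_dir in list_dir(directory):
            if is_dir:
                pending.append(path)  # descend into subdirectories later
            else:
                filenames.append(path)
    return sorted(filenames)


def get_filenames(list_dir: ListDir) -> list[str]:
    """Proposed public helper: list the dataset's files without downloading."""
    return _collect_filenames(list_dir)


def download(list_dir: ListDir, fetch: Callable[[str], None]) -> None:
    """Simplified stand-in for download(): reuse the helper, then fetch each file."""
    for filename in _collect_filenames(list_dir):
        fetch(filename)
```

Because both public functions share _collect_filenames, the listing and the actual download cannot drift apart.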

cc @larsoner