datalad / datalad

Keep code, data, containers under control with git and git-annex

Home Page: http://datalad.org

Add ability to limit get (and thus install) --recursive installation of subdatasets

yarikoptic opened this issue

I think this case has come up a number of times, and I do not think we arrived at any facility to address those use cases. ATM we only have -R|--recursion-limit, which limits recursion via a hardcoded integer depth. Here I would like to collect use cases which could drive us toward a solution:

  • "YODA'ed datasets from above": it is typical to recommend installing, as a subdataset, some version of a dataset that already lives at a (flattened) level above, e.g. something like datalad create -d . derivatives/deriv1 && cd derivatives/deriv1 && datalad clone -d . ../../rawdata sourcedata && ... && cd - && datalad save -m "Finalized deriv1 derivative" -d . derivatives/deriv1 && datalad uninstall derivatives/deriv1/sourcedata. Later on, someone should be able to install the hierarchy of the dataset recursively without also installing those sourcedata/ subdatasets.
    • note that the URL in .gitmodules for sourcedata/ might be replaced with a public URL, so it is not necessarily a local one.
    • good examples:
    • a possible rule here: "do not install a subdataset whose UUID matches that of a dataset, or of an immediate subdataset, somewhere in the superdatasets"
  • YODA'ed datasets from somewhere else: an example is https://github.com/OpenNeuroDerivatives/OpenNeuroDerivatives, whose top superdataset does not itself include https://github.com/ReproNim/containers/ and https://github.com/poldracklab/tacc-openneuro/, yet every subdataset includes them. Currently this could be "addressed" via
    • -R 1, but that might prevent installing some "ad-hoc" subdataset
    • expressing it in terms of the preceding use case, by including those subdatasets in the top-level superdataset (might be the best way to go)
  • Avoid private: install until hitting a private (on GitHub) repo. Use case:
    • dandi/dandisets-healthstatus#73
      Maybe this could be re-expressed via
    • adding some labeling within .gitmodules records, and then explicitly allowing one to request that submodules with a specific label/tag not be installed.
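
To make the UUID rule and the labeling idea above a bit more concrete, here is a minimal Python sketch of how candidate subdatasets could be filtered based on .gitmodules records. It is only an illustration, not existing DataLad functionality: the datalad-label key, the function names, and the IDs and labels in the sample are all made up for this example, and datalad-id is assumed to be the key under which the subdataset UUID is recorded.

```python
import re

# Matches section headers such as: [submodule "sourcedata"]
SECTION_RE = re.compile(r'^\[submodule "(?P<name>[^"]+)"\]$')

# Made-up sample; "datalad-label" is a HYPOTHETICAL key, IDs are placeholders.
SAMPLE = """\
[submodule "sourcedata"]
    path = sourcedata
    url = ../../rawdata
    datalad-id = aaaa-1111
[submodule "code/containers"]
    path = code/containers
    url = https://github.com/ReproNim/containers
    datalad-id = bbbb-2222
    datalad-label = shared
[submodule "sub-01"]
    path = sub-01
    url = ./sub-01
    datalad-id = cccc-3333
"""


def parse_gitmodules(text):
    """Parse .gitmodules-style text into {submodule name: {key: value}}."""
    records = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        match = SECTION_RE.match(line)
        if match:
            current = records.setdefault(match.group("name"), {})
            continue
        if current is not None and "=" in line:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return records


def select_subdatasets(records, skip_labels=frozenset(), ancestor_ids=frozenset()):
    """Return paths of subdatasets that pass both (hypothetical) rules.

    - skip_labels: do not install submodules carrying any of these labels
    - ancestor_ids: do not install submodules whose datalad-id was already
      seen in a superdataset (or one of its immediate subdatasets)
    """
    selected = []
    for name, rec in records.items():
        labels = set(rec.get("datalad-label", "").split())
        if labels & set(skip_labels):
            continue
        if rec.get("datalad-id") in ancestor_ids:
            continue
        selected.append(rec.get("path", name))
    return selected
```

With the sample above, select_subdatasets(parse_gitmodules(SAMPLE), skip_labels={"shared"}, ancestor_ids={"aaaa-1111"}) leaves only sub-01 to install, while calling it without any restrictions returns all three paths.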

WDYT @datalad/developers and @datalad/contributors -- did you have related use-cases?