datalad / datalad

Keep code, data, containers under control with git and git-annex

Home Page: http://datalad.org

`str(GitTransportRI)` broken, and with it `_get_flexible_source_candidates()`

mih opened this issue

This was reported in the office hour. The setup is a superdataset with a subdataset, where the superdataset was cloned from a `datalad-annex::` URL. That clone worked fine.

Now getting a subdataset fails, because a generated candidate URL is exactly the same as the superdataset remote URL.

Here is where it happens:

> /home/mih/env/datalad-dev/lib/python3.11/site-packages/datalad/distribution/utils.py(74)_get_flexible_source_candidates()
-> src = str(ri)
(Pdb) p ri
GitTransportRI(RI='file:///tmp/julia/demo_micro_datalad/newstore/QC/B31_4318-datalad?type=external&externaltype=uncurl&encryption=none&url={noquery}/{{annex_key}}', path='inputs/se4318', transport='datalad-annex')
(Pdb) p str(ri)
'datalad-annex::file:///tmp/julia/demo_micro_datalad/newstore/QC/B31_4318-datalad?type=external&externaltype=uncurl&encryption=none&url={noquery}/{{annex_key}}'

`str(ri)` simply ignores the fact that there is a `path='inputs/se4318'`.

While other code layers should catch the resulting fallout, generating such a candidate URL makes no sense to begin with.

I believe what should have been generated is

datalad-annex::file:///tmp/julia/demo_micro_datalad/newstore/QC/B31_4318-datalad/inputs/se4318?type=external&externaltype=uncurl&encryption=none&url={noquery}/{{annex_key}}
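
To make the expected behavior concrete, here is a minimal sketch of a string conversion that honors `path`. It is not DataLad's implementation: `SketchTransportRI` is a hypothetical stand-in that only mirrors the three attributes visible in the debugger output above, and it assumes that the right place for the path is before the query string, as in the URL I proposed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchTransportRI:
    """Hypothetical stand-in for GitTransportRI (NOT the DataLad class).

    Only the attributes shown in the pdb output are modeled.
    """
    RI: str
    path: Optional[str] = None
    transport: Optional[str] = None

    def __str__(self) -> str:
        # Split off the query string so the path lands in front of it.
        # Simplification: any '#fragment' is not handled separately.
        base, sep, query = self.RI.partition("?")
        if self.path:
            # Join with exactly one slash between base URL and path.
            base = base.rstrip("/") + "/" + self.path.lstrip("/")
        prefix = f"{self.transport}::" if self.transport else ""
        return f"{prefix}{base}{sep}{query}"


ri = SketchTransportRI(
    RI="file:///tmp/julia/demo_micro_datalad/newstore/QC/B31_4318-datalad"
       "?type=external&externaltype=uncurl&encryption=none"
       "&url={noquery}/{{annex_key}}",
    path="inputs/se4318",
    transport="datalad-annex",
)
candidate = str(ri)
```

With this sketch, `candidate` contains `/inputs/se4318` ahead of the `?`, so it no longer collides with the superdataset remote URL. Whether the actual fix belongs in `__str__` or in `_get_flexible_source_candidates()` is a separate question.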