datalad / datalad

Keep code, data, containers under control with git and git-annex

Home Page: http://datalad.org

parallel get from datalad archive gives error

bpinsard opened this issue

What is the problem?

Running datalad get -J 8 . in a dataset with a single datalad archive fails with the following error:

[git-annex: .git/annex/tmp/MD5E-s8351481--f35cc3878536756c8567cc9d2421f6e6.7z: removeLink: does not exist (No such file or directory)]

Without the -J 8 flag, the same command does not raise any error.

My 2 cents: one worker tries to get the archive while another tries to get a file from the archive and therefore fetches the archive too, causing a conflict.
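To make the suspected failure mode concrete, here is a minimal Python sketch of two workers sharing one temporary path for the same key. It is only a simulation of the hypothesis, not git-annex or DataLad code: the function name and file name are made up for illustration, and the interleaving is written out sequentially so the result is deterministic.

```python
import errno
import os
import tempfile


def simulate_shared_tmp_cleanup(workdir):
    """Hypothetical illustration: two workers use the same temp file for one key."""
    # Assumption: both workers derive the same temp path from the archive key.
    tmp_path = os.path.join(workdir, "archive-key.7z")

    # Worker A starts transferring the archive into the temp file.
    with open(tmp_path, "wb") as fh:
        fh.write(b"partial archive bytes")

    # Worker B wants a file from the archive, so it starts the same transfer
    # and reuses the very same temp path.
    with open(tmp_path, "wb") as fh:
        fh.write(b"partial archive bytes")

    # Worker A finishes first and removes the temp file.
    os.remove(tmp_path)

    # Worker B finishes next and tries to remove the file that is already gone.
    try:
        os.remove(tmp_path)
    except FileNotFoundError as exc:
        return exc.errno
    return None


with tempfile.TemporaryDirectory() as workdir:
    second_cleanup_errno = simulate_shared_tmp_cleanup(workdir)
    print(second_cleanup_errno == errno.ENOENT)
```

The second removal fails with ENOENT, which is the same "No such file or directory" condition reported in the removeLink message above. Whether this is what actually happens inside git-annex is not confirmed here.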

While getting files from a single archive is clearly not parallelizable, getting files matched by a glob from multiple archives in parallel could make sense.
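One way to picture that scheduling idea is to group the requested files by the archive they come from, run the groups in parallel, and keep the work inside each group sequential. The Python sketch below only illustrates that policy: group_by_archive, fetch_group, and parallel_get are hypothetical names, the file-to-archive mapping is assumed to be known up front, and no real DataLad or git-annex call is made.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def group_by_archive(file_to_archive):
    """Map each archive name to the list of files that live inside it."""
    groups = defaultdict(list)
    for path, archive in file_to_archive.items():
        groups[archive].append(path)
    return dict(groups)


def fetch_group(archive, paths):
    """Hypothetical worker: handle one archive, then its files one by one."""
    # A real implementation would fetch the archive once here and then
    # extract each file; this sketch only records the order of the work.
    return [(archive, path) for path in sorted(paths)]


def parallel_get(file_to_archive, jobs=8):
    """Run one task per archive so no two workers touch the same archive."""
    groups = group_by_archive(file_to_archive)
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        futures = {
            archive: pool.submit(fetch_group, archive, paths)
            for archive, paths in groups.items()
        }
        return {archive: future.result() for archive, future in futures.items()}


if __name__ == "__main__":
    wanted = {"a/1.txt": "a.7z", "a/2.txt": "a.7z", "b/1.txt": "b.7z"}
    for archive, work in parallel_get(wanted, jobs=2).items():
        print(archive, work)
```

With a single archive this degenerates to one sequential task, which matches the observation that getting files from one archive is not parallelizable.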

What steps will reproduce the problem?

Create a dataset, add an archive, add the archive content, push the archive to a remote, drop everything, then try a parallel get.

DataLad information

datalad 0.19.5

Additional context

No response

Have you had any success using DataLad before?

indeed ;D