eth-cscs / firecrest


/storage/xfer-external/upload endpoint is unreliable

simonbray opened this issue · comments

I'm trying to upload files via (py)FirecREST. I have tried to use the below code, which I think is the recommended option:

upload = client.external_upload(machine, f, path)
upload.finish_upload()

But in many cases this completes either 1) without uploading the file, or 2) in the case of some larger files only partially uploading it, so that the file size on the CSCS machine is smaller than locally. In both cases no error is raised and the problem only becomes apparent when subsequent requests are made (e.g. slurm job submissions which rely on the file being available).

I am using client.simple_upload() for now, but this isn't ideal and also doesn't work for larger files.

Hm, are you checking the status of the object? I put a small warning about it in the docs, but I take your point that it may not be very clear: https://pyfirecrest.readthedocs.io/en/latest/tutorial.html#external-upload

To upload a file you would have to ask for the link in the staging area and upload the file there. Even after uploading the file there, it will take some time for the file to appear in the filesystem. You can always follow the status of the task with the status method and when the file has been successfully uploaded the status of the task will be 114.

With the external upload, the user uploads the file to a "staging area", and as soon as FirecREST detects that the upload is there, it transfers the file to the machine's filesystem. The finish_upload method uploads the file to the staging area, but you then have to make sure that the status is 114 before you start using the file.
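
A minimal sketch of that waiting step, for illustration only. It assumes the object returned by client.external_upload() exposes a status attribute holding the task code (compared here as a string). The helper name wait_for_status and its timeout, interval, sleep and clock parameters are hypothetical and not part of pyFirecREST.

```python
import time


def wait_for_status(task, done="114", timeout=600.0, interval=5.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll task.status until it equals `done`, or raise after `timeout` seconds.

    `task` is any object with a `status` attribute (assumed here to be the
    external upload object); "114" is the success code named in this thread.
    """
    deadline = clock() + timeout
    while True:
        status = str(task.status)  # normalise in case the code is an int
        if status == done:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"status is {status!r}, expected {done!r}")
        sleep(interval)
```

With the code from the original report, this would be called right after upload.finish_upload(), for example wait_for_status(upload), and the file would only be used (e.g. in a job submission) once the call returns. Raising on timeout rather than returning silently is a deliberate choice here, so that an incomplete transfer surfaces as an error instead of a missing or truncated file later on.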

Could you try that, please? If the status is 114 and you are still having issues, we can have another look to see whether there is a bug in the storage microservice. I will try to make the docs clearer, and maybe it makes sense to add an option in simple_upload that will block until the status is 114.

Btw, because of the double transfer, it makes sense to use simple_upload as much as possible. If your files are small enough, it is faster and easier to upload them with simple_upload.

Ah, I didn't read the docs closely enough. You are right, after waiting for a status of 114 the file is accessible.