showyourwork / showyourwork

A workflow for reproducible and open scientific articles

Home Page: https://show-your.work

Diagnosing why a cache is not being pulled from Zenodo after publishing

mkenworthy opened this issue · comments

I've used the Zenodo sandbox caching for a paper we've now published at https://github.com/mkenworthy/HWObows/

I've frozen and published the cache as a public Zenodo deposit at https://zenodo.org/record/8139609

Building the github repo, however, shows that although the Zenodo cache is valid, showyourwork cannot download the cached file and it reruns the scripts instead. Looking at the log files doesn't give any clue as to why showyourwork can see the Zenodo deposit but cannot download the cached file.

Setting up the workflow...
Testing if user is authenticated for 10.5281/zenodo.8144596...
User authentication for 10.5281/zenodo.8144596 is valid.
Testing if user is authenticated for 10.5281/zenodo.8139609...
User authentication for 10.5281/zenodo.8139609 is valid.

and then:

Attempting to access Zenodo deposit with DOI 10.5281/zenodo.8139609...
Failed to access Zenodo deposit with DOI 10.5281/zenodo.8139609.
Attempting to access Zenodo record with DOI 10.5281/zenodo.8139609...
Searching Zenodo Sandbox cache...
File not found on remote cache. See logs for details.
'NoneType' object has no attribute 'download_file'
Running rule from scratch...

Any suggestions as to what's going on?
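
For readers following the log above: the line `'NoneType' object has no attribute 'download_file'` is the message Python produces when code calls `.download_file(...)` on a lookup that returned `None`. The sketch below only illustrates that failure mode and one way to guard against it. The `Deposit` class and the `find_deposit` and `fetch` helpers are hypothetical stand-ins written for this example, not showyourwork's actual code.

```python
class Deposit:
    """Hypothetical stand-in for a remote deposit object (not showyourwork's class)."""

    def __init__(self, files):
        self.files = files  # maps file name -> file contents

    def download_file(self, name):
        if name not in self.files:
            raise FileNotFoundError(name)
        return self.files[name]


def find_deposit(doi, known):
    """Return the deposit registered under `doi`, or None if it is not accessible."""
    return known.get(doi)


def fetch(doi, name, known):
    """Download `name` from the deposit for `doi`, or return None on a cache miss."""
    deposit = find_deposit(doi, known)
    if deposit is None:
        # Without this guard, deposit.download_file(name) would raise
        # AttributeError: 'NoneType' object has no attribute 'download_file'
        return None
    return deposit.download_file(name)
```

With a guard like this, a failed lookup becomes an explicit cache miss rather than an attribute error, which makes it easier to tell "deposit not reachable" apart from "file not in deposit".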

I'm currently on leave so I don't have a huge capacity to look into this, but I made it as far as reproducing the issue 😀

Pinging @katiebreivik in case she has ideas!

Sorry it took me so long to get to taking a look at this! I was moving NYC --> PGH and am playing catchup. I just cloned/built with showyourwork and found no problems with downloading the data from Zenodo. Perhaps this was another intermittent Zenodo issue?

Apologies -- I spoke too soon, since it was only the other dataset that downloaded smoothly. I can verify that I do run into a Zenodo access problem:

Running user rule create_eccentric_orbits_data...
Searching remote file cache: src/data/eccentric-orbits.npz...
File not found on remote cache. See logs for details.
Running rule from scratch...

In the logs I find slightly more info:

Attempting to access Zenodo deposit with DOI 10.5281/zenodo.8139609...
Failed to access Zenodo deposit with DOI 10.5281/zenodo.8139609.
The server could not verify that you are authorized to access the URL requested. You either supplied the wrong credentials (e.g. a bad password), or your browser doesn't understand how to supply the credentials required.

After a bit of digging it looks like syw is naming the datafile produced by create_eccentric_orbits_data incorrectly. The name of the datafile in the syw-created Zenodo deposit should be eccentric-orbits.npz but the datafile is being named by the rule rather than the output. Will keep digging to see if I can find the part of syw that would do this.
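
To make the naming hypothesis above concrete, here is a minimal sketch of how a cache lookup misses when the upload side and the download side derive the file's key differently. It is an illustration of the suspected mismatch only: the helper names `key_from_rule`, `key_from_output`, and `lookup` are hypothetical, and nothing here is taken from showyourwork's source.

```python
from pathlib import Path


def key_from_rule(rule_name):
    """Hypothetical: key the cached file by the name of the rule that made it."""
    return rule_name


def key_from_output(output_path):
    """Hypothetical: key the cached file by the basename of the rule's output."""
    return Path(output_path).name


def lookup(deposit_files, key):
    """Return the key if the deposit holds a file with that name, else None."""
    return key if key in deposit_files else None


rule_name = "create_eccentric_orbits_data"
output = "src/data/eccentric-orbits.npz"

# Assumed scenario: the file was stored under the rule name...
deposit_files = {key_from_rule(rule_name)}

# ...but the download side asks for the output's basename, so the lookup misses.
miss = lookup(deposit_files, key_from_output(output))

# If both sides agree on the key, the same lookup succeeds.
hit = lookup({key_from_output(output)}, key_from_output(output))
```

In this scenario `miss` is `None` and `hit` is `"eccentric-orbits.npz"`, which matches the symptom of a valid deposit that still reports "File not found on remote cache".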

hi - is there anything I can do to help resolve this?

Hi @mkenworthy -- sorry for being slow here! I got pulled away from debugging to handle an institution move and start on course prep.

I will try to get back to this sometime this week. I was working through a minimum working example that performed similar caching to your project, but so far had only been working with Sandbox and wasn't running into issues. This makes me worry that it is a Zenodo (rather than Sandbox) issue, but I'm not a Zenodo expert by any means.

Many thanks - bear in mind that it too worked with the Sandbox with no problems for me, it was when I pushed it to Zenodo that it broke. I looked at the code and it seems the same subroutine is being called for both sandbox and Zenodo, but I don't know enough coding to put in a breakpoint and see what's causing it to ask for the wrong name as you pointed out earlier in the thread.
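
On seeing which name is being requested without a debugger: one low-effort option is to wrap the shared download routine in a logging decorator and compare the Sandbox and Zenodo runs. The sketch below assumes a function called `download_from_cache` with a `sandbox` flag; that name and signature are placeholders for this example, not the real showyourwork routine.

```python
import functools
import logging

logger = logging.getLogger("cache_debug")


def log_calls(func):
    """Log the arguments and return value of every call to `func`."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.info("%s called with args=%r kwargs=%r", func.__name__, args, kwargs)
        result = func(*args, **kwargs)
        logger.info("%s returned %r", func.__name__, result)
        return result

    return wrapper


@log_calls
def download_from_cache(name, sandbox=False):
    """Hypothetical placeholder for the shared Sandbox/Zenodo download routine."""
    return None  # stands in for a cache miss
```

Calling `logging.basicConfig(level=logging.INFO)` before the build would print each requested file name. Alternatively, Python's built-in `breakpoint()` can be placed on the line of interest to drop into the `pdb` debugger at that point.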