showyourwork / showyourwork

A workflow for reproducible and open scientific articles

Home Page: https://show-your.work

change to Zenodo.org website now breaks file downloads with a redirect page

mkenworthy opened this issue · comments

Zenodo changed how files can be downloaded: every cached file that showyourwork build fetches now arrives as an HTML page with this code:

<!doctype html>
<html lang=en>
<title>Redirecting...</title>
<h1>Redirecting...</h1>
<p>You should be redirected automatically to the target URL: <a href="/records/8344755/files/light_curve_410e0d3c-687e-40a3-b7cb-af0057695e0b.csv">/records/8344755/files/light_curve_410e0d3c-687e-40a3-b7cb-af0057695e0b.csv</a>. If not, click the link.

Removing the file and rerunning showyourwork build results in the same error. The files are there and accessible through the web interface.

Is this for draft files or are they in a published record?

There are two different ways that syw constructs the URL:

url = entry["links"]["download"]

and

url = entry["links"]["self"]

So you'll need to isolate which one is the problem.
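One way to isolate the failing URL is to test whether the downloaded bytes are the redirect stub quoted at the top of this issue rather than real data. This is a minimal sketch, not code from showyourwork: `looks_like_redirect_stub` is a hypothetical helper, and it only assumes the stub keeps the `<!doctype html>` prefix and the "Redirecting..." title shown above.

```python
import re

# Matches the <title> of the stub page quoted in this issue.
_REDIRECT_TITLE = re.compile(rb"<title>\s*Redirecting\.\.\.\s*</title>", re.IGNORECASE)


def looks_like_redirect_stub(payload: bytes) -> bool:
    """Return True if payload resembles the HTML redirect page quoted above.

    Hypothetical helper for debugging; not part of showyourwork.
    """
    head = payload[:512]  # the stub is short, so the first bytes are enough
    starts_as_html = head.lstrip().lower().startswith(b"<!doctype html")
    return starts_as_html and _REDIRECT_TITLE.search(head) is not None


# Example payloads: the stub from this issue and a made-up CSV file.
stub = (
    b"<!doctype html>\n<html lang=en>\n"
    b"<title>Redirecting...</title>\n<h1>Redirecting...</h1>\n"
)
csv_data = b"time,flux\n0.0,1.0\n"

print(looks_like_redirect_stub(stub))      # True
print(looks_like_redirect_stub(csv_data))  # False
```

Running this check on the file fetched from each of the two URLs should show which one returns the stub.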

The Zenodo caching features are extremely brittle and hard to test so we'll see what we can do! I don't use them so there's only so much advice I can give.

It's a published and public Zenodo deposit. If you clone and build:

https://github.com/mkenworthy/HWObows.git

...it errors out because of the HTML redirect download.

So that means it's this line that I linked above:

url = entry["links"]["self"]

if anyone wants to fix it!

I think I fixed it: change record to records in

f"https://{deposit.url}/record/{deposit.deposit_id}/files/{remote_file}",

...and now downloading from Zenodo works.
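As a standalone illustration of that change, the sketch below builds a file URL with the `/records/` path segment, which matches the target path in the redirect page quoted at the top of this issue. The function name `zenodo_file_url` and the file name `light_curve.csv` are placeholders for illustration; the real code reads the host and ID from `deposit.url` and `deposit.deposit_id`.

```python
def zenodo_file_url(host: str, deposit_id: int, remote_file: str) -> str:
    """Build a file URL using the plural /records/ path segment.

    Hypothetical helper; it mirrors the f-string quoted above with
    "record" replaced by "records".
    """
    return f"https://{host}/records/{deposit_id}/files/{remote_file}"


# The record ID is the one from the redirect page; the file name is a placeholder.
url = zenodo_file_url("zenodo.org", 8344755, "light_curve.csv")
print(url)  # https://zenodo.org/records/8344755/files/light_curve.csv
```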

@dfm actually, even after the fix in #407 I'm still having issues, now when running showyourwork cache freeze or publish. I think the Zenodo API may have changed in this regard too, so perhaps f"https://{self.url}/api/deposit/depositions/{version_id}/actions/publish" is no longer the right URL for publishing a record? But maybe there's something wrong with my setup. Any ideas?

The URL changed in #407 is only ever used for Zenodo DOIs included via the datasets configuration. It has nothing to do with cached datasets! For cached datasets, you should look at the lines I linked above.

I'm going to revert #407 unless @mkenworthy is very confident that it actually fixed a Zenodo issue for dataset records.

I should also note that it looks like Zenodo was (and still is) having some major stability issues related to their recent upgrade, so I think that these issues are probably related to that. I get different HTML response errors now if I try to reproduce. I'm not sure there's too much we can do in the short term to handle this.

@dfm okay, although #407 for sure seemed to fix the issue with downloading existing zenodo records for me. Before that I wasn’t even able to compile a SYW paper locally.

Now the only issue I have is if I want to create/update the cache, but maybe that’s due to transient instability as you said (I get a 400 error).

Zenodo says their REST API changed here https://help.zenodo.org/docs/about/whats-new/ which is what I thought may be the cause. But is that unrelated?

although #407 for sure seemed to fix the issue with downloading existing zenodo records for me

Where "existing zenodo records" refers to DOIs added in the datasets config? I'm confused if it affects your cached data products, but there could be something I'm missing. I just want to be clear about exactly which kind of Zenodo record you're talking about here.

Zenodo says their REST API changed here https://help.zenodo.org/docs/about/whats-new/ which is what I thought may be the cause. But is that unrelated?

That does seem like it's probably related, but it's impossible to test right at the moment AFAICT because of instability. I definitely don't have the time to invest in chasing that down, but I'm happy to leave this issue open to track future work on this.

Just to be clear, though, there are 2 separate issues here:

  1. The record vs. records in download.py
  2. The URLs and REST API changes in zenodo.py

Maybe #407 fixes issue 1, but I can't confirm that. Issue 2 is definitely still open.

Yes, exactly. Sorry if I wasn’t clear: #407 fixed the first issue; I’m still having problems with the second one.

(Or putatively the second one; I can’t tell if it’s the REST API or just that Zenodo is being unstable.)

Oh, and record should definitely also be replaced by records in zenodo.py. I did that in my local copy and can push a fix.

I ran into the same issue; after installing the latest GitHub version of SYW my article compiles locally again. Thanks a lot for the fix! What is the simplest way to upgrade the remote so that my article builds again there?

@matiscke Glad this was helpful! There might be a different way to do this, but I just edited the build.yml file to tell it to install SYW from Git

        with:
          showyourwork-spec: git+https://github.com/showyourwork/showyourwork.git

here's an example from a paper of mine

Many thanks @matiscke for showing how to do this, I couldn't work out how this worked!
I've tried adding this to one of my repos, but the build stops with this error:

`Command \marginicon must always come before the figure label.`

My local repo builds cleanly with no error, and trying CI=true showyourwork build also works without a problem. Do you see this with your repos?

@mkenworthy this issue may be related to your problem.

It is related to how they validate the order of elements in a figure in

r"Command \marginicon must always come before the figure label."

A simple solution is not to use \ac in the figure caption.

Ah, yes, that's the same error. The trouble is, I don't use \ac in my figures. This seems to be a more general error in the preprocess.py code....

I seem to recall seeing that error and fixing it by switching the placement of \label within the float environment, perhaps try that?

I tried putting the \label as the last command in the figure, similar to your paper, but it still causes the same error.

I can submit a pull request later on to try to fix it. Giving marginicon_idx a very large number when \marginicon is not used will probably fix the problem.

Excellent, this works. Thank you @maxisi !

@mkenworthy I had no such issues (potentially due to @thomasckng's bugfix). Do you still get the error message?

Yes, that did the trick, I can now build - thank you @maxisi!

@mkenworthy can this issue be closed now?

yes, this solved the issue.