showyourwork / showyourwork

A workflow for reproducible and open scientific articles

Home Page: https://show-your.work

zenodo not downloading dataset

TobiBu opened this issue · comments

Hi, I keep getting an error with syw saying that one of my data files from Zenodo can't be found, although I think I've set up my dependency graph correctly.

from my showyourwork.yml

dependencies:
  src/scipts/mdf_oxygen.py:
    - src/data/2.79e12..01350_halo0_total_age_fe.dat

datasets:
  10.5281/zenodo.7928529:
    destination: src/data/
    contents:
      2.79e12..01350_halo0_total_age_fe.dat

the exact error I get is:

User authentication for 10.5281/zenodo.7224272 is valid.
Generating figure output: src/tex/figures/2.79e12_mdf_oxygen_gas.pdf...
Traceback (most recent call last):
  File "src/scripts/mdf_oxygen.py", line 56, in <module>
    data_main3 = pickle.load(open( paths.data / '2.79e12..01350_halo0_total_age_fe.dat','rb'))
FileNotFoundError: [Errno 2] No such file or directory: 'src/data/2.79e12..01350_halo0_total_age_fe.dat'

any ideas what goes wrong?

I don't know off the top of my head, but how about if you try this syntax instead:

datasets:
  10.5281/zenodo.7928529:
    contents:
      2.79e12..01350_halo0_total_age_fe.dat: src/data/2.79e12..01350_halo0_total_age_fe.dat

just to make it more explicit!

that was the syntax I had before.
It's really weird. nothing seems to work for me...

> that was the syntax I had before.

And it doesn't work? Can you post (or point to) the full logs and I can take a look?

In verbose mode I found this:

User authentication for 10.5281/zenodo.7928529 is valid.
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job             count    min threads    max threads
------------  -------  -------------  -------------
syw__compile        1              1              1
syw__dag            1              1              1
syw__fig10          1              1              1
syw__main           1              1              1
total               4              1              1

Select jobs to execute...
Failed to solve scheduling problem with ILP solver. Falling back to greedy solver. Run Snakemake with --verbose to see the full solver output for debugging the problem.

Might that be the cause?

Hmmm... I haven't seen that one before! Is the project on GitHub somewhere where I could take a look, or could you email me a snapshot of it so I can see if I can figure out the issue?

or you might look at this repository

I literally played around the entire morning trying to figure out what happens here...

found this one

try to upgrade snakemake now

SYW pins the version of snakemake:

"snakemake==7.15.2",

and it doesn't work with the most recent versions, so YMMV with that. I'll take a look to see if I can track down the issue.
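
Since the pin matters here, a quick way to see whether the environment has drifted from it is to compare the installed version against the pinned one. This is only a minimal sketch: the helper names `installed_snakemake_version` and `snakemake_matches_pin` are made up for illustration and are not part of showyourwork or Snakemake, and the only value taken from the thread is the pin `7.15.2`.

```python
from importlib import metadata
from typing import Optional

# Version pin quoted in the thread; adjust if your showyourwork release pins another one.
PINNED_SNAKEMAKE = "7.15.2"


def installed_snakemake_version() -> Optional[str]:
    """Return the installed snakemake version, or None if it is not installed."""
    try:
        return metadata.version("snakemake")
    except metadata.PackageNotFoundError:
        return None


def snakemake_matches_pin(installed: str, pinned: str = PINNED_SNAKEMAKE) -> bool:
    """Hypothetical helper: True only when the installed version equals the pin exactly."""
    return installed.strip() == pinned


if __name__ == "__main__":
    version = installed_snakemake_version()
    if version is None:
        print("snakemake is not installed in this environment")
    elif not snakemake_matches_pin(version):
        print(f"snakemake {version} is installed, but {PINNED_SNAKEMAKE} is pinned")
    else:
        print(f"snakemake {version} matches the pin")
```

Running this inside the environment that showyourwork uses shows whether a manual upgrade has moved snakemake away from the pinned release.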

ok. thanks for the hint.
Let's see if we can figure this out.

oh, man, thanks!
what a stupid thing. I kept rechecking whether the filename had a typo or whether the file on zenodo was somehow corrupted, but never checked the path...

haha yeah! It took me longer than I care to admit to find it too 🤣

I'm going to close for now since I think it would be somewhat subtle to provide a useful warning or error message for this.

@dfm FWIW this is the same user error that this issue is about. Would it be difficult/undesirable for SYW to check that the scripts you've listed dependencies for exist? If it finds dependencies listed for non-existent scripts, it could warn you about that, which I think would help debug problems like this one. But if that's not something that's wanted, maybe the other issue should be closed too?

Thanks @jfcrenshaw! Yes I think something like that could be useful, but there might be an edge case that we'd hit if the file listed in dependencies is generated by a custom snakemake rule, but I think that's a pretty small use case. If you have a chance to take a look at this I'd love to chat over a PR. I'm not at my computer now, but I expect the check would go in config.py....

Oh, I think that would be a super useful feature.
Let me know if I can be of any help implementing it.
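
The check discussed above could look roughly like the sketch below. In the config from the original post the dependency is listed under `src/scipts/mdf_oxygen.py`, while the script that actually runs is `src/scripts/mdf_oxygen.py`. This is not showyourwork's implementation: the function name `missing_dependency_scripts`, its signature, and the "closest match" suggestion are assumptions for illustration, and it works on the already-parsed `showyourwork.yml` as a plain dict. Because a listed script may be generated by a custom Snakemake rule, the result is meant to drive a warning, not a hard error.

```python
import difflib
from pathlib import Path
from typing import Dict, Optional


def missing_dependency_scripts(config: dict, root: Path) -> Dict[str, Optional[str]]:
    """Hypothetical check: find `dependencies` keys that do not exist on disk.

    Returns a mapping from each missing script path to the closest existing
    file path (relative to `root`), or None when nothing similar is found.
    """
    dependencies = config.get("dependencies") or {}

    # All files under the repository root, as POSIX-style relative paths,
    # used only to suggest a likely intended path.
    existing = [
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    ]

    missing: Dict[str, Optional[str]] = {}
    for script in dependencies:
        if not (root / script).exists():
            close = difflib.get_close_matches(script, existing, n=1)
            missing[script] = close[0] if close else None
    return missing


def format_warnings(missing: Dict[str, Optional[str]]) -> list:
    """Turn the mapping into human-readable warning lines."""
    lines = []
    for script, suggestion in missing.items():
        msg = f"dependencies are listed for '{script}', which does not exist"
        if suggestion is not None:
            msg += f" (did you mean '{suggestion}'?)"
        lines.append(msg)
    return lines
```

With the config from this thread, `missing_dependency_scripts` would flag the `src/scipts/...` key, and `format_warnings` would suggest the `src/scripts/...` path if that file exists in the repository.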