difficulty converting metadata file yaml -> tex

Question

mcbaneg opened this issue 5 years ago · comments

Setup: Windows 10 (64-bit), MikTeX, Python 2.7 with ruamel.yaml installed, cygwin

I modified the metadata.yaml file.
I first tried 'make' as recommended, but I have Python 2.7 installed and the command failed at python3.

I then tried running it directly (within Windows PowerShell), and got

PS C:\virial\rescience-c\template-master> python ./yaml-to-latex.py -i metadata.yaml -o metadata.tex
Traceback (most recent call last):
  File "./yaml-to-latex.py", line 71, in <module>
    from article import Article
  File "C:\virial\rescience-c\template-master\article.py", line 4, in <module>
    import yaml
ImportError: No module named yaml

Evidently the module name in ruamel.yaml (which seems to be the most up-to-date yaml kit for python) is different from that expected. I don't see documentation on which module (or which python version) authors are expected to be using. Suggestions?

Thanks,
G.

Konrad Hinsen · Answer 1 · Mon Nov 18 2019 15:23:22 GMT+0800 (China Standard Time)

The YAML package that you need is PyYAML. I had never heard of ruamel.yaml before. It seems to be a fork of PyYAML, but I have no idea how compatible it is. I also don't know whether a sufficiently recent version of either YAML package supports Python 2.7.
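A minimal sketch of a pre-flight check for the two failures reported above (a Python 2 interpreter and a missing `yaml` module). The function name `find_problems` and its parameters are hypothetical and not part of the template; the only facts it relies on are that PyYAML is imported as `yaml`, whereas ruamel.yaml installs under the name `ruamel.yaml`.

```python
from __future__ import print_function  # lets the check itself run on Python 2.7

import sys


def find_problems(version_info=None, importer=None):
    """Return a list of readable problems; an empty list means the setup looks fine.

    Both parameters are hypothetical hooks that exist only to make the check
    testable; by default the running interpreter is inspected.
    """
    if version_info is None:
        version_info = sys.version_info
    if importer is None:
        importer = __import__

    problems = []
    if version_info[0] < 3:
        problems.append(
            "Python 3 is required, found %d.%d" % (version_info[0], version_info[1])
        )
    try:
        # PyYAML provides the top-level module 'yaml'; ruamel.yaml does not.
        importer("yaml")
    except ImportError:
        problems.append("module 'yaml' not found; install PyYAML (pip install pyyaml)")
    return problems
```

Calling `find_problems()` with no arguments and printing each entry would give an author a readable message instead of a traceback.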

We should have clearer instructions for the template. I added them in other ReScience repositories, but not so far on this one. Unfortunately reproducibility is becoming a problem even for article submissions!

Nicolas P. Rougier · Answer 2 · Mon Nov 18 2019 19:52:26 GMT+0800 (China Standard Time)

That's a clear but unfortunate illustration of the reproducibility problem. I think I designed the yaml-to-latex.py script with Python 3 and never tested it with Python 2. Even though the PyYAML package is quite standard, it might be good to add a pointer in the template repository.

By the way, if your entry is for the Ten Years Reproducibility Challenge, we'll soon give some proposals for authors on what to write in the article. We're a bit late on that.

George McBane · Answer 3 · Mon Nov 18 2019 22:51:43 GMT+0800 (China Standard Time)

> By the way, if your entry is for the Ten Years Reproducibility Challenge, we'll soon give some proposals for authors on what to write in the article. We're a bit late on that.

It is for that challenge. I have a first draft of my paper already, if you’d like to see what somebody produced with just the hints that are already out there. Problems in my draft might help you figure out what to put in the guidelines ☺

My difficulty (other than the simple mechanics of submission) is that I’m not convinced the draft is interesting enough to be worth submitting. It does demonstrate that straightforward use of well-established tools (Fortran, ACM-TOMS routines, BLAS) is likely to produce long-lived programs if you just hang on to the source code and sample inputs.

…

-G.
George C. McBane, Professor of Chemistry, Grand Valley State University

Konrad Hinsen · Answer 4 · Mon Nov 18 2019 23:19:49 GMT+0800 (China Standard Time)

@mcbaneg Please do submit your work! Your conclusion is just the kind of outcome we'd like to see from the challenge: what has worked in the past to maintain code in working state, and what hasn't. If people submit only the difficult-to-reproduce cases, we might erroneously conclude that nothing has ever worked!

Nicolas P. Rougier · Answer 5 · Tue Nov 19 2019 02:11:40 GMT+0800 (China Standard Time)

@mcbaneg As @khinsen suggests, you can submit, and your submission will serve as a testbed for others. Among the things that might be interesting are:

How did you preserve the sources?
Did you record the RNG seed (if you use one)?
Did you save the command-line options (if you need any)?
Did you need to adapt your sources?
Did you need to adapt your libraries?
What guided your choice of Fortran over other languages at that time?
etc.
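As an illustration of the seed and command-line items in the list above, here is a minimal sketch of recording them next to the results. The names `provenance_record`, `to_json`, and `simulate` are hypothetical, and `simulate` is only a stand-in for a real computation; none of this comes from the challenge guidelines.

```python
import json
import random
import sys


def provenance_record(seed, argv):
    """Collect the details that are easy to lose: seed, options, interpreter."""
    return {
        "seed": seed,
        "argv": list(argv),
        "python": sys.version.split()[0],
    }


def to_json(record):
    """Serialize the record so it can be stored alongside the output files."""
    return json.dumps(record, indent=2, sort_keys=True)


def simulate(seed, n=3):
    """Stand-in for a real computation; seeded so reruns give the same numbers."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]
```

A script could call `provenance_record(seed, sys.argv)` at startup and write the result of `to_json` to a file kept with the outputs, so that a later rerun can reuse the same seed and options.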

Konrad Hinsen · Answer 6 · Tue Nov 19 2019 15:16:33 GMT+0800 (China Standard Time)

Good points @rougier! I'd like to emphasize the utility of communicating the choices (and the motivations behind them) made at the time of publication, even if they risk being distorted by hindsight. That's something we can only get out of authors doing reproductions of their own work. For example, I realized that I never preserved or published code for reproducibility, but only to make it available for reuse by others. As a consequence, I am always missing the last small steps: command-line arguments, that five-line script that ties computations together, etc.

Konrad Hinsen · Answer 7 · Tue Nov 19 2019 15:19:25 GMT+0800 (China Standard Time)

This discussion really belongs in ReScience/ten-years#4.

Nicolas P. Rougier · Answer 8 · Tue Nov 19 2019 16:28:16 GMT+0800 (China Standard Time)

Yes, and we should start an author-instructions.md document. @mcbaneg Feel free to close the issue, and let's continue the discussion at ReScience/ten-years#4.

George McBane · Answer 9 · Sun Nov 24 2019 04:57:33 GMT+0800 (China Standard Time)

Okay, I have submitted a paper for the Ten Years challenge (3882888). I'm comfortable with it serving as a trial case for working out both (1) what you hope for in a paper for this challenge, and (2) what submission conventions you want to use.

Two technical points:

I tried to drag-and-drop the metadata.yaml file into the submission issue box, but was given a "we do not accept that file type" message. It felt silly to retype into the box several bits of information that I had already put into the .yaml file.
I feel like your expectations for what tools authors will have immediately available are high enough to discourage some potential authors. They include:

make
LaTeX, including latexmk
Python 3.x with PyYAML (2.x can run PyYAML but its treatment of locales is different and it doesn't work with the template makefile for that reason)
perl (necessary for latexmk)

Many potential authors will be familiar with all these tools, but may well not have them installed on the computers they normally use for preparing publications. I am familiar with them all, and use some (including make and LaTeX) daily. However, I had to install latexmk, perl, Python 3 (I had 2.7, and spent the time to find out it didn't work for this purpose), and PyYAML on my laptop to complete the submission.
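A minimal sketch of checking for the command-line tools listed above before attempting a build. The helper name `missing_tools` and the `REQUIRED_TOOLS` list are hypothetical; the check uses only `shutil.which` from the standard library, and executable names may differ between systems.

```python
import shutil

# Executables taken from the list above; names may differ on some platforms.
REQUIRED_TOOLS = ["make", "latexmk", "perl"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on the current PATH.

    The 'which' parameter is a hypothetical hook that allows testing
    without depending on what is actually installed.
    """
    return [tool for tool in tools if which(tool) is None]
```

Printing the result of `missing_tools()` would tell an author up front which installations are still needed, rather than letting `make` fail partway through.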

Nicolas P. Rougier · Answer 10 · Mon Nov 25 2019 15:11:21 GMT+0800 (China Standard Time)

Thanks !

Yes, good point about the large set of tools needed just to compile. Note that there's an Overleaf template (which may need to be updated) that simplifies things. The only problem is that you still need to generate the yaml file. We could also have a simple web form to create the yaml file, but I'm a bit clueless on how to do that.

As for uploading the metadata, I'm not sure I get your point.

Konrad Hinsen · Answer 11 · Mon Nov 25 2019 17:10:09 GMT+0800 (China Standard Time)

@mcbaneg Thanks for the feedback on our submission procedure! We should probably simplify the required toolchain, ideally to the point that a TeXlive installation is sufficient to do everything. On the other hand, if that means reading YAML from TeX I'll probably change my mind!

@rougier If I understand our current setup correctly, make and latexmk are needed only for streamlining the PDF generation. We could perhaps provide a shell script that runs a worst-case scenario, assuming everything needs to be redone. But then, would a shell script work under Windows?

Nicolas P. Rougier · Answer 12 · Mon Nov 25 2019 21:45:55 GMT+0800 (China Standard Time)

Yes, make and latexmk are not really needed. You can run xelatex/biber/xelatex/xelatex and you're done. The metadata yaml and latex files can also be filled in manually in case of problems (there is no need to use the python script to create the latex metadata from the yaml metadata; it's just more convenient, if it runs).
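The xelatex/biber/xelatex/xelatex sequence can be sketched as a small Python driver, which avoids the question of whether a shell script would work under Windows. The function names `build_commands` and `build` are hypothetical, and the document stem `article` used in the usage note is an assumption about the main file name.

```python
import subprocess


def build_commands(stem):
    """Return the worst-case sequence: everything is rebuilt from scratch."""
    return [
        ["xelatex", stem],  # first pass: writes auxiliary and bibliography data
        ["biber", stem],    # resolves the bibliography
        ["xelatex", stem],  # second pass: pulls in the citations
        ["xelatex", stem],  # third pass: settles cross-references
    ]


def build(stem, run=subprocess.check_call):
    """Run each step in order and stop at the first failing command.

    The 'run' parameter is a hypothetical hook that allows a dry run.
    """
    for command in build_commands(stem):
        run(command)
```

Calling `build("article")` from the template directory would run the four steps in order; `subprocess.check_call` raises an exception as soon as one of them exits with an error.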