showyourwork / showyourwork

A workflow for reproducible and open scientific articles

Home Page: https://show-your.work

Post-processing rules for arXiv tarball?

MilesCranmer opened this issue

Is there a recommended way to write post-processing rules for the arXiv tarball? Or maybe a way I can hack the specific Snakemake rule that builds it?

Basically I would like to run a few commands whenever the arXiv tarball is generated:

  1. Run latexpand to remove comments and de-macro to expand all custom macros. Both make it easier for scrapers like Google to index the paper, and for humans to read the source.
  2. Store my Pygments cache and convert \usepackage{minted} to \usepackage[frozencache]{minted}, which makes it compatible with arXiv, where shell-escape is not available.
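The two steps in the list above could be scripted roughly as follows. This is a minimal sketch, not part of showyourwork: the function names are hypothetical, it assumes the latexpand CLI is on PATH (by default it inlines \input/\include files, drops comments, and prints the result to stdout), and it assumes minted 2.x, where the frozencache option makes minted reuse a cache generated beforehand with finalizecache, so that cache directory also has to go into the tarball. de-macro is left out because how it is invoked depends on where the macros are defined.

```python
import re
import shutil
import subprocess
from pathlib import Path

# Matches \usepackage{minted} and \usepackage[...]{minted}; group 1 holds any options.
MINTED_RE = re.compile(r"\\usepackage(?:\[([^\]]*)\])?\{minted\}")


def flatten_with_latexpand(ms_tex: Path) -> str:
    """Step 1 (partial): inline \\input/\\include files and strip comments.

    Assumes the latexpand CLI is installed; it writes the expanded source to stdout.
    """
    if shutil.which("latexpand") is None:
        raise RuntimeError("latexpand was not found on PATH")
    result = subprocess.run(
        ["latexpand", ms_tex.name],
        cwd=ms_tex.parent,  # run next to ms.tex so relative \input paths resolve
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def freeze_minted(tex: str) -> str:
    """Step 2: switch minted to its frozen cache so no -shell-escape is needed.

    Assumes minted 2.x option names (finalizecache / frozencache).
    """

    def add_frozencache(match: re.Match) -> str:
        opts = [o.strip() for o in (match.group(1) or "").split(",") if o.strip()]
        # Drop the cache-mode options being replaced; keep everything else (e.g. cachedir).
        opts = [o for o in opts if o not in ("finalizecache", "frozencache")]
        opts.append("frozencache")
        return "\\usepackage[" + ",".join(opts) + "]{minted}"

    return MINTED_RE.sub(add_frozencache, tex)


def prepare_for_arxiv(ms_tex: Path) -> str:
    """Hypothetical helper chaining both steps; the caller decides where to write the result."""
    return freeze_minted(flatten_with_latexpand(ms_tex))
```

The minted rewrite is a plain regex, so it only catches minted loaded in its own \usepackage line; a package list such as \usepackage{minted,xcolor} would need extra handling.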

Rather than having these implemented internally, it would be great if I could just write some bash commands that run at the arXiv tarball stage. There are probably other packages, too, that require custom post-processing before uploading to arXiv.

(Any idea if this is possible, @dfm?)

See also: #236

In the current version of showyourwork, the arXiv tarball workflow is all pretty hard-coded (see here and here). One option (probably the lowest-friction one) would be to write a custom Snakemake rule that takes arxiv.tar.gz as input and, after running these steps, produces arxiv-clean.tar.gz (or something similar) as output. Alternatively, we could add a hook deeper in the pipeline that automatically cleans ms.tex before compilation, so that the compile step confirms the cleanup didn't break anything. That will be easier on the dev branch, so I'm somewhat less inclined to work too hard on it here!
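The lowest-friction option above, a separate step that turns arxiv.tar.gz into arxiv-clean.tar.gz, could look roughly like the following Python sketch. It is not part of showyourwork: the function names are hypothetical, and it assumes the manuscript sits in the tarball as ms.tex (matched by basename) and is UTF-8 encoded. It streams every member into a new gzipped tarball and passes only the manuscript through a transform function.

```python
from __future__ import annotations

import io
import tarfile
from pathlib import Path
from typing import Callable


def clean_arxiv_tarball(
    src: str | Path,
    dst: str | Path,
    transform: Callable[[str], str],
    tex_name: str = "ms.tex",
) -> None:
    """Copy src to dst, rewriting the manuscript source with transform on the way."""
    with tarfile.open(src, "r:gz") as tin, tarfile.open(dst, "w:gz") as tout:
        for member in tin.getmembers():
            if member.isfile() and Path(member.name).name == tex_name:
                text = tin.extractfile(member).read().decode("utf-8")
                data = transform(text).encode("utf-8")
                member.size = len(data)  # the header must match the new payload length
                tout.addfile(member, io.BytesIO(data))
            elif member.isfile():
                tout.addfile(member, tin.extractfile(member))  # copy unchanged
            else:
                tout.addfile(member)  # directories and links carry no payload


def example_transform(tex: str) -> str:
    """Hypothetical placeholder transform; swap in whatever cleanup steps are needed."""
    return tex.replace(r"\usepackage{minted}", r"\usepackage[frozencache]{minted}")


# When run through Snakemake's script: directive, a global `snakemake` object is injected.
if "snakemake" in globals():
    clean_arxiv_tarball(snakemake.input[0], snakemake.output[0], example_transform)  # noqa: F821
```

To hook this in, a custom rule could list arxiv.tar.gz as input (adjusting the path to wherever showyourwork writes it), arxiv-clean.tar.gz as output, and point its script: directive at this file, saved under a hypothetical name such as clean_arxiv.py. Writing to a new file instead of overwriting arxiv.tar.gz in place keeps the rule graph acyclic.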

Oh, right, I remember that issue! I guess this issue supersedes it because this one is about a more general post-processing step. I'll close the old one.

Sounds good to me.

Regarding a potential user interface, I wonder if it would be useful to enable a general preprocessing step, since ms.tex is copied to the .showyourwork/compiled folder anyway. Then you could expose a variable indicating whether or not it is about to be copied into the arXiv build. (Or at least let users override the default copy?)

Yes, what I was trying to say is that this interface is explicitly exposed to Snakemake on the dev branch, but unfortunately here all the copying is done in Python, so overriding the copy of ms.tex on the main branch would be somewhat invasive. That said, it would certainly be possible if you're keen!

@dfm, I wonder if specifying a ruleorder is all that's needed for preprocessing? E.g.,

rule my_custom_rule:
    input:
        ".showyourwork/compile/ms.tex"
    output:
        ".showyourwork/compile/ms.tex"
    script: "my_crazy_preprocessing.sh"

ruleorder: my_custom_rule > syw__arxiv

(Nevermind, lol)