Post-processing rules for arXiv tarball?
MilesCranmer opened this issue
Is there an expected way to write post-processing rules for generating the arXiv tarball? Maybe a way I can hack the specific Snakemake rule that is being triggered?
Basically I would like to run a few commands whenever the arXiv tarball is generated:
- Run latexpand to remove comments and de-macro to expand all macros. Both of these make it easier for scraping tools like Google to index your paper, as well as for humans to read the source.
- Store my pygment cache and convert \usepackage{minted} to \usepackage[frozen]{minted}, which makes it compatible with arXiv, where you can't use shell-escape.
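For illustration, the two steps above could look roughly like the following in pure Python. This is a minimal sketch, not what latexpand or de-macro actually do: `strip_comments` is a crude stand-in for latexpand's comment removal (it ignores verbatim environments and edge cases such as `\\%`), macro expansion is not attempted at all, and the function names are hypothetical. The `frozen` option is simply the one named in the list above.

```python
import re

# An unescaped "%" and everything after it on the same line.
COMMENT = re.compile(r"(?<!\\)%.*$")


def strip_comments(tex: str) -> str:
    """Crude stand-in for latexpand's comment removal (assumption, not its real logic)."""
    kept = []
    for line in tex.splitlines():
        if line.lstrip().startswith("%"):
            # Comment-only line: drop it entirely.
            continue
        # Trailing comment: keep a bare "%" so TeX's end-of-line handling is unchanged.
        kept.append(COMMENT.sub("%", line))
    return "\n".join(kept) + "\n"


def freeze_minted(tex: str, option: str = "frozen") -> str:
    """Add an option to every option-less minted import."""
    replacement = "\\usepackage[" + option + "]{minted}"
    return tex.replace("\\usepackage{minted}", replacement)


def postprocess(tex: str) -> str:
    """Apply both steps in order."""
    return freeze_minted(strip_comments(tex))
```

In practice the real tools would be called instead; the point is only that each step is a plain text-to-text transformation that can run once the tarball contents exist.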
Perhaps rather than having these implemented internally, it would be great if I could just write some bash commands at the arXiv tarball stage. There are probably other packages too that require custom post-processing before putting a paper up on arXiv.
(Any idea if this is possible @dfm?)
See also: #236
In the current version of showyourwork the arxiv tarball workflow is all pretty hard-coded (see here and here). One option (probably the lowest friction) would be to write a custom Snakemake rule that takes arxiv.tar.gz as input and produces arxiv-clean.tar.gz (or something) as output after running these steps. Alternatively we could add a hook somewhere deeper to automatically clean ms.tex before compilation, to make sure that it doesn't break anything. That will be easier on the dev branch, so I'm somewhat less inclined to work too hard on it here!
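The script behind such a custom rule might look something like the sketch below. It assumes only that the tarball is gzip-compressed and that the files to clean end in `.tex`; the `clean_tarball` name and the `transform` callback are hypothetical and not part of showyourwork.

```python
import io
import tarfile
from typing import Callable


def clean_tarball(src: str, dst: str, transform: Callable[[str], str]) -> None:
    """Copy the tarball at src to dst, passing every .tex file through transform."""
    with tarfile.open(src, "r:gz") as tin, tarfile.open(dst, "w:gz") as tout:
        for member in tin.getmembers():
            if not member.isfile():
                # Directories, links, etc. carry no payload; copy the header only.
                tout.addfile(member)
                continue
            data = tin.extractfile(member).read()
            if member.name.endswith(".tex"):
                data = transform(data.decode("utf-8")).encode("utf-8")
                # The header must record the new payload size.
                member.size = len(data)
            tout.addfile(member, io.BytesIO(data))
```

A Snakemake rule with arxiv.tar.gz as input and arxiv-clean.tar.gz as output could then call `clean_tarball(input[0], output[0], some_transform)`, where `some_transform` is whatever cleaning function is wanted.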
Oh, right, I remember that issue! I guess this issue supersedes it because this one is about a more general post-processing step. I'll close the old one.
Sounds good to me.
Regarding a potential user interface, I wonder if it could be useful to enable a general preprocessing step, since ms.tex is being copied to the .showyourwork/compiled folder anyways. Then, you could expose a variable indicating whether it is about to be copied into the arXiv build or not. (Or, at least let a user override the default copy?)
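To make the idea concrete, here is a minimal sketch of what such a hook could look like. Everything in it is hypothetical: `copy_manuscript`, the `preprocess` callback, and the `arxiv` flag are illustrative names, not showyourwork's actual interface.

```python
from pathlib import Path
from typing import Callable, Optional

# A user hook receives the manuscript text and a flag saying whether
# this copy is destined for the arXiv build.
Preprocessor = Callable[[str, bool], str]


def copy_manuscript(
    src: str,
    dst: str,
    preprocess: Optional[Preprocessor] = None,
    arxiv: bool = False,
) -> Path:
    """Copy the manuscript to dst, optionally rewriting it on the way."""
    text = Path(src).read_text(encoding="utf-8")
    if preprocess is not None:
        text = preprocess(text, arxiv)
    out = Path(dst)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(text, encoding="utf-8")
    return out
```

With no hook supplied this behaves like a plain copy, so existing workflows would be unaffected; a hook can check the flag and only rewrite the source for the arXiv build.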
Yes - what I was trying to say is that that interface is explicitly exposed to Snakemake on the dev branch, but unfortunately here all the copying is done in Python, so it would be somewhat invasive to override the copy of ms.tex on the main branch. That being said, it would certainly be possible if you're keen!
@dfm I wonder if specifying ruleorder is all that is needed for pre-processing? e.g.,
rule my_custom_rule:
    input:
        ".showyourwork/compile/ms.tex"
    output:
        ".showyourwork/compile/ms.tex"
    script: "my_crazy_preprocessing.sh"

ruleorder: my_custom_rule > syw__arxiv
(Nevermind, lol)