showyourwork / showyourwork

A workflow for reproducible and open scientific articles

Home Page: https://show-your.work

Demonstrate workflow for removing LaTeX comments from submission tarball

MilesCranmer opened this issue

When I submit to arXiv I like to remove all comments from my paper. Partly in case I say something silly and forget to remove it, lest I end up on astro-ph leaks, but it's also a security issue - perhaps someone pastes in a token or password and forgets to remove it. This happens more often with LaTeX because people usually review only the final rendered PDF rather than the source itself, as they would for code in other languages.

Now, you can do this comment removal programmatically with (h/t https://tex.stackexchange.com/a/271460/140440):

latexpand --empty-comments main.tex > main-clean.tex

Since latexpand is already downloaded in the GitHub action, it should be relatively straightforward to add this. I guess the question is whether it should be opt-in or opt-out.
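For anyone who wants to double-check the result before uploading, here is a minimal sketch of a leftover-comment check. The function name `find_comments` is hypothetical and not part of showyourwork or latexpand. It assumes a comment is any unescaped `%` followed by at least one character, so the bare `%` markers that `--empty-comments` leaves behind are not flagged. It is only a heuristic: it does not understand verbatim environments, and it will miss a comment that directly follows a `\\` line break.

```shell
# Hypothetical helper: print (with line numbers) every line of a .tex file
# that still carries comment text.
#   - '\%' is a literal percent sign and is ignored.
#   - A bare '%' at the end of a line is ignored.
# Exit status follows grep: 0 if any comment text was found, 1 if none.
find_comments() {
    grep -nE '(^|[^\\])%.+' "$1"
}
```

A possible use, after running the latexpand command above, is `if find_comments main-clean.tex; then echo "comments remain" >&2; fi`, which fails loudly instead of silently shipping the comments.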

I would vote for opt-out, so that comment removal is done automatically, just so we help protect people's private comments and potentially their security (really, arXiv should do this, but they leave the source file untouched - I guess it's tricky to do generally). I think quite a few people don't realize a paper's source code can be downloaded with all the comments that they forgot to remove.

Now, of course, you may say: "the paper would also be on GitHub"! So why would this be needed at all?

  1. You can always force push and overwrite a GitHub repo, to permanently remove comments if you need to (e.g., say you pushed a security token or password by accident). But once on arXiv, it's always on arXiv! They are very much against removing any version of a paper once it's up (one time a coauthor uploaded with a license incompatible with the journal's, and it was a nightmare to remove the early version). So better safe than sorry.
  2. Even though it's against the core ideals of the project, some people will use SYW only locally or in a private repo, since it's a nice build system with integrated figure/tex generation. In that setting you are probably less likely to pay attention to the comments in a paper, and uploading source to arXiv could backfire.

Interested to hear what others think!

I definitely don't think this should be part of SYW, but it should presumably be possible to customize the arXiv generation rules and this could be a nice case study to document!

> I definitely don't think this should be part of SYW

Care to share why? SYW already generates the .bbl file, which is the minified version of the bib; it seems natural to also generate a minified LaTeX file.

That's a required step for submitting to arXiv. I don't agree that removing comments is a required (or even wanted!) step for arXiv submission. I definitely don't see any reason why it should be built in besides feature creep!

But don't get me wrong, I think documenting how to do this would be great!!

> That's a required step for submitting to arXiv. I don't agree that removing comments is a required (or even wanted!) step for arXiv submission. I definitely don't see any reason why it should be built in besides feature creep!

I see your point about the .bbl file; that example wasn't good. I'm not sure I see the worry about feature bloat as this preprocessing would be a one-line change (latexpand is already included), and would improve security/privacy of users without them having to do anything or even noticing it.

I think in general it makes sense for security/privacy features to be opt-out rather than opt-in. Newer users are the people you would actually want to help with stuff like this, as they are the ones who might not realize arXiv source is visible (and also might not read through the entire docs).

(By the way, I am curious: do you or others want to/expect to upload your LaTeX comments to arXiv? Maybe some people document their LaTeX source, expecting others to read it. I could see this making sense for documenting complex TikZ diagrams, where others might want to know how they were generated?)

In general though, if you feel strongly about this, I am quite happy with it simply being possible via a manually-configured preprocessing step with documentation!

Superseded by #356