KWARC / bibs

The joint bibliographies of the KWARC group. Automatically built by travis.

Home Page: https://kwarc.github.io/bibs/

make a LaTeXML-based publication generation process

kohlhase opened this issue

If we had that, we could retire the old process and move from the SVN to git. We want to get rid of SVN, and this is one of the places where we still have it.
@tkw1536 @jbs1

I have experimented with this a bit, and I have something that almost works, but is still quite brittle.

  1. ideally using the Makefile (but that does not work now, since latexmlc segfaults, so I use the script generate-pubwww), generate files like mkohlhase-article.{tex,html}, which contain mkohlhase's publications of type article. mkohlhase-article.tex is generated by the XSLT style sheet mybib.xsl, and mkohlhase-article.html from that via latexml.
  2. gather up files mkohlhase-*.html into a file pubs-mkohlhase.html that has them all and some more infrastructure around them using the xslt file
    So far so good, but
  3. it takes a long time, and most files have not really changed, and therefore need not really be re-generated.
  4. the result should be beautified a bit.
  5. there is an encoding problem introduced in step 2. That should not happen, since it is really only XSLT.
    Once these things are ironed out, I could imagine running this via a post-commit hook and committing the generated files to a separate branch, so that we can reference them from the kwarc home page.
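
Step 2 above can be sketched roughly as follows. This is only an illustration in Python, not the actual generate-pubwww script or the XSLT: it merely groups the per-type files by person, assuming the type part of a file name contains no hyphen. The function names group_by_person and combined_name are made up for this sketch.

```python
from collections import defaultdict
from pathlib import PurePath


def combined_name(pid):
    """Name of the per-person page, following the pubs-mkohlhase.html pattern."""
    return f"pubs-{pid}.html"


def group_by_person(filenames):
    """Map each person id to the sorted list of their <pid>-<type>.html files."""
    groups = defaultdict(list)
    for name in filenames:
        path = PurePath(name)
        # Only per-type HTML files count; skip .tex files and combined pages.
        if path.suffix != ".html" or path.name.startswith("pubs-"):
            continue
        # Assumption: <type> has no hyphen, so split at the last one.
        pid, sep, _pubtype = path.stem.rpartition("-")
        if sep and pid:
            groups[pid].append(name)
    return {pid: sorted(files) for pid, files in groups.items()}
```

Each key of the result would then correspond to one combined page, e.g. combined_name("mkohlhase") for the files of mkohlhase.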

Oh, and there are some things wrong with LaTeXML that I have not figured out. See the brucemiller/LaTeXML#764 ff.

  1. there is an encoding problem introduced in step 2. That should not happen, since it is really only XSLT.

Actually, that is not true: my browser just guessed the wrong encoding. Very annoying.
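
A wrong guess like this is usually avoided when the page declares its encoding explicitly. Whether the generated pages did so is not stated in this thread, so the following is only a hypothetical check one could run over the output. It looks for a meta charset declaration in the first 1024 bytes, which is where the HTML standard expects it. The name declared_charset is made up for this sketch.

```python
import re

# Matches both <meta charset="..."> and the older
# <meta http-equiv="Content-Type" content="text/html; charset=..."> form.
_META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)',
    re.IGNORECASE,
)


def declared_charset(html_bytes):
    """Return the lower-cased charset declared in the HTML head, or None."""
    # Only the first 1024 bytes are inspected, as browsers do when prescanning.
    match = _META_CHARSET.search(html_bytes[:1024])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()
```

A page for which this returns None leaves the choice to the browser (or to the HTTP headers), which is where a wrong guess can come from.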

@kohlhase I could very well imagine not running this via a post-commit hook, but instead via Travis. It is also capable of committing back into the repository (for example to a gh-pages branch, or directly).

Furthermore, I would like to clean up the files and split them into multiple directories, something like:

  • /src/ --> Source .bib files
  • /lib/ --> All files related to building (the generate-pubwww, the xslts, the stys etc)
  • /dist/ --> kwarc.bib file
  • /dist/html/ --> outputted html page, ready for deployment. Maybe we should not commit this on the master branch but to gh-pages instead. It could then be published automatically and we could just link to it from our website.
  • Toplevel with only a README, the Makefile, a file for travis configuration and a .gitignore

@kohlhase I could very well imagine not running this via a post-commit hook, but instead via Travis. It is also capable of committing back into the repository (for example to a gh-pages branch, or directly).

Ah, very good. I was wondering whether that would be possible. We should pursue this. I like the idea of using the gh-pages branch.

Furthermore, I would like to clean up the files and split them into multiple directories, something like:

In principle I agree with this, but we should be very careful. The KWARC bibs have two functions:

  1. to be integrated into paper repositories
  2. to generate our publication pages.
    For 1. I would like the repos to be minimal: only what you want to put under src and dist, but I do not like the idea of having a level of subdirs there. The rest is needed for 2. So I guess that should go into a different repo or a different branch.
    Could you make a specific proposal?

OK, I have a much better publication list generation (there are some things LaTeXML has to improve though). But the result is in many ways already better than the old system. So I would be very interested to (have you) pursue the deployment.
@tkw1536 could you please

  1. try to run the generate-pubwww script on a linux machine and see whether the generations segfault? They do on the mac (see brucemiller/LaTeXML#764)
  2. could you have a look at the script generate-pubwww and improve it? It seems to me that if none of the files <pid>-<type>.tex has changed then we do not need to generate the files <pid>-pubs.html. And we only have to generate the <pid>-<type>.html files where the corresponding LaTeX file has changed.
  3. Finally, if none of the files kwarcpubs.bib, kwarccrossrefs.bib, or extcrossrefs.bib has changed, nothing has to be done at all.
    This should speed up things considerably.
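
The change detection asked for in points 2 and 3 can be sketched as a timestamp comparison, the same idea make uses for its prerequisites. This is a minimal illustration in Python, not the logic of the actual generate-pubwww script, and the name is_stale is made up here.

```python
import os


def is_stale(target, sources):
    """Return True if target is missing or older than any of its sources.

    A missing source raises FileNotFoundError, since that points to a
    broken setup rather than to an up-to-date target.
    """
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(src) > target_mtime for src in sources)
```

With this, <pid>-<type>.html would only be regenerated if it is stale with respect to <pid>-<type>.tex, the <pid>-pubs.html file only if it is stale with respect to any of that person's .tex files, and nothing at all if no output is stale with respect to the three .bib files.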

Ok, I will take a look at it later today.

On Wed, Jun 15, 2016, 12:31 Michael Kohlhase wrote:

OK, I have a much better publication list generation (there are some
things LaTeXML has to improve though). But the result is in many ways
already better than the old system. So I would be very interested to (have
you) pursue the deployment.
@tkw1536 could you please

  1. try to run the generate-pubwww script on a linux machine and see
    whether the generations segfault? They do on the mac (see
    brucemiller/LaTeXML#764)
  2. could you have a look at the script generate-pubwww and improve it? It
    seems to me that if none of the files <pid>-<type>.tex has changed then
    we do not need to generate the files <pid>-pubs.html. And we only have to
    generate the <pid>-<type>.html files where the corresponding LaTeX file
    has changed.
  3. Finally, if none of the files kwarcpubs.bib, kwarccrossrefs.bib, or
    extcrossrefs.bib has changed, nothing has to be done at all.
    This should speed up things considerably.


@kohlhase I have updated the generation process with a new Makefile and committed it to a new auto branch. I did not commit it to the SVN because that would break the old post-commit hook. This change includes a new folder structure. The Makefile has the following targets:

  • all = dist
  • dist = bib pubs
  • bib Takes the individual .bib files and concatenates them into dist/kwarc.bib
  • xml Takes the individual .bib files and generates xml versions of them in dist/ltxml/*.bib.xml using latexml. The generated files are .gitignored as users should not need them.
  • html Takes the generated xml files from above and uses latexml and xslt to first generate .tex files in dist/tex/name-type.tex and then html files dist/html/name-type.html. Both types are gitignored. Uses an adapted version of the generate-pubwww script, now found in src/html/generate.html.
  • pubs Takes the generated html files and builds a nice-looking bibliography in dist/pubs. The output is .gitignored and intended to be committed to a gh-pages branch later on. That would still need an index.html, but that should not be a problem.
  • clean-bib, clean-xml, clean-html, clean-pubs Removes files generated by an individual target.
  • clean Removes all generated files
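
To make the order of the targets explicit, here is a small sketch in Python. The edges all = dist and dist = bib pubs are taken from the list above; that html depends on xml and pubs on html is an assumption read off the descriptions, since the Makefile itself is not shown here. The names DEPENDENCIES and build_order are made up for this sketch, and the function does not guard against cyclic dependencies.

```python
# Target graph as described in the list above (partly assumed, see lead-in).
DEPENDENCIES = {
    "all": ["dist"],
    "dist": ["bib", "pubs"],
    "bib": [],
    "xml": [],
    "html": ["xml"],   # assumption: html consumes the generated xml files
    "pubs": ["html"],  # assumption: pubs consumes the generated html files
}


def build_order(target, deps=DEPENDENCIES):
    """Return the targets in the order they must run to build `target`."""
    order = []

    def visit(name):
        if name in order:
            return
        # Depth-first: everything a target needs runs before the target.
        for dep in deps[name]:
            visit(dep)
        order.append(name)

    visit(target)
    return order
```

Under these assumptions, build_order("all") runs bib first, then xml, html and pubs, and finally the aggregate targets dist and all.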

A make all command runs fine on my mac and takes about 11 minutes to complete; I have not yet tested it on linux. The next step for me is writing a .travis.yml file that automatically runs make bib pubs and commits accordingly. Also, could you take a look at the files that are still in the root folder and tell me whether we still need them? In particular these are:

  • kwarcnocites.tex, kwarcpubs.tex, pubs.tex -- These seem to belong to an attempt at a manual version of the make html target above.
  • deprecated.bib -- Do we still need this?
  • warning-kwarc.bib -- Sounds like something that should go into the preamble.bib, but the old post-commit hook didn't include it. Should we do this now?
  • stjohann.bib -- Can we just integrate this into src/bib/... ?

the automatic generation process seems to work now. Now we need to update LaTeXML from time to time (there are updates triggered by our process).

It actually uses the latest LaTeXML from cpanm, see https://github.com/KWARC/bibs/blob/master/src/travis/deploy.sh#L78-L80

so that installs every time?

Yes, that is the nature of travis.

wonderful.