make a LaTeXML-based publication generation process
kohlhase opened this issue · comments
I have experimented with this a bit, and I have something that almost works, but is still quite brittle.
1. ideally using the `Makefile` (but that does not work now, since `latexmlc` segfaults, so I use the script `generate-pubwww`), generate files like `mkohlhase-article.{tex,html}`, which have `mkohlhase`'s publications of type `article`. `mkohlhase-article.tex` is generated by the XSLT style sheet `mybib.xsl`, and `mkohlhase-article.html` from that via latexml.
2. gather up the files `mkohlhase-*.html` into a file `pubs-mkohlhase.html` that has them all and some more infrastructure around them using the xslt file
So far so good, but
- it takes a long time, and most files have not really changed, and therefore need not really be re-generated.
- the result should be beautified a bit.
- there is an encoding problem introduced in step 2. That should not happen, since it is really only XSLT.
Once these things are ironed out, I could imagine running this via a post-commit hook and committing the generated files to a separate branch, so that we can reference them from the kwarc home page.
Oh, and there are some things wrong with LaTeXML that I have not figured out. See the brucemiller/LaTeXML#764 ff.
> there is an encoding problem introduced in step 2. That should not happen, since it is really only XSLT.

actually, that is not true, my browser just guessed the wrong encoding, very annoying.
@kohlhase I could very well imagine not running this via a post-commit hook, but instead via Travis. It is also capable of committing back into the repository (for example to a gh-pages branch, or directly).
Furthermore I would like to do cleanup of files and split it up into multiple directories, something like:
- `/src/` --> Source .bib files
- `/lib/` --> All files related to building (the `generate-pubwww`, the xslts, the stys etc)
- `/dist/` --> kwarc.bib file
- `/dist/html/` --> outputted html page, ready for deployment. Maybe we could commit this not on the master branch but to gh-pages instead. It could then automatically be published and we could just link to it from our website.
- Toplevel with only a README, the Makefile, a file for travis configuration and a .gitignore
> @kohlhase I could very well imagine not running this via a post-commit hook, but instead via Travis. It is also capable of committing back into the repository (for example to a gh-pages branch, or directly).

very good. I was wondering whether that would be possible. We should pursue this. I like the idea of using the gh-pages branch.
> Furthermore I would like to do cleanup of files and split it up into multiple directories, something like:

in principle I agree with this, but we should be very careful. The KWARC bibs have two functions:
1. to be integrated into paper repositories
2. to generate our publication pages.

For 1. I would like the repos to be minimal: only what you want to put under src and dist, but I do not like the idea of having a level of subdirs there. The rest is needed for 2. So I guess that should go into a different repo or a different branch.
Could you make a specific proposal?
OK, I have a much better publication list generation (there are some things LaTeXML has to improve though). But the result is in many ways already better than the old system. So I would be very interested to (have you) pursue the deployment.
@tkw1536 could you please
- try to run the `generate-pubwww` script on a linux machine and see whether the generations segfault? They do on the mac (see brucemiller/LaTeXML#764)
- could you have a look at the script `generate-pubwww` and improve it? It seems to me that if none of the files `<pid>-<type>.tex` has changed then we do not need to generate the files `<pid>-pubs.html`. And we only have to generate the `<pid>-<type>.html` files where the corresponding LaTeX file has changed.
- Finally, if none of the files `kwarcpubs.bib`, `kwarccrossrefs.bib`, and `extcrossrefs.bib` has changed, nothing has to be done at all.
This should speed up things considerably.
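The change detection asked for above could be sketched roughly as follows. This is only an illustration in Python, not the actual `generate-pubwww` script: the helper names `is_stale` and `plan_rebuild`, the use of file modification times as the "has changed" test, and the `tex_dir`/`html_dir` layout are all assumptions made for the example.

```python
from __future__ import annotations

from pathlib import Path


def is_stale(target: Path, sources: list[Path]) -> bool:
    """Return True if target is missing or older than any existing source."""
    if not target.exists():
        return True
    target_mtime = target.stat().st_mtime
    return any(
        src.stat().st_mtime > target_mtime for src in sources if src.exists()
    )


def plan_rebuild(tex_dir: Path, html_dir: Path, pid: str) -> tuple[list[Path], bool]:
    """Decide what has to be regenerated for one person id (hypothetical layout).

    Returns the <pid>-<type>.tex files whose <pid>-<type>.html is out of date,
    and a flag saying whether <pid>-pubs.html is older than any of those
    .tex files (or missing).
    """
    tex_files = sorted(tex_dir.glob(f"{pid}-*.tex"))
    stale_tex = [
        tex for tex in tex_files
        if is_stale(html_dir / (tex.stem + ".html"), [tex])
    ]
    pubs_stale = is_stale(html_dir / f"{pid}-pubs.html", tex_files)
    return stale_tex, pubs_stale
```

The same `is_stale` check could be applied one level up, comparing the three `.bib` files against a generated file, to skip the whole run. A Makefile expresses the same idea through its prerequisite rules, which also compare modification times.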
Ok, I will take a look at it later today.
@kohlhase I have updated the generation process with a new Makefile and committed it to a new auto branch. I did not commit it to the svn because it would break the old post-commit hook. This change includes a new folder structure. The Makefile has the following targets:
- `all` = `dist`
- `dist` = `bib pubs`
- `bib`: Takes the individual .bib files and concatenates them into `dist/kwarc.bib`
- `xml`: Takes the individual .bib files and generates xml versions of them in `dist/ltxml/*.bib.xml` using latexml. The generated files are .gitignored as users should not need them.
- `html`: Takes the generated xml files from above and uses latexml and xslt to first generate .tex files in `dist/tex/name-type.tex` and then html files `dist/html/name-type.html`. Both types are gitignored. Uses an adapted version of the `generate-pubwww` script, now found in `src/html/generate.html`.
- `pubs`: Takes the generated html files and builds a nice-looking bibliography in `dist/pubs`. The output is .gitignored and intended to be committed to a gh-pages branch later on. That would still need an index.html, but that should not be a problem.
- `clean-bib`, `clean-xml`, `clean-html`, `clean-pubs`: Removes files generated by an individual target.
- `clean`: Removes all generated files
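To make the relationship between the targets easier to see, here is a small Python sketch of the order in which they would run. The `all` and `dist` entries come from the list above; the edges `html` -> `xml` and `pubs` -> `html` are an assumption inferred from the descriptions ("takes the generated xml/html files") and may not match the actual Makefile. The names `DEPENDS` and `build_order` are made up for the illustration.

```python
# Target -> prerequisites. Only "all" and "dist" are stated explicitly in
# the target list; the html/pubs edges are inferred and may differ from
# the real Makefile.
DEPENDS = {
    "all": ["dist"],
    "dist": ["bib", "pubs"],
    "bib": [],
    "xml": [],
    "html": ["xml"],
    "pubs": ["html"],
}


def build_order(target, depends=DEPENDS):
    """Return the targets in the order they must run (depth-first, no cycle check)."""
    order = []

    def visit(name):
        if name in order:
            return
        for dep in depends[name]:
            visit(dep)
        order.append(name)

    visit(target)
    return order


print(build_order("all"))
# prints ['bib', 'xml', 'html', 'pubs', 'dist', 'all']
```

Under these assumed edges, `make bib pubs` covers every step, since `pubs` pulls in `html` and `xml`.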
A `make all` command runs fine on my mac and takes about 11 minutes to complete; I have not yet tested on linux. The next step for me is writing a .travis.yml file that automatically runs `make bib pubs` and commits accordingly. Also, could you take a look at the files that are still in the root folder and tell me whether we still need them? In particular these are:
- `kwarcnocites.tex`, `kwarcpubs.tex`, `pubs.tex` -- These seem to belong to an attempt at a manual version of the `make html` target above.
- `deprecated.bib` -- Do we still need these?
- `warning-kwarc.bib` -- Sounds like something that should go into the preamble.bib, but the old post-commit hook didn't include it. Should we do this now?
- `stjohann.bib` -- Can we just integrate this into `src/bib/...`?
the automatic generation process seems to work now. Now we need to update LaTeXML from time to time (there are updates triggered by our process).
It actually uses the latest LaTeXML from cpanm, see https://github.com/KWARC/bibs/blob/master/src/travis/deploy.sh#L78-L80
so that installs every time?
Yes, that is the nature of travis.
wonderful.