Shell Scripts Produce Corrupt PDFs under Bash for Windows
tajmone opened this issue · comments
The .sh scripts to build the PDF docs via asciidoctor-fopub don't work as expected under Bash for Windows: they seem unable to locate the default template images, and so create a PDF with no admonition icons:
SEVERE: Image not found. URI: D:/local/path/to/asciidoctor-fopub/build/fopub/docbook/images/note.svg. (No context info available)
whereas the image mentioned in the error is actually there, with the correct path and filename!
It must be something related to how paths are handled in Bash for Windows vs real *nix bash, and how these interact with FOP.
We need to:
- Find a TEMPORARY FIX:
  - Make the .sh scripts for PDF build detect if running under Git for Windows' Bash (via MinGW env var) and abort execution if so.
- Find a REAL SOLUTION to make the .sh scripts work.
  - Delete all .bat scripts and keep only Bash scripts.
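A minimal sketch of the temporary-fix guard described above, assuming the check relies on the MSYSTEM variable (which MSYS2 and Git for Windows set to values such as MINGW64) and falls back to the kernel name reported by uname -s. The function name is_windows_bash is hypothetical, and the project's actual scripts may test something else:

```shell
#!/usr/bin/env bash
# Hypothetical guard for the PDF build scripts; not the project's actual code.

# Succeeds (returns 0) when the shell looks like a Windows port of Bash.
#   $1: value of MSYSTEM (empty on a real *nix system)
#   $2: kernel name as printed by `uname -s`
is_windows_bash() {
    local msystem="$1" kernel="$2"
    # MSYS2 and Git for Windows export MSYSTEM.
    [ -n "$msystem" ] && return 0
    # Otherwise inspect the kernel name (this also catches Cygwin).
    case "$kernel" in
        MINGW*|MSYS*|CYGWIN*) return 0 ;;
    esac
    return 1
}

if is_windows_bash "${MSYSTEM:-}" "$(uname -s)"; then
    echo "ERROR: PDF build is not supported under Bash for Windows; use the .bat script." >&2
    exit 1
fi
```

Taking the two values as arguments keeps the detection logic testable on any platform; only the final if block reads the real environment.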
This is a rather annoying problem because if asciidoctor-fopub could run without problems in Bash for Windows we could get rid of the batch scripts and use only Bash scripts — the assumption being that all contributors to the project should have Git, and therefore a Bash also on Windows.
Having to maintain all scripts in two versions (Bash + batch) is not only a burden, but could easily lead to scripts getting out of sync, so I'm not happy about it. Also, Bash offers many useful tools which are not available in CMD, many of them useful for working on cross-platform repos (e.g. unix2dos, dos2unix, iconv, etc.).
The current solution of Bash scripts aborting when they detect that they are running under MinGW* is far from ideal; it's just a safeguard.
@thoni56, any idea why asciidoctor-fopub fails under Bash for Win?
I've done some experimentation with this in Cygwin and Msys2 and get the same problem in both. Once I've run pdf_build.sh once and have a manual.xml I can explore fopub problems. I'm dumping some experiments and observations here.
Running fopub from asciidoctor-fopub directory
When I run the following command from the directory of asciidoctor-fopub
$ ./fopub ../Alan/alan-docs/manual/manual.xml
I get some missing images, but there is also
Cannot read configuration file:///home/Thomas/Utveckling/asciidoctor-fopub/build/fopub/docbook-xsl/xslthl-config.xml: \home\Thomas\Utveckling\asciidoctor-fopub\build\fopub\docbook-xsl\xslthl-config.xml (Path cannot be found)
java.io.FileNotFoundException: \home\Thomas\Utveckling\asciidoctor-fopub\build\fopub\docbook-xsl\xslthl-config.xml (Path cannot be found)
...
My guess is that this means that fopub uses the Windows Java (whatever you have) with Cygwin/Msys2 (and I presume Git Bash for Windows) paths. (\home\Thomas\Utveckling\asciidoctor-fopub\build\fopub\docbook-xsl\xslthl-config.xml has been re-formatted as a Windows path but not remapped.)
But I can't work out where that reference to the config file comes from. If I knew, I could at least investigate what happens when that config file is read.
You have probably already seen that there is actually a manual.pdf generated, but without images.
Running fopub from directory of manual
Running
../../../asciidoctor-fopub/fopub manual.xml
I get the same Java exception, but the images are included (which is no big surprise).
Running pdf_build.sh on Cygwin
Running ./pdf_build.sh from the manual directory on Cygwin produces a completely different error:
USAGE
fop [options] [-fo|-xml] infile [-xsl file] [-awt|-pdf|-mif|-rtf|-tiff|-png|-pcl|-ps|-txt|-at [mime]|-print] <outfile>
[OPTIONS]
...
java.io.FileNotFoundException: Error: xml file C:\home\Thomas\Utveckling\Alan\alan-docs\manual\manual.xml not found
This led me to try changing the invocation of fopub in the script so that it does not move into the assets directory but instead references the XSL config with a path, thus
fopub -t ../_assets/alan-xsl-fopub/xsl-fopub manual.xml
which kind-of improved things. Instead of the "USAGE" message I got another "File not found" exception:
org.apache.fop.apps.FOPException: javax.xml.transform.TransformerException: java.io.FileNotFoundException: C:\Users\Thomas\Utveckling\Alan\alan-docs\manual\db5.ent
for which the path looks ok (Windowsy enough) but the db5.ent file is nowhere to be found. Ideas where to look?
(For Msys2 I get the same behaviour, provided I comment out the Bash for Windows check. I have not tried with actual Git Bash for Windows.)
Summary of findings
- fopub does seem to work under Cygwin/Msys2
- There is a path problem when the fopub\docbook-xsl\xslthl-config.xml is to be read (by some Java code/program)
- The pdf_build.sh seems to have a problem setting up for execution of fopub in Cygwin/Msys2 environments
Mhhhh. I suspected this was the problem. I've experienced something similar in the StdLib repo, with other dependencies. The best approach IMO is to:
- Extract all assets' absolute paths and store into Shell env-variables.
- Convert them to correct Bash or Windows paths as required.
- Invoke fopub with all parameters as absolute paths.
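The three steps above could be sketched roughly as follows. This assumes cygpath is available (it ships with Cygwin, MSYS2 and Git for Windows) and uses its mixed mode, which yields paths like D:/dir/file. The function name to_native_path and the FOPUB_DIR variable are hypothetical; the relative locations are the ones used earlier in this thread. Whether FOP then resolves every asset is untested, and paths that fopub appears to compute internally (such as the xslthl-config.xml one) are outside the reach of this approach:

```shell
#!/usr/bin/env bash
# Sketch only: convert asset paths to absolute paths that a native Windows
# Java can open, then pass them to fopub.

to_native_path() {
    if command -v cygpath >/dev/null 2>&1; then
        # Windows Bash ports: absolute path, drive letter, forward slashes.
        cygpath -a -m "$1"
    else
        # Real *nix: make the path absolute by prefixing the current
        # directory (no normalization of ".." components).
        case "$1" in
            /*) printf '%s\n' "$1" ;;
            *)  printf '%s/%s\n' "$PWD" "$1" ;;
        esac
    fi
}

# Hypothetical layout, taken from the commands quoted in this thread.
FOPUB_DIR="${FOPUB_DIR:-../../../asciidoctor-fopub}"
XSL_DIR="$(to_native_path ../_assets/alan-xsl-fopub/xsl-fopub)"
MANUAL_XML="$(to_native_path manual.xml)"

# Print the command instead of running it, so the sketch has no side effects.
echo "$FOPUB_DIR/fopub" -t "$XSL_DIR" "$MANUAL_XML"
```

The fopub launcher itself is left as a Bash path, since Bash is what runs it; only the arguments that end up in Java's hands are converted.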
But, from what I remember, the main problem is the DocBook settings files, which don't work well with relative paths across multiple OSs. I remember having some problems locating the fonts, and that it could only be done using Windows paths for some reason.
So, we'll have to look at the settings too, because images and fonts are controlled by the template settings.
This is the reason why I was considering looking into Asciidoctor's native PDF backend, to check whether the previous issues have been solved in the meantime (there were some problems with footnotes at the time, but they might be solved now). Because it's a Gem, it's going to be much easier to use it in the toolchain without headaches. Also, Java has given lots of problems with fopub so far, with Gradle incompatibility issues (which were finally solved) and more.
Might Need to Sub-Module FoPub
Having compared your error reports with mine, and your comments on the different behaviours under Cygwin and MSYS2, I think that the problem we have here is twofold:
- Bash vs Windows paths formatting.
- Assets look-up paths for:
- Configuration files
Probably the former can be fixed somehow, whereas the latter might require adding asciidoctor-fopub as a Git submodule to the repository, so that we can either pass some custom relative paths via command-line options or add some paths to the env $PATH (which we can't do if everyone has put asciidoctor-fopub in an arbitrary folder on their local machine). But then, it might just be a problem with Bash paths.
Adding asciidoctor-fopub as a Git submodule has some other advantages too, i.e. we can ensure that everyone is using the same exact version, in case the repository is updated (which doesn't happen often though, with the latest commit being from 2018).
The "Image not found" errors seem to be due to Bash vs Windows paths, since I get this error for an image that is actually there:
SEVERE: Image not found. URI: D:/absolute-path-to/asciidoctor-fopub/build/fopub/docbook/images/tip.svg. (No context info available)
As for the config file error, which relates to the Git submodule of our template, the paths are also resolved correctly, but one is passed (or just reported?) using the file:// protocol and the second as a Bash-formatted path, the latter formatting most likely being the culprit of the error (the two paths in the message point to the same file):
SEVERE: Cannot read configuration file:///d/absolute-path-to/AlanDocs/alan-docs/_assets/alan-xsl-fopub/xsl-fopub/xslthl-config.xml: \d\absolute-path-to\AlanDocs\alan-docs\_assets\alan-xsl-fopub\xsl-fopub\xslthl-config.xml (The system cannot find the path specified)
It's strange that the file:// protocol fails, since it's universal. Probably the protocol is used only in the error report, whereas Java is trying to locate the config file using the Bash path it receives from Bash for Windows.
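To illustrate the mismatch: the Bash path /d/... needs its leading drive component remapped to D:/... before a native Windows Java can open it, otherwise it ends up as \d\... as in the message above. A small sketch of that remapping in plain Bash, assuming MSYS-style drive paths (Cygwin paths such as /home/... are relative to the Cygwin install root and would need cygpath instead). The function name msys_to_windows_path is made up for illustration:

```shell
#!/usr/bin/env bash
# Sketch: remap an MSYS-style drive path (/d/dir/file) to a Windows-style
# path with forward slashes (D:/dir/file). Other paths pass through unchanged.
# Only meaningful under MSYS2/Git Bash, where /d means drive D:.
msys_to_windows_path() {
    local path="$1"
    local pattern='^/([a-zA-Z])(/.*)?$'
    if [[ "$path" =~ $pattern ]]; then
        local drive="${BASH_REMATCH[1]}"
        local rest="${BASH_REMATCH[2]}"
        # Upper-case the drive letter in a way that works on old Bash too.
        drive="$(printf '%s' "$drive" | tr '[:lower:]' '[:upper:]')"
        printf '%s:%s\n' "$drive" "${rest:-/}"
    else
        printf '%s\n' "$path"
    fi
}

msys_to_windows_path "/d/absolute-path-to/AlanDocs/alan-docs/_assets/alan-xsl-fopub/xsl-fopub/xslthl-config.xml"
```

In practice cygpath does this job more reliably, since it also knows about mount points; the function only makes the required transformation explicit.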
for which the path looks ok (Windowsy enough) but the db5.ent file is nowhere to be found. Ideas where to look?
It's looking in the wrong path: the db5.ent file is part of the asciidoctor-fopub package, not the Manual source folder. It's located in:
asciidoctor-fopub\build\fopub\db5.ent
Yes, I'm aware that Cygwin and MSYS2 do some path sanitization in the background, on the fly, to ensure that paths are handled properly; but Java isn't so smart (another reason in the long list of reasons why I hate Java and its portability myths).
Having to keep dual scripts (batch and shell) is really an unnecessary pain.
I should really check whether the official Asciidoctor PDF backend has been updated and has solved all those old problems that were preventing us from using it (a couple of years have passed since). The main problem with switching to Asciidoctor PDF is that we'll need to implement an ALAN syntax for the Rouge highlighter in order to support syntax highlighting. But then, this would solve other issues too, since a Rouge syntax would support call-outs and can be used in the HTML backend too.
Rouge is definitely the best highlighter choice since it's in Ruby and supported by all the official Asciidoctor backends. It's similar to Pygments (Python), and it's powerful because it uses a state stack that allows contextual operations on the syntax. I'll look into it; it doesn't seem too hard to do, it's just that I don't know Ruby that well, and I'm not sure if and how a custom syntax can be integrated into Rouge without submitting it to the upstream repository.