Zero Width Space in input text crashes the program
LdBeth opened this issue
A zero width space (ZWSP, U+200B), whether it occurs in the input as a literal UTF-8 character or as an XML character reference such as &#x200B; or &#8203;, crashes Speedata when run with the default flags and leaves only a cryptic error message:
$ sp
...
> Shipout page 1
Page of type "page" created (2)
Number of rows: 28, number of columns = 19
PlaceObject: Textblock at (1,1) wd/ht: 1/0 in "_page" (p. 2)
PlaceObject: Textblock at (10,1) wd/ht: 10/1 in "_page" (p. 2)
PlaceObject: Textblock at (1,2) wd/ht: 19/2 in "_page" (p. 2)
Selecting node: "entry", mode="", pos=14
PlaceObject: Textblock at (1,4) wd/ht: 19/2 in "_page" (p. 2)
Selecting node: "entry", mode="", pos=15
Total run time: 765.678525ms
signal: abort trap
I tested this with the 4.14.0 release and with the development version 4.15.19 on Intel-based macOS.
I noticed that the Speedata manual lists U+200B as one of the space characters it interprets, and other Unicode space characters do not seem to cause the problem, so this is likely a bug.
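To illustrate the two input forms: an XML parser replaces a numeric character reference with the character it refers to, so both entries in the following hypothetical data file (modeled on the data.xml used later in this thread, for illustration only) end up containing the same U+200B that a literal UTF-8 ZWSP would produce.

```xml
<!-- Hypothetical data.xml, for illustration only.
     After parsing, each title contains U+200B (ZWSP) between the words,
     exactly as if the invisible character had been typed directly. -->
<document>
  <entry>
    <!-- hexadecimal character reference -->
    <title>Chainsaw&#x200B;Man</title>
  </entry>
  <entry>
    <!-- decimal character reference for the same code point (0x200B = 8203) -->
    <title>Chainsaw&#8203;Man</title>
  </entry>
</document>
```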
Thank you very much!
For me, the following layout crashes:
<Layout xmlns="urn:speedata.de:2009/publisher/en"
xmlns:sd="urn:speedata:2009/publisher/functions/en">
<Record element="data">
<PlaceObject>
<Textblock>
<Paragraph>
<Value>​ text</Value>
</Paragraph>
</Textblock>
</PlaceObject>
</Record>
</Layout>
(4.15.19, sp --dummy)
@LdBeth I am not sure I am able to fix the error without more help from you.
- Do you use harfbuzz mode when loading the fonts (a global switch in the configuration file, or mode="harfbuzz" with <LoadFont...>)?
- Could you provide a small layout file that shows the problem?
I have a fix for the problem in the layout I constructed above, but since the error message is different, I am not sure it will also fix your error.
Regarding harfbuzz mode: no. That is a separate issue from the one here, but it seems that harfbuzz mode cannot be used together with font fallback, and the layout file in which I discovered this issue relies on font fallback to handle English text mixed with Japanese:
<LoadFontfile name="Sans" filename="IBMPlexSerif-Regular.ttf">
<Fallback filename="KleeOne-Regular.ttf" />
</LoadFontfile>
Actually, the problem cannot be reproduced with harfbuzz mode on.
<Layout
xmlns="urn:speedata.de:2009/publisher/en"
xmlns:db='http://docbook.org/ns/docbook'
xmlns:sd="urn:speedata:2009/publisher/functions/en">
<Options mainlanguage="en"/>
<LoadFontfile name="Sans" filename="KleeOne-Regular.ttf"/> <!-- with mode="harfbuzz" the crash won't happen -->
<DefineFontfamily name="sans" fontsize="9" leading="11">
<Regular fontface="Sans"/>
</DefineFontfamily>
<Hyphenation>Gun-dam</Hyphenation>
<Hyphenation>as–sas–sin</Hyphenation>
<DefineTextformat name="title" break-below="no"
alignment="leftaligned"/>
<DefineTextformat name="yr" break-below="no"
alignment="rightaligned"/>
<DefineTextformat name="desc" alignment="leftaligned"
indentation="0.2cm"/>
<Pagetype name="page" test="true()">
<Margin left="1cm" right="1cm" top="1cm" bottom="1cm"/>
<AtPageCreation>
<PlaceObject column="1" row="1">
<Textblock><Copy-of select="$header"/></Textblock>
</PlaceObject>
<PlaceObject column="10" row="1">
<Textblock>
<Paragraph>
<Value>Page: </Value>
<Value select="sd:current-page()"/>
</Paragraph>
</Textblock>
</PlaceObject>
</AtPageCreation>
<PositioningArea name="text">
<PositioningFrame
width="9"
height="{(sd:number-of-rows() div 2) - 2}"
row="2"
column="1"/>
<PositioningFrame
width="9"
height="{(sd:number-of-rows() div 2) - 2}"
row="2"
column="11"/>
<PositioningFrame
width="9"
height="{(sd:number-of-rows() div 2) - 1}"
row="{(sd:number-of-rows() div 2) + 2}"
column="1"/>
<PositioningFrame
width="9"
height="{(sd:number-of-rows() div 2) - 1}"
row="{(sd:number-of-rows() div 2) + 1}"
column="11"/>
</PositioningArea>
</Pagetype>
<Record element="document">
<SetVariable variable="header">
<Paragraph><Value select="header"/></Paragraph>
</SetVariable>
<ProcessNode select="*"/>
</Record>
<Record element="entry">
<Output area="text">
<Text>
<Paragraph language="--"
fontfamily="sans"
textformat="title"><Value select="title"/></Paragraph>
<Paragraph textformat="yr"><Value>(</Value>
<Value select="year"/><Value>)</Value></Paragraph>
</Text>
</Output>
</Record>
</Layout>
and data.xml:
<document>
<entry>
<title>Chainsaw​ Man</title>
<year>2022</year>
</entry>
</document>
The font file also seems unrelated to the problem, so you can replace it with any font available on your system.
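For comparison with the comment on the LoadFontfile line: the harfbuzz variant changes only that one line of the layout above. This sketch assumes mode="harfbuzz" is accepted on LoadFontfile as described earlier in the thread; note that the reporter observed harfbuzz mode does not seem to work together with the Fallback setup shown before.

```xml
<!-- Replaces the LoadFontfile line inside <Layout>; everything else stays the same.
     According to the reporter, the ZWSP crash does not occur in this mode. -->
<LoadFontfile name="Sans" filename="KleeOne-Regular.ttf" mode="harfbuzz"/>
```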
I cannot reproduce the program crash with your example layout above (4.15.19, sp --dummy). However, using sp --dummy and the following layout file, I found that the program exits without any indication of an error, but the output PDF only contains the "text" after the ZWSP; the "my a" before it is missing.
<Layout xmlns="urn:speedata.de:2009/publisher/en"
xmlns:sd="urn:speedata:2009/publisher/functions/en">
<Options mainlanguage="en"/>
<Record element="data">
<PlaceObject>
<Textblock>
<Paragraph>
<Value>my a​text</Value>
</Paragraph>
</Textblock>
</PlaceObject>
</Record>
</Layout>
I would also like to confirm an unexpected behavior. When the directory looks like this:
layout.xml
data.xml
foo/data.xml
the file foo/data.xml is loaded instead of data.xml, which I believe is an edge case not handled in the code.
Thank you very much, I can reproduce the problem and will provide a fix.
Regarding the unexpected behavior with foo/data.xml: it would help me organize things if this were opened as a separate issue. That said, this is "expected", although not well documented: https://doc.speedata.de/publisher/en/basics/fileorganization/#ch-fileorganization
Duplicate entries should give a better warning.
Minimal layout:
<Layout
xmlns="urn:speedata.de:2009/publisher/en"
xmlns:sd="urn:speedata:2009/publisher/functions/en">
<Record element="data">
<PlaceObject>
<Textblock>
<Paragraph>
<Value select="title" />
</Paragraph>
</Textblock>
</PlaceObject>
</Record>
</Layout>
data:
<data>
<title>a​ b</title>
</data>
Yes, I can confirm the minimal layout reproduces the same problem I have.
A workaround, until I provide a fix, is to set html="off" on the Paragraph element:
<Paragraph html="off">
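Applied to the minimal layout above, the workaround amounts to adding the attribute to its Paragraph. This is a sketch for versions before the fix and has not been tested here; it reads the same data file, whose title contains the ZWSP.

```xml
<Layout
  xmlns="urn:speedata.de:2009/publisher/en"
  xmlns:sd="urn:speedata:2009/publisher/functions/en">
  <Record element="data">
    <PlaceObject>
      <Textblock>
        <!-- Workaround suggested in this thread for versions before 4.15.20:
             switch off HTML interpretation of the paragraph contents. -->
        <Paragraph html="off">
          <Value select="title" />
        </Paragraph>
      </Textblock>
    </PlaceObject>
  </Record>
</Layout>
```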
This should be fixed in version 4.15.20 (now online). Thank you very much for your bug report and your patience!