speedata / publisher

speedata Publisher - a professional database Publishing system

Home page: https://www.speedata.de/

Zero Width Space in input text crashes the program

LdBeth opened this issue

A zero width space (ZWSP, U+200B), whether it appears in the input as a UTF-8 character or as the XML entity &#8203;, crashes Speedata when it is run with default flags. The only output is a cryptic error message:

$ sp
...
> Shipout page 1
Page of type "page" created (2)
Number of rows: 28, number of columns = 19
PlaceObject: Textblock at (1,1) wd/ht: 1/0 in "_page" (p. 2)
PlaceObject: Textblock at (10,1) wd/ht: 10/1 in "_page" (p. 2)
PlaceObject: Textblock at (1,2) wd/ht: 19/2 in "_page" (p. 2)
Selecting node: "entry", mode="", pos=14
PlaceObject: Textblock at (1,4) wd/ht: 19/2 in "_page" (p. 2)
Selecting node: "entry", mode="", pos=15
Total run time: 765.678525ms
signal: abort trap

This was tested with the 4.14.0 release and with the development version 4.15.19 on Intel-based macOS.

I noticed that the Speedata manual lists U+200B (ZWSP) among the space characters it interprets. Other Unicode space characters do not seem to cause the problem, so this is likely a bug.

Thank you very much!

For me: here is a layout that crashes

<Layout xmlns="urn:speedata.de:2009/publisher/en"
    xmlns:sd="urn:speedata:2009/publisher/functions/en">

    <Record element="data">
        <PlaceObject>
            <Textblock>
                <Paragraph>
                    <Value>&#8203; text</Value>
                </Paragraph>
            </Textblock>
        </PlaceObject>
    </Record>
</Layout>

(4.15.19, sp --dummy)

@LdBeth I am not sure that I am able to fix the error without more help from you.

  1. Do you use harfbuzz mode when loading the fonts (a global switch in the configuration file, or mode="harfbuzz" with <LoadFont...>)?
  2. Could you provide a small layout file that shows the problem?

I have a fix for the problem I constructed above. The error message is different, though, so I am not sure it will also fix your error.

Do you use harfbuzz mode when loading the fonts (a global switch in the configuration file, or mode="harfbuzz" with <LoadFont...>)?

No. This is a separate issue, but harfbuzz mode seems unable to work together with font fallback. The layout file in which I discovered this issue relies on font fallback to handle English text mixed with Japanese:

  <LoadFontfile name="Sans" filename="IBMPlexSerif-Regular.ttf">
    <Fallback filename="KleeOne-Regular.ttf" />
  </LoadFontfile>

Actually, the problem cannot be reproduced with harfbuzz mode on.
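For comparison, here is the font-loading line from the layout below switched to harfbuzz mode. The reporter's comment in that layout says the crash does not occur with this setting. This is a diagnostic only, not a workaround for mixed English/Japanese text, since harfbuzz mode reportedly does not combine with font fallback.

```xml
<!-- Same font as in the layout below, loaded in harfbuzz mode.
     According to the reporter, the ZWSP crash does not occur with this setting,
     but font fallback reportedly cannot be used in this mode. -->
<LoadFontfile name="Sans" filename="KleeOne-Regular.ttf" mode="harfbuzz"/>
```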

<Layout
    xmlns="urn:speedata.de:2009/publisher/en"
    xmlns:db='http://docbook.org/ns/docbook'
    xmlns:sd="urn:speedata:2009/publisher/functions/en">
  <Options mainlanguage="en"/>
  <LoadFontfile name="Sans" filename="KleeOne-Regular.ttf"/> <!-- with mode="harfbuzz" the crash won't happen -->
  <DefineFontfamily name="sans" fontsize="9" leading="11">
    <Regular fontface="Sans"/>
  </DefineFontfamily>
  <Hyphenation>Gun-dam</Hyphenation>
  <Hyphenation>as-sas-sin</Hyphenation>
  <DefineTextformat name="title" break-below="no"
                    alignment="leftaligned"/>
  <DefineTextformat name="yr" break-below="no"
                    alignment="rightaligned"/>
  <DefineTextformat name="desc" alignment="leftaligned"
                    indentation="0.2cm"/>
  <Pagetype name="page" test="true()">
    <Margin left="1cm" right="1cm" top="1cm" bottom="1cm"/>
    <AtPageCreation>
      <PlaceObject column="1" row="1">
        <Textblock><Copy-of select="$header"/></Textblock>
      </PlaceObject>
      <PlaceObject column="10" row="1">
        <Textblock>
          <Paragraph>
            <Value>Page: </Value>
            <Value select="sd:current-page()"/>
          </Paragraph>
        </Textblock>
      </PlaceObject>
    </AtPageCreation>
    <PositioningArea name="text">
      <PositioningFrame
          width="9"
          height="{(sd:number-of-rows() div 2) - 2}"
          row="2"
          column="1"/>
      <PositioningFrame
          width="9"
          height="{(sd:number-of-rows() div 2) - 2}"
          row="2"
          column="11"/>
      <PositioningFrame
          width="9"
          height="{(sd:number-of-rows() div 2) - 1}"
          row="{(sd:number-of-rows() div 2) + 2}"
          column="1"/>
      <PositioningFrame
          width="9"
          height="{(sd:number-of-rows() div 2) - 1}"
          row="{(sd:number-of-rows() div 2) + 1}"
          column="11"/>
    </PositioningArea>
  </Pagetype>

  <Record element="document">
    <SetVariable variable="header">
      <Paragraph><Value select="header"/></Paragraph>
    </SetVariable>
    <ProcessNode select="*"/>
  </Record>

  <Record element="entry">
    <Output area="text">
      <Text>
        <Paragraph language="--"
                   fontfamily="sans"
                   textformat="title"><Value select="title"/></Paragraph>
        <Paragraph textformat="yr"><Value>(</Value>
        <Value select="year"/><Value>)</Value></Paragraph>
      </Text>
    </Output>
  </Record>

</Layout>

and

data.xml

<document>
   <entry>
      <title>Chainsaw&#8203; Man</title>
      <year>2022</year>
   </entry>
</document>

The font files also seem unrelated to the problem, so you can replace them with fonts available on your system.

I cannot reproduce the crash on 4.15.19 with your minimal layout (the "&#8203; text" example). However, with sp --dummy and the following layout file, the program exits without any error, yet the output PDF contains only the "text" after the ZWSP. The "my a" before it is missing.

<Layout xmlns="urn:speedata.de:2009/publisher/en"
    xmlns:sd="urn:speedata:2009/publisher/functions/en">
  <Options mainlanguage="en"/>
  
  <Record element="data">
        <PlaceObject>
            <Textblock>
                <Paragraph>
                    <Value>my a&#8203;text</Value>
                </Paragraph>
            </Textblock>
        </PlaceObject>
    </Record>
</Layout>

I would also like to confirm an unexpected behavior. When the directory looks like this:

layout.xml
data.xml
foo/data.xml

The file foo/data.xml is loaded instead of data.xml. I believe this is an edge case the code does not handle.

Do you use harfbuzz mode when loading the fonts (a global switch in the configuration file, or mode="harfbuzz" with <LoadFont...>)?

No, it is a different issue from the problem here, but it seems harfbuzz mode cannot be used together with font fallback.

... thank you very much, I can reproduce the problem and I will provide a fix.

Also I would like to confirm an unexpected behavior

...

It would help me keep things organized if this were opened as a separate issue. That said, this behavior is "expected", although not well documented: https://doc.speedata.de/publisher/en/basics/fileorganization/#ch-fileorganization

Duplicate entries should produce a better warning.

Minimal layout:

<Layout
  xmlns="urn:speedata.de:2009/publisher/en"
  xmlns:sd="urn:speedata:2009/publisher/functions/en">

  <Record element="data">
    <PlaceObject>
      <Textblock>
        <Paragraph>
          <Value select="title" />
        </Paragraph>
      </Textblock>
    </PlaceObject>
  </Record>
</Layout>

data:

<data>
    <title>a&#8203; b</title>
</data>
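A control case can help confirm that only the ZWSP is affected. The data file below is a hypothetical variant of the one above with the ZWSP replaced by a thin space (U+2009, &#8201;). It is meant to be run with the minimal layout above. Per the original report, other Unicode space characters do not trigger the bug, so both "a" and "b" should appear in the output. That expectation is an assumption, not a verified result.

```xml
<!-- Hypothetical control case: &#8203; (ZWSP) replaced by &#8201; (thin space).
     If only the ZWSP is affected, "a" and "b" should both be rendered. -->
<data>
    <title>a&#8201; b</title>
</data>
```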

Yes, I can confirm the minimal layout reproduces the same problem I have.

A workaround (until I provide a fix) is to set html="off" on Paragraph:

    <Paragraph html="off">
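Here is the workaround applied to the minimal layout above. Only the html attribute on Paragraph differs from that layout. This sketch is intended for versions that do not yet include the fix.

```xml
<Layout
  xmlns="urn:speedata.de:2009/publisher/en"
  xmlns:sd="urn:speedata:2009/publisher/functions/en">

  <Record element="data">
    <PlaceObject>
      <Textblock>
        <!-- html="off" is the workaround suggested in this thread
             for the ZWSP problem in affected versions -->
        <Paragraph html="off">
          <Value select="title" />
        </Paragraph>
      </Textblock>
    </PlaceObject>
  </Record>
</Layout>
```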

This should be fixed in version 4.15.20 (now online). Thank you very much for your bug report and your patience!