in2code-de / publications

Successor of EXT:bib to show publications in TYPO3. Import and export of BibTeX and XML files.

Home Page: https://www.in2code.de/agentur/typo3-extensions/publications/


[BUG] export of bibtex has wrong encoding of umlaut and other special characters

karliwalti opened this issue

issue:

Special characters are not correctly encoded when a BibTeX file is generated for export.

example

current bib export:


@Inbook { Garatva2023,
    author = {Garatva, Patricia and Terhorst, Yannik and Me{\{\dq}s}ner, Eva-Maria and Karlen, Walter and Pryss, R{{\dq}u}diger and Baumeister, Harald},
    title = {Smart Sensors for Health Research and Improvement},
    year = {2023},
    DOI = {10.1007/978-3-030-98546-2_23},
    booktitle = {Digital Phenotyping and Mobile Sensing.},
    publisher = {Springer International Publishing},
    address = {Cham},
    series = {Studies in Neuroscience, Psychology and Behavioral Economics},
    editor = {Montag, Christian and Baumeister, Harald},
    pages = {395--411},
    file_url = {https://doi.org/10.1007/978-3-030-98546-2_23}
}

desired format:


@Inbook { Garatva2023,
    author = {Garatva, Patricia and Terhorst, Yannik and Me{\"s}ner, Eva-Maria and Karlen, Walter and Pryss, R{\"u}diger and Baumeister, Harald},
    title = {Smart Sensors for Health Research and Improvement},
    year = {2023},
    DOI = {10.1007/978-3-030-98546-2_23},
    booktitle = {Digital Phenotyping and Mobile Sensing.},
    publisher = {Springer International Publishing},
    address = {Cham},
    series = {Studies in Neuroscience, Psychology and Behavioral Economics},
    editor = {Montag, Christian and Baumeister, Harald},
    pages = {395--411},
    file_url = {https://doi.org/10.1007/978-3-030-98546-2_23}
}

current xml export:

<reference>
<bibtype>inbook</bibtype>
<citeid>Garatva2023</citeid>
<title>Smart Sensors for Health Research and Improvement</title>
<year>2023</year>
<isbn>978-3-030-98546-2</isbn>
<DOI>10.1007/978-3-030-98546-2_23</DOI>
<booktitle>Digital Phenotyping and Mobile Sensing.</booktitle>
<publisher>Springer International Publishing</publisher>
<address>Cham</address>
<series>Studies in Neuroscience, Psychology and Behavioral Economics</series>
<editor>Montag, Christian and Baumeister, Harald</editor>
<pages>395--411</pages>
<file_url>https://doi.org/10.1007/978-3-030-98546-2_23</file_url>
<authors>
<person>
<fn>Patricia</fn>
<sn>Garatva</sn>
</person>
<person>
<fn>Yannik</fn>
<sn>Terhorst</sn>
</person>
<person>
<fn>Eva-Maria</fn>
<sn>Meßner</sn>
</person>
<person>
<fn>Walter</fn>
<sn>Karlen</sn>
</person>
<person>
<fn>R{&quot;u}diger</fn>
<sn>Pryss</sn>
</person>
<person>
<fn>Harald</fn>
<sn>Baumeister</sn>
</person>
</authors>
</reference>

desired behavior:

The export produces a proper encoding, so that the file can be imported again with the importer (reversible) or used with other programs.

solution:

I see two possible reasons and approaches:

  1. The data is encoded in the wrong order or more than once. It looks like the " is encoded as \dq after the character was already converted to {"u}. In this case, characters after a \ (or within curly brackets) should not be encoded a second time.
  2. The original string already contains the encoded character and should not be encoded at all.
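Approach 1 could look roughly like the sketch below. It is written in Python purely for illustration (the extension itself is PHP), and the mapping table and the name `encode_once` are my own assumptions, not code from the extension. The idea is to split the value into brace groups and plain text, and to encode only the plain text, so that a second pass changes nothing.

```python
import re

# Hypothetical mapping for illustration; the extension's real table is not shown here.
BIBTEX_MAP = {
    "ä": '{\\"a}',
    "ö": '{\\"o}',
    "ü": '{\\"u}',
    "Ä": '{\\"A}',
    "Ö": '{\\"O}',
    "Ü": '{\\"U}',
    "ß": "{\\ss}",
}

# A brace group without nested braces, e.g. {\"u} or {"u}.
BRACE_GROUP = re.compile(r"(\{[^{}]*\})")


def encode_once(value: str) -> str:
    """Encode special characters, leaving existing brace groups untouched."""
    # re.split with one capturing group puts the brace groups at odd indices.
    parts = BRACE_GROUP.split(value)
    out = []
    for index, part in enumerate(parts):
        if index % 2 == 1:
            out.append(part)  # assumed to be encoded already, keep as-is
        else:
            out.append("".join(BIBTEX_MAP.get(ch, ch) for ch in part))
    return "".join(out)


print(encode_once("Pryss, Rüdiger"))
print(encode_once(encode_once("Pryss, Rüdiger")))  # same result as one pass
```

One limitation of this sketch: it treats every brace group as already encoded, so a raw umlaut inside braces (for example a protected title word) would be left as-is.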

Based on the XML export above, it could actually be a combination of both, since the ü seems to be stored in encoded form in the database.
If so, this is also a bug in the XML export, where the encoding should be removed before export.
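For the XML side, a possible fix is to normalise stored values to plain Unicode before writing them out. The sketch below is again Python for illustration only, with hypothetical names (`to_unicode`, `xml_text`), and it only covers the umlaut forms seen in this issue ({"u} and {\"u}); a real fix would need the full character table.

```python
import re
from xml.sax.saxutils import escape

# Only the umlauts are handled here; this table is an assumption for the example.
UMLAUTS = {"a": "ä", "o": "ö", "u": "ü", "A": "Ä", "O": "Ö", "U": "Ü"}

# Matches both {"u} and {\"u}.
ENCODED_UMLAUT = re.compile(r'\{\\?"([aouAOU])\}')


def to_unicode(value: str) -> str:
    """Replace BibTeX-style umlaut sequences with the Unicode character."""
    return ENCODED_UMLAUT.sub(lambda match: UMLAUTS[match.group(1)], value)


def xml_text(value: str) -> str:
    """Decode first, then apply XML escaping to what is left."""
    return escape(to_unicode(value))


print(xml_text('R{"u}diger'))
print(xml_text('R{\\"u}diger'))
```

Decoding before escaping matters here: if the quote is escaped first, the brace sequence no longer matches and ends up in the XML as R{&quot;u}diger, which is what the current export shows.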