Special LaTeX characters causing errors when rendering

Question

iramosgutierrez opened this issue 8 months ago

Characters such as "&" or "#" cause errors when rendering with LaTeX (e.g. in collection numbers or species authors).
This can be corrected by replacing them with "\&" or "\#".
(edited)

Francisco Rodriguez-Sanchez · Answer 1 · Fri Nov 24 2023 16:47:00 GMT+0800 (China Standard Time)

What do you mean by replacing with "&" or "#"? I think we'd need to use \&, no? What do you think is the best solution to implement, one that is as easy as possible for the user? Maybe adding \ to all those characters automatically before rendering, e.g. using gsub()?

Reproducible example:

library(labeleR)

data("label.table")
label.table[1,1] <- paste(label.table[1,1], "&", label.table[1,1], "#")

create_collection_label(
  data = label.table,
  path = "labeleR_output",
  qr = "QR_code",
  field1.column = "field1",
  field2.column = "field2",
  field3.column = "field3",
  field4.column = "field6",
  field5.column = "field7"
)
#> The specified folder does not exist. Creating folder
#> No file name provided
#> processing file: collection_label.Rmd
#> output file: collection_label.knit.md
#> /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/pandoc +RTS -K512m -RTS collection_label.knit.md --to latex --from markdown+autolink_bare_uris+tex_math_single_backslash --output /tmp/RtmpwdcglO/reprex-bce6751bb030-cushy-kitty/labeleR_output/Collection_label.tex --lua-filter /home/frs/R/x86_64-pc-linux-gnu-library/4.0/rmarkdown/rmarkdown/lua/pagebreak.lua --lua-filter /home/frs/R/x86_64-pc-linux-gnu-library/4.0/rmarkdown/rmarkdown/lua/latex-div.lua --embed-resources --standalone --highlight-style tango --pdf-engine pdflatex --variable graphics --include-in-header /tmp/Rtmp5mTV0r/rmarkdown-strc1115d5268d4.html
#> ! Misplaced alignment tab character &.
#> <argument>  Bombus terrestris subsp. glumbumble &
#>                                                   Bombus terrestris subsp. g...
#> l.85 ...& Bombus terrestris subsp. glumbumble # }}
#>                                                    \\
#> Error: LaTeX failed to compile /tmp/RtmpwdcglO/reprex-bce6751bb030-cushy-kitty/labeleR_output/Collection_label.tex. See https://yihui.org/tinytex/r/#debugging for debugging tips. See Collection_label.log for more info.

Created on 2023-11-24 with reprex v2.0.2

iramosgutierrez · Answer 2 · Fri Nov 24 2023 17:06:53 GMT+0800 (China Standard Time)

Yes, exactly!
The problem appears when a column contains an "&" or "#" (and maybe some other characters I have not detected), which I think are special characters for LaTeX.
Yes, I think we should use gsub() on all fields that are rendered as text (not QRs).
Note: in the opening issue message I added the \ symbol, but somehow it disappeared!

Francisco Rodriguez-Sanchez · Answer 3 · Fri Nov 24 2023 17:17:29 GMT+0800 (China Standard Time)

All right. Then I'd add a function in zzz.R to substitute (gsub) all & or # by \& or \#. Add any other symbol that might be used. And probably we need to run that function on every text column in the data frame passed to each function?
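
A minimal sketch of the kind of helper described above. The name escape_latex_chars() and its chars argument are hypothetical (this is not the function that was eventually added to labeleR), and only the two characters reported in this issue are covered:

```r
# Hypothetical helper: escape LaTeX special characters in a character vector.
# Only "&" and "#" are handled by default; other specials could be added to `chars`.
escape_latex_chars <- function(x, chars = c("&", "#")) {
  stopifnot(is.character(x))
  for (ch in chars) {
    # fixed = TRUE treats pattern and replacement literally, so the
    # replacement is one backslash followed by the character itself.
    x <- gsub(ch, paste0("\\", ch), x, fixed = TRUE)
  }
  x
}

escape_latex_chars(c("Smith & Jones", "Coll. #12", "plain text"))
```

Note that a helper like this is not idempotent: running it on text that is already escaped adds a second backslash, so it should be applied exactly once before rendering.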

iramosgutierrez · Answer 4 · Fri Nov 24 2023 17:20:08 GMT+0800 (China Standard Time)

I was thinking of that, or maybe adding a gsub within check_column_or_create_empty_char?
Maybe:

out <- gsub("&", "\\&", out, fixed = TRUE)
out <- gsub("#", "\\#", out, fixed = TRUE)

in line 24 of zzz.R? Would that work?
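
One detail worth checking before pasting lines like these: in R source code a literal backslash has to be written as "\\" (a bare "\&" inside a string is rejected by the parser as an unrecognized escape), and in the default regex mode of gsub() a backslash in the replacement is itself an escape character. A small sketch of the options, using a made-up input string:

```r
x <- "Person1 & Person2 #3"

# Default (regex) mode: the backslash in the replacement escapes "&",
# so the backslash is consumed and the text comes back unchanged.
unchanged <- gsub("&", "\\&", x)

# Regex mode with a doubled backslash in the replacement keeps one backslash.
regex_out <- gsub("&", "\\\\&", x)

# fixed = TRUE treats pattern and replacement literally.
fixed_out <- gsub("&", "\\&", x, fixed = TRUE)

# The last two approaches produce the same string;
# cat() prints it the way LaTeX will receive it.
identical(regex_out, fixed_out)
cat(fixed_out, "\n")
```

Using fixed = TRUE is arguably the easier form to read, since the same two-character R string works for every special character.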

Francisco Rodriguez-Sanchez · Answer 5 · Fri Nov 24 2023 17:39:05 GMT+0800 (China Standard Time)

Sounds good!! Go ahead

Francisco Rodriguez-Sanchez · Answer 6 · Fri Nov 24 2023 20:00:05 GMT+0800 (China Standard Time)

Thanks for the quick pull request! Could we have check_latex called within check_column_or_create_empty_char as originally planned? I think that would make much cleaner code without so much repetition

For example:

check_column_or_create_empty_char <- function(df = NULL, column = NULL, check.latex = TRUE) {

  if (!is.null(column)) {
    check_column_in_df(df, column)
    if (isTRUE(check.latex)) {
      out <- check_latex(df, column)
    }
    out <- column
  } else {
    out <- ""
  }

  out

}

Then you could skip check_latex for some column (e.g. QR) by setting the check.latex argument to FALSE

Francisco Rodriguez-Sanchez · Answer 7 · Fri Nov 24 2023 20:01:20 GMT+0800 (China Standard Time)

Also, is this new test testing something different from the one above? Sorry I can't catch the difference

https://github.com/EcologyR/labeleR/pull/50/files#diff-3d64e572e068822afca54a6e942301060eca80c9413103b8e8adfb8ca4822b4b

iramosgutierrez · Answer 8 · Fri Nov 24 2023 20:18:19 GMT+0800 (China Standard Time)

> Thanks for the quick pull request! Could we have check_latex called within check_column_or_create_empty_char as originally planned? I think that would make much cleaner code without so much repetition
>
> For example:
>
> check_column_or_create_empty_char <- function(df = NULL, column = NULL, check.latex = TRUE) {
>
>   if (!is.null(column)) {
>     check_column_in_df(df, column)
>     if (isTRUE(check.latex)) {
>       out <- check_latex(df, column)
>     }
>     out <- column
>   } else {
>     out <- ""
>   }
>
>   out
>
> }
>
> Then you could skip check_latex for some column (e.g. QR) by setting the check.latex argument to FALSE

That was my first idea, but the problem was that the out object was storing the column name rather than the content itself, so I had to write the ugly repetitive code.

iramosgutierrez · Answer 9 · Fri Nov 24 2023 20:20:45 GMT+0800 (China Standard Time)

> Also, is this new test testing something different from the one above? Sorry I can't catch the difference
>
> https://github.com/EcologyR/labeleR/pull/50/files#diff-3d64e572e068822afca54a6e942301060eca80c9413103b8e8adfb8ca4822b4b

The difference is that the 2nd one is using data2, created here:

data2 <- data
data2$Collector <- c("Person1&Person2")

Francisco Rodriguez-Sanchez · Answer 10 · Fri Nov 24 2023 20:40:28 GMT+0800 (China Standard Time)

> That was my first idea, but the problem was that the out object was storing the column name rather than the content itself, so I had to write the ugly repetitive code.

Ah, I think you need to add drop = FALSE at the end of check_latex, i.e.

return(df[, column, drop = FALSE])

That way you always obtain a one-column dataframe, which is what we need.
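
To illustrate the drop = FALSE behaviour mentioned above, here is a small sketch on a toy data frame (the column names and values are made up for illustration, not taken from labeleR):

```r
# Toy data frame; column names are invented for this example.
df <- data.frame(
  Collector = c("A & B", "C"),
  Number    = c("#1", "#2"),
  stringsAsFactors = FALSE
)

# Selecting a single column drops the result to a plain vector by default...
as_vector <- df[, "Collector"]

# ...while drop = FALSE keeps it as a one-column data frame.
as_df <- df[, "Collector", drop = FALSE]

is.data.frame(as_vector)
is.data.frame(as_df)
names(as_df)
```

Keeping the data frame structure means downstream code can rely on the column name still being attached to the values.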

I would make this simplification if possible, otherwise we're still fixing the problem but growing a codebase that is harder to maintain (google "technical debt")

Francisco Rodriguez-Sanchez · Answer 11 · Fri Nov 24 2023 20:42:12 GMT+0800 (China Standard Time)

For the tests, I mean the first two tests here seem to be testing the same thing? Or I can't see the difference.
In that case we should leave just one

iramosgutierrez · Answer 12 · Fri Nov 24 2023 22:05:36 GMT+0800 (China Standard Time)

Does this make sense?
2050023

I have also erased the simple test, so we are doing both checks in one test

Francisco Rodriguez-Sanchez · Answer 13 · Fri Nov 24 2023 23:57:07 GMT+0800 (China Standard Time)

Thanks @iramosgutierrez

Since we are always checking all or most columns with check_column_or_create_empty_char, why not integrate check_latex within that function? That way we don't have to add any extra lines.

For example, here:

Could we add check_latex within check_column_or_create_empty_char function so we don't have to add more code (arguments, check_latex_columns, etc)?

Sorry I must be missing something...

P.S. Also a more suitable name than check_latex could be "escape_special_characters", which is what this function basically does, no?

Anyway, don't feel pressed to change this now. At least it works! But IMO technical debt is something to keep in mind always. Investing in coherent, streamlined changes avoids painful refactoring or maintenance in the future...

iramosgutierrez · Answer 14 · Sat Nov 25 2023 00:51:03 GMT+0800 (China Standard Time)

My concern here is that check_column_or_create_empty_char is not returning information from the data frame, just checking the name of the column or returning an empty character for NULL columns, and that is not this issue's problem, right? Now we are trying to change the input values whenever they can create conflicts.
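
The distinction drawn here, validating column names versus rewriting column contents, can be sketched as a separate step that works on the values. Everything below is hypothetical (the function name, its arguments, and the toy data); it is not labeleR's actual code, only an illustration of escaping the values of selected text columns while leaving a QR column untouched:

```r
# Hypothetical sketch: escape "&" and "#" in the *values* of chosen columns.
escape_columns <- function(df, columns) {
  stopifnot(is.data.frame(df), all(columns %in% names(df)))
  for (col in columns) {
    values <- as.character(df[[col]])
    # fixed = TRUE: literal match, literal backslash in the replacement.
    values <- gsub("&", "\\&", values, fixed = TRUE)
    values <- gsub("#", "\\#", values, fixed = TRUE)
    df[[col]] <- values
  }
  df
}

# Toy input: one text column and one column meant to be encoded as a QR code.
labels <- data.frame(
  Collector = c("Person1&Person2", "Person3"),
  QR_code   = c("https://example.org/a#1", "https://example.org/b"),
  stringsAsFactors = FALSE
)

# Only the text column is escaped; the QR column keeps its raw content.
escaped <- escape_columns(labels, columns = "Collector")
```

Passing the columns explicitly keeps the name check and the value rewrite independent, which is one possible way to avoid overloading check_column_or_create_empty_char.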

Francisco Rodriguez-Sanchez · Answer 15 · Sat Nov 25 2023 01:37:37 GMT+0800 (China Standard Time)

Oh yes, sorry. I missed that. I'm too tired already. Let me have a look tomorrow with a fresh mind? Anyway, the main branch is fixed, so no rush 👍

iramosgutierrez · Answer 16 · Wed Jan 10 2024 20:13:53 GMT+0800 (China Standard Time)

Closed with commit 5f5f8e0