EcologyR / labeleR

Package to create your own labels, certificates, and much more! :)

Home Page:https://ecologyr.github.io/labeleR/

Special LaTeX characters causing errors when rendering

iramosgutierrez opened this issue · comments

Characters such as "&" or "#" cause errors when rendering with LaTeX (e.g. in collection numbers or species authors).
This can be corrected by replacing them with "\&" or "\#".
(edited)

What do you mean by replacing with "&" or "#"? I think we'd need to use \&, no? What do you think is the best solution to implement, keeping it as easy as possible for the user? Maybe adding \ to all those characters automatically before rendering, e.g. using gsub()?

Reproducible example:

library(labeleR)

data("label.table")
label.table[1,1] <- paste(label.table[1,1], "&", label.table[1,1], "#")

create_collection_label(
  data = label.table,
  path = "labeleR_output",
  qr = "QR_code",
  field1.column = "field1",
  field2.column = "field2",
  field3.column = "field3",
  field4.column = "field6",
  field5.column = "field7"
)
#> The specified folder does not exist. Creating folder
#> No file name provided
#> processing file: collection_label.Rmd
#> output file: collection_label.knit.md
#> /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/pandoc +RTS -K512m -RTS collection_label.knit.md --to latex --from markdown+autolink_bare_uris+tex_math_single_backslash --output /tmp/RtmpwdcglO/reprex-bce6751bb030-cushy-kitty/labeleR_output/Collection_label.tex --lua-filter /home/frs/R/x86_64-pc-linux-gnu-library/4.0/rmarkdown/rmarkdown/lua/pagebreak.lua --lua-filter /home/frs/R/x86_64-pc-linux-gnu-library/4.0/rmarkdown/rmarkdown/lua/latex-div.lua --embed-resources --standalone --highlight-style tango --pdf-engine pdflatex --variable graphics --include-in-header /tmp/Rtmp5mTV0r/rmarkdown-strc1115d5268d4.html
#> ! Misplaced alignment tab character &.
#> <argument>  Bombus terrestris subsp. glumbumble &
#>                                                   Bombus terrestris subsp. g...
#> l.85 ...& Bombus terrestris subsp. glumbumble # }}
#>                                                    \\
#> Error: LaTeX failed to compile /tmp/RtmpwdcglO/reprex-bce6751bb030-cushy-kitty/labeleR_output/Collection_label.tex. See https://yihui.org/tinytex/r/#debugging for debugging tips. See Collection_label.log for more info.

Created on 2023-11-24 with reprex v2.0.2

Yes, exactly!
The problem appears when a column contains "&" or "#" (and maybe others I have not detected), which I think are special characters in LaTeX.
Yes, I think we should use gsub() on every column that is rendered as text (not QRs).
Note: in the opening issue message I added the \ symbol, but somehow it disappeared!

All right. Then I'd add a function in zzz.R to substitute (gsub) all & or # by \& or \#. Add any other symbol that might be used. And probably we need to run that function on every text column in the data frame passed to each function?

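I was thinking of that, or maybe adding a gsub within check_column_or_create_empty_char?
Maybe:

# "\&" is not a valid escape in an R string, so the backslash is doubled;
# fixed = TRUE makes gsub() treat pattern and replacement literally
out <- gsub("&", "\\&", out, fixed = TRUE)
out <- gsub("#", "\\#", out, fixed = TRUE)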

in line 24 of zzz.R? Would that work?
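
To make the gsub() idea concrete, here is a minimal sketch of such a helper. The name escape_latex_chars is hypothetical (it is not a labeleR function), and only "&" and "#" are handled, since those are the two characters reported in this issue.

```r
# Hypothetical helper: escape "&" and "#" so LaTeX treats them as plain text.
# The name is illustrative only and is not part of labeleR.
escape_latex_chars <- function(x) {
  x <- as.character(x)
  # fixed = TRUE: pattern and replacement are literal strings, so the
  # R string "\\&" (backslash + ampersand) is inserted as-is.
  x <- gsub("&", "\\&", x, fixed = TRUE)
  x <- gsub("#", "\\#", x, fixed = TRUE)
  x
}

example <- escape_latex_chars("Person1 & Person2 #12")
cat(example, "\n")
```

Note that a value that already contains "\&" would be escaped a second time by this sketch, so it assumes the input is raw, unescaped text.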

Sounds good!! Go ahead

Thanks for the quick pull request! Could we have check_latex called within check_column_or_create_empty_char as originally planned? I think that would make much cleaner code without so much repetition

For example:

check_column_or_create_empty_char <- function(df = NULL, column = NULL, check.latex = TRUE) {

  if (!is.null(column)) {
    check_column_in_df(df, column)
    if (isTRUE(check.latex)) {
      out <- check_latex(df, column)
    }
    out <- column
  } else {
    out <- ""
  }

  out

}

Then you could skip check_latex for some columns (e.g. QR) by setting the check.latex argument to FALSE.

Also, is this new test testing something different from the one above? Sorry I can't catch the difference

https://github.com/EcologyR/labeleR/pull/50/files#diff-3d64e572e068822afca54a6e942301060eca80c9413103b8e8adfb8ca4822b4b

That was my first idea, but the problem was that the out object was storing the column name rather than the content itself, so I had to write the ugly repetitive code.

As for the tests, the difference is that the 2nd one uses data2, created here:

data2 <- data
data2$Collector <- c("Person1&Person2")

Ah, I think you need to add drop = FALSE at the end of check_latex, i.e.

return(df[, column, drop = FALSE])

That way you always obtain a one-column dataframe, which is what we need.
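
A small base R illustration of what drop = FALSE changes, using a made-up two-column data frame (the column names here are just for the example):

```r
# Toy data frame; the column names are illustrative only.
df <- data.frame(
  field1 = c("a", "b"),
  field2 = c("c", "d"),
  stringsAsFactors = FALSE
)
column <- "field1"

# Default behaviour: selecting a single column drops to a plain vector.
as_vector <- df[, column]

# With drop = FALSE the result stays a one-column data frame.
as_df <- df[, column, drop = FALSE]

print(is.data.frame(as_vector))
print(is.data.frame(as_df))
```

The first check prints FALSE and the second TRUE, so code that expects a data frame downstream only works with the drop = FALSE form.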

I would make this simplification if possible, otherwise we're still fixing the problem but growing a codebase that is harder to maintain (google "technical debt")

For the tests, I mean the first two tests here seem to be testing the same thing? Or I can't see the difference.
In that case we should leave just one

Does this make sense?
2050023

I have also erased the simple test, so we are doing both checks in one test

Thanks @iramosgutierrez

Since we are always checking all or most columns with check_column_or_create_empty_char, why not integrate check_latex within that function? That way we don't have to add any extra lines.

For example, here (screenshot of the code).

Could we add check_latex within check_column_or_create_empty_char function so we don't have to add more code (arguments, check_latex_columns, etc)?

Sorry I must be missing something...

P.S. Also a more suitable name than check_latex could be "escape_special_characters", which is what this function basically does, no?

Anyway, don't feel pressed to change this now. At least it works! But IMO technical debt is something to keep in mind always. Investing in coherent, streamlined changes avoids painful refactoring or maintenance in the future...

My concern here is that check_column_or_create_empty_char does not return information from the data frame: it just checks the name of the column, or returns an empty character for NULL columns. That is not this issue's problem, right? Here we are trying to change the input values whenever they can create conflicts.
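
To illustrate the distinction, here is a rough sketch of escaping the values of chosen columns (rather than checking column names) and returning the modified data frame. The function name escape_columns and the example data are hypothetical; this is not the code merged in the pull request.

```r
# Hypothetical sketch: escape "&" and "#" in the *content* of selected columns.
# Columns left out of `columns` (e.g. the QR column) are returned untouched.
escape_columns <- function(df, columns) {
  stopifnot(is.data.frame(df), all(columns %in% names(df)))
  for (col in columns) {
    values <- as.character(df[[col]])
    values <- gsub("&", "\\&", values, fixed = TRUE)
    values <- gsub("#", "\\#", values, fixed = TRUE)
    df[[col]] <- values
  }
  df
}

# Made-up data: the QR column holds a URL that must not be modified.
labels <- data.frame(
  QR_code = "https://example.org/?a=1&b=2#top",
  field1 = "Person1 & Person2",
  stringsAsFactors = FALSE
)

escaped <- escape_columns(labels, columns = "field1")
print(escaped)
```

Passing the text columns explicitly keeps the QR content intact, which matches the earlier point that only values rendered as text need escaping.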

Oh yes, sorry. I missed that. I'm too tired already. Let me have a look tomorrow with a fresh mind? Anyway, the main branch is fixed, so no rush 👍

Closed with commit 5f5f8e0