Special LaTeX characters causing errors in rendering
iramosgutierrez opened this issue · comments
Characters such as "&" or "#" cause errors when rendering with LaTeX (e.g. in collection numbers or species authors).
This can be corrected by replacing them with "\&" or "\#".
What do you mean by replacing with "&" or "#"? I think we'd need to use \&, no? What do you think is the best solution to implement, as easy as possible for the user? Maybe adding \ to all those characters automatically before rendering, e.g. using gsub()?
Reproducible example:
library(labeleR)
data("label.table")
label.table[1,1] <- paste(label.table[1,1], "&", label.table[1,1], "#")
create_collection_label(
  data = label.table,
  path = "labeleR_output",
  qr = "QR_code",
  field1.column = "field1",
  field2.column = "field2",
  field3.column = "field3",
  field4.column = "field6",
  field5.column = "field7"
)
#> The specified folder does not exist. Creating folder
#> No file name provided
#> processing file: collection_label.Rmd
#> output file: collection_label.knit.md
#> /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/pandoc +RTS -K512m -RTS collection_label.knit.md --to latex --from markdown+autolink_bare_uris+tex_math_single_backslash --output /tmp/RtmpwdcglO/reprex-bce6751bb030-cushy-kitty/labeleR_output/Collection_label.tex --lua-filter /home/frs/R/x86_64-pc-linux-gnu-library/4.0/rmarkdown/rmarkdown/lua/pagebreak.lua --lua-filter /home/frs/R/x86_64-pc-linux-gnu-library/4.0/rmarkdown/rmarkdown/lua/latex-div.lua --embed-resources --standalone --highlight-style tango --pdf-engine pdflatex --variable graphics --include-in-header /tmp/Rtmp5mTV0r/rmarkdown-strc1115d5268d4.html
#> ! Misplaced alignment tab character &.
#> <argument> Bombus terrestris subsp. glumbumble &
#> Bombus terrestris subsp. g...
#> l.85 ...& Bombus terrestris subsp. glumbumble # }}
#> \\
#> Error: LaTeX failed to compile /tmp/RtmpwdcglO/reprex-bce6751bb030-cushy-kitty/labeleR_output/Collection_label.tex. See https://yihui.org/tinytex/r/#debugging for debugging tips. See Collection_label.log for more info.
Created on 2023-11-24 with reprex v2.0.2
Yes, exactly!
The problem appears when a column has an "&" or "#" (maybe some others I have not detected), which I think are special characters for LaTeX.
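To find out which values are affected before rendering, a check along these lines might help. This is only a sketch: `latex_specials` and `has_latex_special()` are hypothetical names, not part of labeleR, and the list is the usual set of ten characters that LaTeX treats specially in text.

```r
# The ten characters with special meaning in LaTeX text mode.
latex_specials <- c("#", "$", "%", "&", "_", "{", "}", "~", "^", "\\")

# Hypothetical helper: TRUE for each element of x that contains
# at least one of the characters above.
has_latex_special <- function(x) {
  found <- rep(FALSE, length(x))
  for (ch in latex_specials) {
    # fixed = TRUE matches the character literally, so no regex escaping is needed
    found <- found | grepl(ch, x, fixed = TRUE)
  }
  found
}

has_latex_special(c("Bombus terrestris", "Person1&Person2", "No. #12"))
```

Running this over each text column of the input table would show which cells need escaping, and whether characters other than "&" and "#" appear in the data.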
Yes, I think we should use gsub() for all of them which are rendered as text (not QRs)
Note: in the opening issue message I added the \ symbol, but somehow it disappeared!
All right. Then I'd add a function in zzz.R to substitute (gsub) every & or # with \& or \#. Add any other symbol that might be used. And we probably need to run that function on every text column in the data frame passed to each function?
I was thinking about that, or maybe adding a gsub within check_column_or_create_empty_char?
maybe:
# "\\\\" in the replacement produces one literal backslash
out <- gsub("&", "\\\\&", out)
out <- gsub("#", "\\\\#", out)
in line 24 of zzz.R? Would that work?
Sounds good!! Go ahead
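For reference, here is a minimal sketch of that substitution as a standalone function. The name `escape_amp_hash()` is made up for illustration and is not the function that ended up in the package. The backslash has to be written four times in the R source: the string literal halves it to two, and the regex replacement halves it again to one.

```r
# Hypothetical helper: put a backslash before "&" and "#"
# so LaTeX prints them as ordinary characters.
escape_amp_hash <- function(x) {
  # "\\\\&" is the three characters \ \ & ; gsub() turns "\\" into one backslash
  x <- gsub("&", "\\\\&", x)
  x <- gsub("#", "\\\\#", x)
  x
}

escaped <- escape_amp_hash("Person1&Person2 #")
cat(escaped, "\n")  # cat() shows the raw characters: Person1\&Person2 \#
```

Note that this sketch is not idempotent: calling it twice on the same value adds a second backslash, so it should run exactly once per value before rendering.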
Thanks for the quick pull request! Could we have check_latex called within check_column_or_create_empty_char as originally planned? I think that would make for much cleaner code without so much repetition.
For example:
check_column_or_create_empty_char <- function(df = NULL, column = NULL, check.latex = TRUE) {
  if (!is.null(column)) {
    check_column_in_df(df, column)
    if (isTRUE(check.latex)) {
      out <- check_latex(df, column)
    }
    out <- column
  } else {
    out <- ""
  }
  out
}
Then you could skip check_latex for some column (e.g. QR) by setting the check.latex argument to FALSE
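The idea of escaping some columns and leaving others (such as the QR column) alone could look roughly like this. It is a sketch under assumptions: `escape_columns()` is a hypothetical function, the example table and its URL are invented, and only "&" and "#" are handled.

```r
# Hypothetical helper: escape "&" and "#" in the listed character columns of df.
# Columns that are not listed (for example a QR column) are returned unchanged.
escape_columns <- function(df, columns) {
  for (col in columns) {
    if (is.character(df[[col]])) {
      # "\\\\\\1" is a literal backslash followed by the matched character
      df[[col]] <- gsub("([&#])", "\\\\\\1", df[[col]])
    }
  }
  df
}

# Invented example data: one text column and one QR column.
labels <- data.frame(
  field1  = "Person1&Person2",
  QR_code = "https://example.org/item#1",
  stringsAsFactors = FALSE
)

out <- escape_columns(labels, columns = "field1")
out$field1   # escaped for LaTeX
out$QR_code  # left as is, so the QR code still encodes the original URL
```

Passing the columns explicitly plays the same role as the check.latex argument suggested above: the caller decides which columns are rendered as text and therefore need escaping.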
Also, is this new test testing something different from the one above? Sorry I can't catch the difference
Could we have check_latex called within check_column_or_create_empty_char as originally planned?
That was my first idea, but the problem was that the out object was storing the column name rather than the content itself, so I had to write the ugly repetitive code.
Also, is this new test testing something different from the one above? Sorry I can't catch the difference
The difference is that the 2nd one is using data2, created here:
data2 <- data
data2$Collector <- c("Person1&Person2")
That was my first idea, but the problem was that the out object was storing the column name rather than the content itself, so I had to write the ugly repetitive code.
Ah, I think you need to add drop = FALSE at the end of check_latex, i.e.
return(df[, column, drop = FALSE])
That way you always obtain a one-column dataframe, which is what we need.
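A quick illustration of what drop = FALSE changes, using an invented one-column table (the values are placeholders, not package data):

```r
# Invented example table with a single text column.
df <- data.frame(
  Collector = c("Person1&Person2", "Person3"),
  stringsAsFactors = FALSE
)

as_vector <- df[, "Collector"]                # default: simplified to a character vector
as_frame  <- df[, "Collector", drop = FALSE]  # stays a one-column data frame

is.data.frame(as_vector)  # FALSE
is.data.frame(as_frame)   # TRUE
names(as_frame)           # "Collector"
```

With the default, selecting a single column silently loses the data frame structure and the column name, which is why code that later indexes the result by name can break.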
I would make this simplification if possible, otherwise we're still fixing the problem but growing a codebase that is harder to maintain (google "technical debt")
For the tests, I mean the first two tests here seem to be testing the same thing? Or I can't see the difference.
In that case we should leave just one
Does this make sense?
2050023
I have also erased the simple test, so we are doing both checks in one test
Thanks @iramosgutierrez
Since we are always checking all or most columns with check_column_or_create_empty_char, why not integrate check_latex within that function? That way we don't have to add any extra lines.
For example, here:
Could we add check_latex within the check_column_or_create_empty_char function so we don't have to add more code (arguments, check_latex_columns, etc)?
Sorry I must be missing something...
P.S. Also a more suitable name than check_latex could be "escape_special_characters", which is what this function basically does, no?
Anyway, don't feel pressed to change this now. At least it works! But IMO technical debt is something to keep in mind always. Investing in coherent, streamlined changes avoids painful refactoring or maintenance in the future...
My concern here is that check_column_or_create_empty_char is not returning information from the data frame, just checking the name of the column or returning an empty character for NULL columns, and that is not this issue's problem, right? Now we are trying to change the input values whenever they can create conflicts.
Oh yes, sorry. I missed that. I'm too tired already. Let me have a look tomorrow with a fresh mind? Anyway, the main branch is fixed, so no rush 👍
closed with commit # 5f5f8e0