Security footgun with Rmd generation

Question

jcheng5 opened this issue 5 years ago

The {{ / }} knit_expand mechanism is a bit too low-level; since it works on a purely textual level, it's easy for values intended to be text to be interpreted as code instead.

e.g. an Rmd template that contains this snippet (not in a code block):

The variable we'll be focusing on in this report is {{col_name}}.

The app then calls buildRmdBundle(..., vars = list(col_name = input$col_name)).

A malicious client could easily send a col_name value of "\n```{r}\nunlink(\"whatever\", recursive=TRUE)\n```\n", which would expand into the Rmd template as

The variable we'll be focusing on in this report is 
```{r}
unlink("whatever", recursive=TRUE)
```
.
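
The expansion above can be reproduced without knitting anything. This is a minimal sketch, assuming knitr is installed. It calls knit_expand() directly on the template text (buildRmdBundle() is not involved) and only prints the expanded Rmd, so the injected chunk is never executed.

```r
library(knitr)

# Template text with a placeholder outside of any code chunk.
template <- "The variable we'll be focusing on in this report is {{col_name}}."

# Value a malicious client could send as input$col_name.
malicious <- "\n```{r}\nunlink(\"whatever\", recursive=TRUE)\n```\n"

# knit_expand() looks up `col_name` and pastes its value in as plain text.
# The result is only a string; the chunk would run when the Rmd is knitted.
expanded <- knit_expand(text = template, col_name = malicious)
cat(expanded)
```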

Some ideas:

- ~~We could add the values in vars to the knitr environment, then tell people to do `r col_name` instead of {{col_name}}.~~ (Won't work, see my next comment)
- ~~We could recommend people pass non-code values through params. That seems like a higher-overhead version of the previous option though.~~ (Won't work, see my next comment)
- We could error on suspicious vars or {{/}} substitutions. Yuck, security heuristics.
- We could stop using knit_expand altogether, and use a different chunk type to insert code dynamically. This would probably take a considerable amount of knitr hacking; I think I looked at this last time and couldn't find an obvious way to preprocess a chunk and then have knitr treat it as code.

Carson Sievert · Answer 1 · Mon Aug 19 2019 23:59:11 GMT+0800 (China Standard Time)

> We could add the values in vars to the knitr environment, then tell people to do `r col_name` instead of {{col_name}}.

It seems like this could get complicated quickly (from a user point of view). Considering that we currently recommend the params approach when generating Rmd reports from Shiny, at least my initial feeling is we should go that route and document it better in our vignettes.

Joe Cheng · Answer 2 · Tue Aug 20 2019 09:56:51 GMT+0800 (China Standard Time)

Shoot, neither `r col_name` nor params will work in this case, because the resulting .Rmd isn't reproducible on its own (unless we also provide users with a .R script that invokes rmarkdown::render). That is, the report PDF/HTML generated using buildRmdBundle(render = TRUE) will be correct, but users won't get the same results by just knitting the .Rmd, which is the whole point.

🤔

Joe Cheng · Answer 3 · Tue Aug 20 2019 10:09:51 GMT+0800 (China Standard Time)

We could use our own mustache variant that forces you to indicate whether the thing you're rendering is "text" or "code". If text, we coerce the result to character and then escape any character that has special meaning (for Rmd that'd be backtick -> &#96;, I suppose?). If code, the value would be deparsed.
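
The text/code split could look roughly like the sketch below. Both helper names are hypothetical (neither exists in shinymeta or knitr), and replacing the backtick with an HTML entity is an assumption about what a suitable escape would be.

```r
# Hypothetical helpers; neither exists in shinymeta or knitr.

# "text" mode: coerce to character and neutralise backticks, so the value can
# open neither an inline `r ...` expression nor a fenced chunk.
# Assumption: an HTML entity is an acceptable replacement for the backtick.
as_text <- function(x) {
  gsub("`", "&#96;", as.character(x), fixed = TRUE)
}

# "code" mode: deparse the value so it is written out as an R literal.
as_code <- function(x) {
  paste(deparse(x), collapse = "\n")
}

cat(as_text("\n```{r}\nunlink('whatever')\n```\n"))
cat(as_code("mpg"), "\n")
```

With these, a string destined for prose can no longer introduce code, and a string destined for a chunk arrives quoted rather than as a bare symbol.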

(Hmmm, we might already have a bug here, if you put a {{placeholder}} in a code block and the value is a string I think it'll go into the Rmd verbatim rather than being deparsed.)
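
A quick way to check the suspected behaviour is to expand a placeholder that sits inside a chunk. This sketch assumes knitr is installed and uses knit_expand() directly rather than buildRmdBundle(); the template and the `df` name are made up for illustration.

```r
library(knitr)

# Placeholder inside a code chunk; `df` is just an illustrative name.
template <- "```{r}\nsummary(df[[{{col_name}}]])\n```"

# Passing the string as-is inserts it verbatim, yielding a bare symbol.
verbatim <- knit_expand(text = template, col_name = "mpg")

# Deparsing first inserts a quoted string literal instead.
deparsed <- knit_expand(text = template, col_name = deparse("mpg"))

cat(verbatim, deparsed, sep = "\n\n")
```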

Joe Cheng · Answer 4 · Sat Jan 25 2020 02:50:55 GMT+0800 (China Standard Time)

If we're willing to live with heuristics, we could do something like:

Throws on \n```{ and `r :

buildRmdBundle(..., vars = list(col_name = input$col_name))

Doesn't throw:

buildRmdBundle(..., vars = list(col_name = input$col_name), allow_unsafe_values = TRUE)

Throws only for col_name:

buildRmdBundle(..., vars = list(col_name = input$col_name, code_stuff = allow_unsafe(input$foo)), allow_unsafe_values = TRUE)
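
One way such a heuristic might be wired up is sketched below. check_vars() and looks_like_code() are hypothetical names, and allow_unsafe() is implemented here only for illustration. The thread does not spell out how the flag and the wrapper interact, so this sketch assumes the flag is a global override and the wrapper a per-value one.

```r
# Hypothetical sketch; these are not shinymeta functions.

# Mark a single value as trusted by tagging it with an extra class.
allow_unsafe <- function(x) structure(x, class = c("allow_unsafe", class(x)))

# TRUE if a character value contains a chunk opener or an inline-code opener.
looks_like_code <- function(x) {
  is.character(x) &&
    any(grepl("\n```{", x, fixed = TRUE) | grepl("`r ", x, fixed = TRUE))
}

# Stop on any suspicious value that was not explicitly marked as trusted.
check_vars <- function(vars, allow_unsafe_values = FALSE) {
  if (allow_unsafe_values) return(invisible(vars))
  bad <- vapply(vars, function(v) {
    !inherits(v, "allow_unsafe") && looks_like_code(v)
  }, logical(1))
  if (any(bad)) {
    stop("Suspicious value(s) in vars: ",
         paste(names(vars)[bad], collapse = ", "))
  }
  invisible(vars)
}

evil <- "\n```{r}\nunlink('whatever')\n```\n"
try(check_vars(list(col_name = evil)))                         # signals an error
check_vars(list(col_name = evil), allow_unsafe_values = TRUE)  # passes
check_vars(list(code_stuff = allow_unsafe(evil)))              # passes
```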

Joe Cheng · Answer 5 · Sat Jun 06 2020 04:10:02 GMT+0800 (China Standard Time)

Now also exploring an approach where we parse the .Rmd before and after knit_expand, and if the number of chunks has changed, we fail by default.

Update 2021-01-28: https://github.com/rundel/parsermd exists. I don't know if it shows us inline code chunks though, which we would need.
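
The before/after comparison could be prototyped with plain regular expressions rather than a real parser. In this sketch, count_code() and safe_expand() are hypothetical names, the patterns are simplified stand-ins for knitr's own (an assumption), and parsermd is not used.

```r
library(knitr)

# Count chunk headers and inline-code openers in a single Rmd string.
# The regexes are simplified assumptions, looser than knitr's patterns.
count_code <- function(rmd) {
  lines <- unlist(strsplit(rmd, "\n", fixed = TRUE))
  inline <- gregexpr("`r ", rmd, fixed = TRUE)[[1]]
  c(chunks = sum(grepl("^[\t >]*```+\\s*\\{", lines)),
    inline = sum(inline > 0))
}

# Hypothetical wrapper: expand, then fail if the amount of code changed.
safe_expand <- function(template, ...) {
  expanded <- knit_expand(text = template, ...)
  if (!identical(count_code(template), count_code(expanded))) {
    stop("Expansion changed the number of code chunks or inline expressions")
  }
  expanded
}

template <- "Report on {{col_name}}.\n\n```{r}\nsummary(cars)\n```"
evil <- "\n```{r}\nunlink('whatever')\n```\n"

ok <- safe_expand(template, col_name = "mpg")
blocked <- try(safe_expand(template, col_name = evil), silent = TRUE)
```

A count-based check leaves placeholders inside existing chunks untouched, since substituting code there does not add a chunk.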

Joe Cheng · Answer 6 · Fri Jan 29 2021 03:57:42 GMT+0800 (China Standard Time)

Advice from Yihui, circa June 2020:

knit_expand() evaluates the expression {{code}} by knitr::knit_hooks$get('evaluate.inline'). If you have security concerns, it is probably a better idea to define the evaluate.inline hook and examine the code before evaluating it, e.g. (not tested):
library(knitr)
eval_inline = knit_hooks$get('evaluate.inline')  # original hook
knit_hooks$set(evaluate.inline = function(code, envir) {
  code = xfun::split_lines(code)
  if (any(grepl(all_patterns$md$chunk.begin, code))) stop(
    'You are not allowed to include a code chunk in the variable'
  )
  eval_inline(code, envir)
})
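
One caveat worth testing: the `code` argument passed to the hook appears to be the text between the delimiters (for example `col_name`), not the value it evaluates to. If that holds, the check would need to look at the evaluated result instead. The variant below is a sketch under that assumption, with knitr installed and the original hook restored at the end.

```r
library(knitr)

eval_inline <- knit_hooks$get("evaluate.inline")  # original hook

# Evaluate first, then inspect the resulting value line by line.
knit_hooks$set(evaluate.inline = function(code, envir) {
  res <- eval_inline(code, envir)
  lines <- unlist(strsplit(as.character(res), "\n", fixed = TRUE))
  if (any(grepl(all_patterns$md$chunk.begin, lines))) {
    stop("You are not allowed to include a code chunk in the variable")
  }
  res
})

evil <- "\n```{r}\nunlink('whatever')\n```\n"

ok <- knit_expand(text = "Column: {{col_name}}.", col_name = "mpg")
blocked <- tryCatch(
  knit_expand(text = "Column: {{col_name}}.", col_name = evil),
  error = function(e) conditionMessage(e)
)

knit_hooks$set(evaluate.inline = eval_inline)  # put the original hook back
```

Note that this only guards chunk headers; inline `r ...` expressions in a value would need a separate pattern.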

Carson Sievert · Answer 7 · Fri Feb 05 2021 04:04:34 GMT+0800 (China Standard Time)

We'll also need some way to hook into multi-line code chunks (not just inline). More specifically, note how an error isn't raised here:

---
title: "Untitled"
output: html_document
---

```{r setup, include=FALSE}
library(knitr)
eval_inline = knit_hooks$get('evaluate.inline')  # original hook
knit_hooks$set(evaluate.inline = function(code, envir) {
  code = xfun::split_lines(code)
  if (any(grepl(all_patterns$md$chunk.begin, code))) stop(
    'You are not allowed to include a code chunk in the variable'
  )
  eval_inline(code, envir)
})
```

## R Markdown

```{r cars}
summary(cars)
```

Carson Sievert · Answer 8 · Fri Mar 19 2021 03:03:46 GMT+0800 (China Standard Time)

Fixed by #92