Background for rep_sample_n() function
rudeboybert opened this issue
Note: Put this back in if people have trouble understanding rep_sample_n() at first:
Let's show an example of this function in action. Let's first use the tibble() function to manually create a data frame of five fruit called fruit_basket:
fruit_basket <- tibble(
  fruit = c("Mango", "Tangerine", "Apricot", "Pamplemousse", "Lime")
)
We'll then use %>% to pipe the fruit_basket data frame into the rep_sample_n() function and set size = 3, indicating that we want to sample three fruit:
fruit_basket %>%
  rep_sample_n(size = 3)
Your results will likely be different, since we are taking a random sample of size 3. Now let's see what happens when we try to sample six fruit:
fruit_basket %>%
  rep_sample_n(size = 6)
Error in sample.int(n, size, replace = replace, prob = prob) :
cannot take a sample larger than the population when 'replace = FALSE'
We get an error message telling us that we cannot take a sample that has more rows than the original data frame. This is because rep_sample_n() by default samples without replacement\index{sampling without replacement}. Once it samples a fruit from the basket, it does not put it back in.
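For contrast, here is a minimal sketch of the same call with replace = TRUE, which asks rep_sample_n() to put each fruit back in the basket after drawing it. It assumes the dplyr and moderndive packages are installed and loaded; the name with_replacement is only an illustrative choice.

```r
library(dplyr)       # provides tibble() and the %>% pipe
library(moderndive)  # provides rep_sample_n()

fruit_basket <- tibble(
  fruit = c("Mango", "Tangerine", "Apricot", "Pamplemousse", "Lime")
)

# Sampling with replacement: each fruit goes back in the basket after it is
# drawn, so asking for six fruit from a basket of five no longer errors.
with_replacement <- fruit_basket %>%
  rep_sample_n(size = 6, replace = TRUE)

with_replacement
```

Since there are only five distinct fruit, a sample of six drawn this way must contain at least one fruit more than once. As before, the particular fruit you get will likely differ from run to run.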