moderndive / ModernDive_book

Statistical Inference via Data Science: A ModernDive into R and the Tidyverse

Home Page: https://www.moderndive.com/

Background for rep_sample_n() function

rudeboybert opened this issue · comments

Note: Put this back in if people have trouble understanding rep_sample_n() at first:

Let's show an example of this function in action. Let's first use the tibble() function to manually create a data frame of five fruit called fruit_basket.

fruit_basket <- tibble(
  fruit = c("Mango", "Tangerine", "Apricot", "Pamplemousse", "Lime")
)

We'll then use the %>% pipe operator to pass the fruit_basket data frame into the rep_sample_n() function and set size = 3, indicating that we want to sample three fruit:

fruit_basket %>% 
  rep_sample_n(size = 3)

Your results will likely be different, since we are taking a random sample of size 3. Now let's see what happens when we try to sample six fruit:

fruit_basket %>% 
  rep_sample_n(size = 6)
Error in sample.int(n, size, replace = replace, prob = prob) : 
  cannot take a sample larger than the population when 'replace = FALSE'

We get an error message telling us that we cannot take a sample that has more rows than the original data frame. This is because rep_sample_n() by default samples without replacement\index{sampling without replacement}. Once it samples a fruit from the basket, it does not put it back in.
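
For contrast, here is a minimal sketch of the same six-fruit draw with replacement. It assumes rep_sample_n() is loaded from the infer package and that its replace argument is set to TRUE. The seed value and the six_fruit name are arbitrary choices for illustration, not part of the original note.

```r
library(dplyr)  # provides tibble() and the %>% pipe
library(infer)  # assumed source of rep_sample_n()

fruit_basket <- tibble(
  fruit = c("Mango", "Tangerine", "Apricot", "Pamplemousse", "Lime")
)

set.seed(76)  # arbitrary seed, only so the draw can be reproduced

# With replace = TRUE, each sampled fruit goes back in the basket,
# so a sample larger than the basket itself no longer raises an error.
six_fruit <- fruit_basket %>%
  rep_sample_n(size = 6, replace = TRUE)

six_fruit
```

Because six fruit are drawn from a basket of only five, at least one fruit must appear more than once in the result.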