Suggestion: change "Data Frame Manipulation with dplyr" example to not rely on automatic simplification of a base R data frame to an atomic vector
brshallo opened this issue
The opening example of Data Frame Manipulation with dplyr relies on base R's data frame behavior of simplifying a one-column selection to an atomic vector:
gapminder <- read.csv("https://raw.githubusercontent.com/swcarpentry/r-novice-gapminder/gh-
mean(gapminder[gapminder$continent == "Africa", "gdpPercap"])
This is one of the implicit behaviors of base R data frames, and it does not happen for tibbles (the code below returns a warning and NA):
library(dplyr)
gapminder <- read.csv("https://raw.githubusercontent.com/swcarpentry/r-novice-gapminder/gh-pages/_episodes_rmd/data/gapminder_data.csv")
gapminder_tidy <- as_tibble(gapminder)
# On a tibble, [rows, "gdpPercap"] stays a one-column tibble, so mean() warns and returns NA
mean(gapminder_tidy[gapminder_tidy$continent == "Africa", "gdpPercap"])
I think it's better to make a small change to this example so that it doesn't rely on this implicit behavior of base R data frames, particularly since that behavior doesn't carry over to tibbles, which this episode is just introducing. Alternative code for the example could be:
mean(gapminder_tidy$gdpPercap[gapminder_tidy$continent == "Africa"])
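The difference can be reproduced without downloading the data. This is a minimal sketch using a hypothetical three-row `df` with made-up values standing in for gapminder; it uses base R's `drop = FALSE` to mimic what a tibble does with `[rows, "col"]`, so it needs no packages:

```r
# Hypothetical toy stand-in for the gapminder data (values are made up)
df <- data.frame(
  continent = c("Africa", "Asia", "Africa"),
  gdpPercap = c(1000, 5000, 3000)
)
rows <- df$continent == "Africa"

# Base R data frame: [rows, "col"] drops the one-column result to a numeric vector
africa_drop <- df[rows, "gdpPercap"]
mean(africa_drop)    # 2000

# drop = FALSE keeps a one-column data frame, like a tibble always does
africa_frame <- df[rows, "gdpPercap", drop = FALSE]
mean(africa_frame)   # warns "argument is not numeric or logical" and returns NA

# Extracting the column first yields a plain vector whatever the drop behavior
africa_vec <- df$gdpPercap[rows]
mean(africa_vec)     # 2000
```

Pulling the column out with `$` before indexing by rows sidesteps the question of whether `[` simplifies at all.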
P.S. I ran into this while working through the example: I had converted the gapminder data to a tibble so that it would print more nicely, and the example then stopped working.
This seems like a reasonable change to me.
We'd be happy to review a PR where you switched
mean(gapminder[gapminder$continent == "Africa", "gdpPercap"])
to
mean(gapminder$gdpPercap[gapminder$continent == "Africa"])
the latter of which, if I understand correctly, should work with both base R data frames and tibbles.
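A quick way to check that claim, sketched with the same hypothetical toy values and assuming the tibble package is installed (dplyr depends on it):

```r
library(tibble)

# Hypothetical toy data, once as a base R data frame and once as a tibble
df  <- data.frame(
  continent = c("Africa", "Asia", "Africa"),
  gdpPercap = c(1000, 5000, 3000)
)
tbl <- as_tibble(df)

# Column-first extraction gives the same numeric result for both classes
mean_df  <- mean(df$gdpPercap[df$continent == "Africa"])
mean_tbl <- mean(tbl$gdpPercap[tbl$continent == "Africa"])

# By contrast, row-and-column subsetting of the tibble stays a tibble
one_col <- tbl[tbl$continent == "Africa", "gdpPercap"]
class(one_col)   # "tbl_df" "tbl" "data.frame"
```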
Thank you for catching this @brshallo and for your PR (now merged).