swcarpentry / r-novice-gapminder

R for Reproducible Scientific Analysis

Home page: http://swcarpentry.github.io/r-novice-gapminder/

Suggestion: change "Data Frame Manipulation with dplyr" example to not rely on auto-conversion of base R dataframe to atomic vector

brshallo opened this issue

The opening example of "Data Frame Manipulation with dplyr" relies on a base R data frame behavior: selecting a single column with `[rows, "column"]` simplifies the result to an atomic vector.

gapminder <- read.csv("https://raw.githubusercontent.com/swcarpentry/r-novice-gapminder/gh-

mean(gapminder[gapminder$continent == "Africa", "gdpPercap"])

This is one of the implicit behaviors of base R data frames, and it does not happen for tibbles. With a tibble, the code below returns a warning and NA:

library(dplyr)

gapminder <- read.csv("https://raw.githubusercontent.com/swcarpentry/r-novice-gapminder/gh-pages/_episodes_rmd/data/gapminder_data.csv")
gapminder_tidy <- as_tibble(gapminder)

mean(gapminder_tidy[gapminder_tidy$continent == "Africa", "gdpPercap"])

I think it would be better to make a small change to this example so that it doesn't rely on this implicit behavior of base R data frames, particularly since the behavior doesn't carry over to tibbles, which the episode is just introducing. Alternative code for the example could be:

mean(gapminder_tidy$gdpPercap[gapminder_tidy$continent == "Africa"])
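To see the difference without downloading the dataset, here is a minimal sketch using a small made-up data frame; the name `toy` and its values are hypothetical stand-ins for the gapminder data, not the real dataset. In base R, passing `drop = FALSE` makes `[` keep a one-column data frame, which mirrors what a tibble returns by default. Extracting the column with `$` first and then subsetting the vector gives a plain numeric vector in either case.

```r
# Hypothetical stand-in for the gapminder data (made-up values)
toy <- data.frame(
  continent = c("Africa", "Asia", "Africa"),
  gdpPercap = c(1000, 2000, 3000)
)
is_africa <- toy$continent == "Africa"

# Base R data frame: [rows, "col"] simplifies a single column to a vector
africa_simplified <- toy[is_africa, "gdpPercap"]

# drop = FALSE keeps a one-column data frame, as a tibble would by default
africa_kept <- toy[is_africa, "gdpPercap", drop = FALSE]

is.vector(africa_simplified)  # TRUE
is.data.frame(africa_kept)    # TRUE

mean(africa_simplified)       # 2000
# mean(africa_kept) would warn "argument is not numeric or logical" and return NA

# Pulling out the column first, then subsetting the vector, works either way
mean(toy$gdpPercap[is_africa])  # 2000
```

The last form is the one proposed above: because `$` always returns the column itself, the row subsetting happens on a vector and no simplification rule is involved.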

P.S. I ran into this while working through the example after converting the gapminder data to a tibble so that it would print more nicely; the example then stopped working.

This seems like a reasonable change to me.

We'd be happy to review a PR where you switched
mean(gapminder[gapminder$continent == "Africa", "gdpPercap"]) to
mean(gapminder$gdpPercap[gapminder$continent == "Africa"])

the latter of which, if I understand correctly, should work with both base R data frames and tibbles.

Thank you for catching this @brshallo and for your PR (now merged).