swcarpentry / r-novice-gapminder

R for Reproducible Scientific Analysis

Home Page: http://swcarpentry.github.io/r-novice-gapminder/

11-writing-data: bash inclusion

mjcasy opened this issue · comments

Re: https://github.com/swcarpentry/r-novice-gapminder/blob/main/_episodes_rmd/11-writing-data.Rmd

Possible issue on line 106, "Let's switch back to the shell to take a look at the data to make sure it looks". I don't see where the shell has been used earlier in this lesson, so there is nothing to switch back to. Also, should we stick to R only in this lesson?

Thank you for opening this issue @mjcasy, this is a very good point!

I searched for "shell" in the R Markdown files of all episodes for this lesson and found several occurrences (see below). We refer to "the shell lesson" several times in this R lesson, so this may need to be addressed across the entire lesson.

  • In 01-rstudio-intro.Rmd: a parallel is made between the R interpreter and the shell environment. The lesson actually mentions "the shell environment you learned about during the shell lessons", so this might be something from a time when this lesson was envisioned as being part of a larger, integrated Software Carpentry curriculum including the shell lesson. There is also a parallel between R's ls() and the shell's ls in a Tip box later in the episode.

  • In 02-project-intro.Rmd: challenge 4 in this episode uses the shell to examine a csv file. The head command is used to see the beginning of the csv file in the terminal.

  • In 07-control-flow.Rmd: again, a parallel is made between R and the shell (for loops and the grep command).

  • In 11-writing-data.Rmd: the shell's head command is used to check what the data in a file looks like (this is the usage you've spotted).

  • In 13-dplyr.Rmd: again, a parallel between R and the shell is made, this time about the pipe.
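
For anyone who wants to repeat this kind of search from within R, here is a minimal sketch. It is not the method used for the list above; `find_mentions` is a hypothetical helper name, and the demo directory and files are made up so the example is self-contained. Pointing it at a local copy of `_episodes_rmd` is the intended use.

```r
# Hypothetical helper: count the lines that mention a pattern in each .Rmd file
# of a directory, and keep only the files with at least one match.
find_mentions <- function(dir, pattern = "shell") {
  files <- list.files(dir, pattern = "\\.Rmd$", full.names = TRUE)
  counts <- vapply(
    files,
    function(f) sum(grepl(pattern, readLines(f, warn = FALSE), ignore.case = TRUE)),
    integer(1)
  )
  names(counts) <- basename(files)
  counts[counts > 0]
}

# Self-contained demo with two made-up episode files in a temporary directory.
demo_dir <- tempfile("episodes_demo")
dir.create(demo_dir)
writeLines(c("Let's switch back to the shell.", "Plain R text."),
           file.path(demo_dir, "a.Rmd"))
writeLines("Nothing to see here.", file.path(demo_dir, "b.Rmd"))

mentions <- find_mentions(demo_dir)
print(mentions)
```

The match is a plain case-insensitive substring search, so it would also pick up words such as "shells"; each hit still needs to be read in context.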

So overall, it looks like:

  • Several episodes were written with the assumption that people followed the shell lesson before this R lesson. I agree with you that it would be better to stick only with R as much as possible in this lesson and to assume that it is self-contained.

  • Most references to the shell were used to draw parallels between R and the shell. While this can be useful if people have been exposed to the shell before, it can make the lesson flow rougher for people who are not familiar with it.

  • I think that explaining how to use the shell's head command to check a csv file is good advice, but it would be nice to have something more general and independent of the shell itself.

Here are some suggestions about what could be done:

  • Remove all the wording about "the shell lesson" which implies that people have followed the shell lesson before.

  • Remove the parallels made between R and the shell from the main text, and instead put them into concise Tip boxes like "If you know about the shell...", or mention in the instructor notes that there are several places where the shell can be brought up if the students already know about it.

  • Replace checking csv files with the shell's head command by something more general, e.g. using a text editor of the student's choice (and maybe add a Tip box "if you know the shell, you can use head, which will work even for huge csv files").
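
One more option, sketched here as an assumption rather than as agreed lesson content, is to do the check from R itself. `readLines()` with `n` reads only the first few lines, so like head it does not need to load a large file in full. The data frame and file path below are made-up stand-ins for whatever the episode writes out.

```r
# Write a small data frame to a temporary csv file
# (a stand-in for the file the lesson writes).
demo <- data.frame(country = c("Australia", "Canada"), year = c(2007L, 2007L))
csv_path <- tempfile(fileext = ".csv")
write.csv(demo, csv_path, row.names = FALSE)

# Read only the first three raw lines of the file and print them as-is.
first_lines <- readLines(csv_path, n = 3)
writeLines(first_lines)

# Alternatively, display the whole file in R's pager or viewer:
# file.show(csv_path)
```

This shows the raw text, including the quoting and separators that `write.csv()` produced, which is what the shell check was meant to reveal.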

Any thoughts on this?

I like the suggestions!

  • I agree that all wording about "the shell lesson" should be removed; I hadn't thought to search for other instances! Thinking back to when I took this course as a student a few years ago, it was done in conjunction with shell lessons, so maybe that was the suggested norm?

  • I think adding the shell instances as tips may give students the feeling that there is something else they need to know, which could be a bit stressful for beginners to programming. It is probably best to put them in the instructor notes?

  • Though I think it would be good to include head alongside other suggestions for checking csv files, such as a text editor or spreadsheet software.

Does that seem like a good way forward?

I'm currently teaching an R-only workshop, and I've noticed how confusing all the references to the shell are for participants! My suggestions are:

  1. remove all the references to the shell and change the corresponding code to R code,
    OR
  2. introduce the terminal in RStudio and explain a couple of useful shell commands that can be executed there.

Thanks for your feedback and suggestions @mjcasy and @LucaDiStasio! It is always very helpful to hear feedback from recent workshops so that we can keep improving the material and the learning experience.

It looks like we agree that a good way forward is to remove all references to the shell from the lesson itself, to mention the shell in the instructor notes instead, and to describe checking csv files with a text editor or spreadsheet software rather than with head.

Would anyone like to submit a PR to address this?

I think the content reflects that, in the original form of SWC workshops, there was a requirement to (at the very least) cover bash, git, and a programming language (R or Python); in the even older SWC days, SQL was also a requirement. If these requirements were not met, then the training could not be called a SWC workshop (people were of course always free to use the materials however they liked; they just couldn't call it a SWC workshop/training).

I don't know what the current rules are for this, but I think it's tricky to decide what to do here. On one hand, if a learner has not been exposed to the shell at all, then the references are confusing and don't work. On the other hand, if this is taught as part of a workshop where the shell has already been introduced (as these lessons were originally designed to be), then the callback to the earlier material can help to solidify the content for learners and show them the parallels between what might otherwise seem like very disparate ways of interacting with the computer. So, I am not sure what the right thing to do is here.