swcarpentry / python-novice-gapminder

Plotting and Programming in Python

Home Page: http://swcarpentry.github.io/python-novice-gapminder/

Reorganisation of lesson

pelahi opened this issue · comments

The current lesson plan seems out of order to me. Based on my experiences teaching programming and python, the first topics should cover more basic concepts and end with using functions and libraries to plot results. I have opened a pull request #547 showing the reorganisation of the lesson.

That is a fair point. While I tend to agree that presenting basic syntax and then introducing external libraries and functions seems like a more 'natural' order, I believe that the advantage of the current order of episodes is to quickly expose learners to what can be done with Python (reading tabular data, summarising the data, and plotting). In my experience, this makes learners more interested in the lesson and improves their experience in the later episodes (lists, conditionals, functions). Curious to know what other people think.

Perhaps I am being a bit harsh, but if in practice there is not enough time to cover what would qualify as "basics", then that material should be left out, and the lesson should make clear at the outset that users need some understanding of basic Python. I realise there is a lot of material in most Carpentries lessons and that the schedules are aspirational rather than something to stick to. However, if the more basic concepts sit at the end and in practice are not covered, they should be removed and the lesson restructured so that this knowledge is a prerequisite.

Even with the philosophy of "introduce useful stuff early", for loops, conditionals and lists are VERY useful stuff if you have never been exposed to them. So the organisation does not follow that philosophy anyway. Again, unless you assume most users have some knowledge of programming structures and python, in which case all basic material should be removed.

Again, I do not want to sound too harsh, but reducing the number of episodes, moving from a large number of episodes to a smaller number of more specific ones, would be better. Or it needs to be clear to students that the amount of material is aspirational. I would side with removing unnecessary material. Perhaps the lesson should be reworked to be just Analysis and Plotting with Python.

Another possibility is to have non-hands-on intro material that highlights the goals of the lesson, then move on to the more conventional lesson plan but reduce the amount of material in these introductory episodes, perhaps grouping them together. More of a refresher.

What do you think @vahtras , @vinisalazar ?

Thanks for raising this issue! I think it is an important one to consider and reconsider as time goes on. There are fair points to be made on both sides, and one way isn't always the best way. My background is in traditional computer science and so I learned things similarly to how you've proposed (foundations first, e.g., programming language semantics / structure and interpretation of computer programs) but don't think it's the only way to learn things.

Instead of the maintainers I think we need to hear more voices from the community: instructors who have taught the material and learners who have been in a workshop with the material. Garnering their majority support would present a stronger case than a single "in my experience..." story.

Really interesting discussion. @pelahi I definitely see where you are coming from and I largely agree with you, especially that there are just too many lessons. Many topics that are covered in Programming with Python are covered in 5-10 minutes less here.

I do feel that the current organization has a couple of advantages that are being overlooked. My impression is that this Carpentries lesson is targeted at people who 1) have no experience programming and 2) are probably currently using Excel. For those students, jumping right into the foundations of programming will get less buy-in than the current structure.

Most non-programmers doing data analysis use spreadsheets to manipulate and plot data. By introducing pandas and plotting early on, it becomes immediately clear to students how programmatic methods can replace their current way of doing things. It puts the lesson material on a firm foundation and references things that the students already know. Then when the lesson material introduces loops and functions, students will be able to see how they are immediately useful. I worry that introducing loops and functions before introducing tabular data structures would leave out some important context that helps achieve student buy-in. While functions and loops are absolutely 'useful stuff', they aren't useful if you don't have a way to access your data.

I think that what you are suggesting is a more efficient way to learn programming, but given the audience and given the time frame, I lean towards the current organization.

Recently a colleague taught this lesson by showing all the basic stuff (variables, lists, for loops, conditionals, functions) first, without any reference to data, leaving pandas and all data-related stuff to the afternoon/second day.

I observed the lesson and must say I disagreed with the choice. It was clear to me that the students found everything quite abstract and understood the concepts but could not relate to them, which then impairs their ability to connect them to real use-cases and to each other. So I strongly disagree with this kind of reorganisation.

Interesting @martinosorb, is that the feedback you got from the students? I will say that I was never arguing for a complete focus on basic stuff to the exclusion of any grounding. I was arguing for an overview of basic concepts that are then immediately grounded by an example, with the material clearly indicating that some of the programming structures (for loops, conditionals) are covered later.

For example:

# In Python we can store data and refer to it later using variables.
# Here we use a Python structure called a list to hold a set of integers.
x = [1, 2, 3, 4]
# Next we want to process this data, say by calculating the average.
# We can do this with a for loop (to be covered later).
total = 0  # named total so it does not shadow the built-in sum()
for val in x:
    total += val  # add each value to the running total
ave = total / len(x)  # the average of our set of integers

Here the goal is to unpack the basic programming structures, showing how some very rudimentary data processing can be done. Then take portions and show how libraries can be imported to provide the same functionality as all this code.

# Let's import the numerical Python library, which provides a lot of functions we can use in our analysis.
import numpy as np
x = np.array([1, 2, 3, 4])  # make an array of our set of integers, which allows for easy manipulation
ave = np.mean(x)  # and get NumPy to calculate an average for us (which it does by following the same logic as we had before)
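
The idea that a library call can stand in for a hand-written loop can be checked with a minimal sketch. It uses `statistics.mean` from the standard library instead of NumPy so that it runs without extra installs; that substitution, and the variable names, are assumptions of this sketch and not part of the original comment.

```python
import statistics

values = [1, 2, 3, 4]

# Hand-written version: accumulate a running total with a for loop.
total = 0
for val in values:
    total += val
manual_average = total / len(values)

# Library version: statistics.mean performs the same calculation for us.
library_average = statistics.mean(values)

# Both approaches should agree on the result.
print(manual_average == library_average)
```

Showing the two versions side by side is one way to ground the loop in something concrete before introducing the shortcut.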

What was the feedback from the students?

Dear @pelahi I do not have feedback from the students, as they do not know what the alternative would have been -- but that was my feeling.

All - I suggest we either post this discussion to the community and ask for further opinions, or close it.

I saw a fantastic talk by Mine Çetinkaya-Rundel at CarpentryCon in Manchester about backwards design and "showing the cake" first. Mine's talk does a far better job than I would of describing why I feel that "seeing the cake first" (or being able to contextualise learning at the start) is far more relatable than seeing the ingredients first with no idea of how they're supposed to be used or treated.

We originally taught in computer science order ("these are your data types", "these are loops", etc.) from the bottom up, but both experience and learning theory led us to the "cake first" approach that @froggleston describes. We have to convince learners by the first coffee break that what we're saying is going to help them solve the problems that they think they have; a bottom-up CS-style approach doesn't do that. (And even as a long-time Pythonista, I will admit that the tidyverse is better for this: you can load/filter/group/summarize/plot in your first hour with tidyverse, but need to know a lot more about the quirks of computer language syntax to do the same with pandas.)

Having taught both SC Python lessons a dozen times or so, I strongly prefer the current structure, specifically because it's not taught following the computer science/programming model (at least in order) but it does cover those things. Moving loops and conditionals early in the lesson can serve to alienate and intimidate people with limited computing background and increase cognitive load early on. By focusing on objects and working from simple to complex objects then moving to more ways to interact with them (and by focusing mostly on a limited collection of types), people can build confidence gradually. Also, for people who want to use Python primarily for data analysis in Pandas, loops/conditionals/functions may not ever be necessary at all, as there are frequently other ways to accomplish those goals natively in Pandas.
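
As a rough illustration of that last point, here is a minimal sketch of filtering and summarising a table in pandas without writing a loop, a conditional, or a function. The table, its column names, and its values are hypothetical and made up for this example; they are not taken from the lesson's gapminder files.

```python
import pandas as pd

# Hypothetical miniature table; the values are invented for illustration only.
data = pd.DataFrame(
    {
        "continent": ["Europe", "Europe", "Asia", "Asia"],
        "country": ["A", "B", "C", "D"],
        "gdp_per_capita": [30000.0, 20000.0, 10000.0, 4000.0],
    }
)

# Filter rows with a boolean mask instead of a for loop plus an if statement.
wealthy = data[data["gdp_per_capita"] > 5000.0]

# Summarise per group without writing a loop or a helper function.
mean_by_continent = data.groupby("continent")["gdp_per_capita"].mean()

print(wealthy)
print(mean_by_continent)
```

The boolean mask and `groupby` cover what would otherwise need a loop with a conditional inside it, which is the kind of native alternative described above.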

Mostly though, it really does come down to reducing the cognitive load and the feelings of inadequacy early on, as well as (like others said) focusing on things that produce outcomes people recognize as like what they're looking for quickly (easy wins) rather than having to wait hours to see why we're doing something.

I have taught both Carpentries workshops and university courses (for biologists) both ways over the last 15 years. In both cases I started by teaching fundamentals first, but like others have shifted to the "cake first" approach based on both experience and reading up on research about the most effective teaching methods. I now take an exclusively "cake first" approach and have found it much more effective for most folks who are domain-specific researchers (i.e., not computer scientists), at least in the biology and environmental sciences.

The motivation for cake first is well laid out here (and the figure is excellent) https://carpentries.github.io/instructor-training/08-motivation.html#how-can-content-influence-motivation

Thank you all for your feedback! I think this is enough consensus to close the issue for now. If the Carpentries enables Discussions on these repos we could convert to a discussion if others still want to chime in later...

Thanks for raising the issue @pelahi - these conversations help us all figure out and establish community values as we meander towards good practices together