hpc-carpentry / hpc-python

HPC Python lesson materials

Home Page: https://hpc-carpentry.github.io/hpc-python/

Fork of this lesson in incubator

ocaisa opened this issue

There is a fork of this lesson in the Carpentries Incubator which focusses purely on Snakemake (and that is indeed the majority of the content in this lesson). Does it make sense to maintain this lesson if others are willing to maintain that independently? Should we put some effort into making the lesson in the incubator more transferable instead?

I also think the title of our lesson is a little misleading anyway. What it primarily covers is (reproducible and scalable) workflows. The HPC Python lesson from JSC, which is used in PRACE, covers a much greater range of topics:

  • Interactive parallel programming with IPython
  • Profiling and optimization
  • High-performance NumPy
  • Just-in-time compilation with Numba
  • Distributed-memory parallel programming with Python and MPI
  • Bindings to other programming languages and HPC libraries
  • Interfaces to GPUs

Just now getting to this (I've been tied up on other items for weeks). I guess it depends on where any HPC-Python lesson fits into other lessons offered.

Snakemake isn't necessarily an HPC tool, so I'm good for it to be a separate set of episodes for whoever needs it.

Similarly, these lessons start from "Python as a calculator" and other topics already covered in the normal Python workshops in Software Carpentry.

So I'm good to consider an "HPC Python" lesson set to focus on HPC/parallel-specific topics, and to have regular SC Python as a prerequisite. This is similar to how Research Software Engineering with Python assumes that their audience has already been using Shell/Git/Python, and their Python lessons start with building command line tools and using if __name__ == "__main__":.
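
For anyone unfamiliar with the idiom mentioned above, a minimal sketch of a command-line tool guarded by if __name__ == "__main__": might look like the following. The mean and main functions and the averaging task are hypothetical, chosen only for illustration; they are not taken from the Research Software Engineering with Python lesson.

```python
import argparse


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def main(argv=None):
    # argv=None makes argparse read sys.argv[1:]; passing a list eases testing.
    parser = argparse.ArgumentParser(description="Print the mean of some numbers.")
    parser.add_argument("numbers", nargs="*", type=float, help="values to average")
    args = parser.parse_args(argv)
    if not args.numbers:
        # Nothing to average: show the usage line and signal failure.
        parser.print_usage()
        return 1
    print(mean(args.numbers))
    return 0


if __name__ == "__main__":
    # This branch runs only when the file is executed as a script,
    # not when it is imported as a module by other code.
    main()
```

Because the logic lives in functions and the guard only calls main(), the same file can be run from the shell or imported elsewhere (for example from a test) without side effects.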

This lesson's original purpose was to teach people workflow orchestration with Snakemake and nothing else. Because HPC Carpentry assumes the same audience as Software Carpentry, it was originally written with the expectation that learners knew nothing at all about Python. Since Snakemake requires some Python knowledge, the Python basics were bundled into this lesson in order for the lesson to be usable for a Software Carpentry/HPC Carpentry audience.

The carpentries-incubator fork assumes that learners already know Python. If we assume that they learned it from the normal Software Carpentry Python courses, the total comes to 1.5-2 days (1 day of normal "learn Python" + 0.5-1 days of Snakemake from the carpentries-incubator lesson, depending on how fast the Snakemake stuff gets taught). That is roughly half a day to a full day more lesson time than this lesson needs.

Bundling the bare minimum of Python basics in with Snakemake itself (as is done in this lesson) is a little faster overall in terms of teaching time and fits all of the content in one day. That's really the only difference between the two approaches.

Also, no "true HPC" Python content was included because 1) it was impossible to teach it to a Software Carpentry audience while keeping the lesson within the normal 1-day timeframe, and 2) the data science ecosystem has already generated a million courses on all of this stuff (so I didn't see much point in duplicating that work).