AlexsLemonade / reproducible-research

Repository for Reproducible Research Practices Training Workshop

Content: Organizing R/Python scripts

sjspielman opened this issue · comments

Several applicants to the workshop have expressed a specific interest in making their code more readable. Although we are not planning to teach R/Python, we can still discuss how one should generally organize scripts in these languages.

Ideas to show here:

  • Use notebooks (jupyter/rmd) to integrate code with high-level discussion of the code
  • Writing functions instead of repeating code (not saying we want to teach how to write functions - we can provide a resource for this in R and Python)
    • Also introduce the concept of docstrings or at a minimum the need to comment/document your functions
  • File organization: libraries and paths at the top
  • Already part of Issue #10 - how to organize multiple scripts in relation to one another in a project directory, and stitching together with a shell script
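As a rough illustration of the layout points above (libraries and paths at the top, a documented function instead of repeated code), a script might look something like the sketch below. The file names, the `column_mean` helper, and the inline example data are all made up for illustration; they are not part of the workshop material.

```python
"""Summarize a table of measurements (hypothetical example script)."""

# Libraries first, so dependencies are visible at a glance
import csv
import io
from pathlib import Path
from statistics import mean

# Paths next, defined once so they are easy to find and change
# (hypothetical project layout)
DATA_DIR = Path("data")
INPUT_FILE = DATA_DIR / "measurements.csv"
OUTPUT_FILE = Path("results") / "summary.csv"


def column_mean(rows, column):
    """Return the mean of a numeric column.

    rows   -- list of dicts, e.g. from csv.DictReader
    column -- name of the column to average
    """
    return mean([float(row[column]) for row in rows])


# Body of the script. Inline data stands in for INPUT_FILE here
# so the sketch runs without any files on disk.
example = io.StringIO("sample,value\na,1.0\nb,2.0\nc,6.0\n")
rows = list(csv.DictReader(example))
print(column_mean(rows, "value"))  # prints 3.0
```

The docstring on `column_mean` is the "at a minimum" documentation: what goes in, what comes out.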

Include optparse!
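For reference, here is a minimal sketch of command-line options using the `optparse` module from the Python standard library (the R `optparse` package offers a similar interface). The option names and defaults are hypothetical, and newer Python code often uses `argparse` instead.

```python
from optparse import OptionParser


def build_parser():
    """Create the command-line parser for this (hypothetical) script."""
    parser = OptionParser(usage="usage: %prog [options]")
    parser.add_option("-i", "--input", dest="input_file",
                      help="path to the input CSV file")
    parser.add_option("-n", "--top-n", dest="top_n", type="int", default=10,
                      help="number of rows to report [default: %default]")
    return parser


# An explicit argument list is parsed here so the example runs anywhere.
# In a real script, call parse_args() with no arguments to read sys.argv.
options, args = build_parser().parse_args(
    ["--input", "data/measurements.csv", "-n", "5"]
)
print(options.input_file, options.top_n)  # prints: data/measurements.csv 5
```

Passing paths as options keeps them out of the body of the script, which fits with the "paths at the top" idea.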

Other thoughts on content here:

  • Functions should go at the top (with perhaps tiny ones in the body). This allows a transition to importing/sourcing separate files that contain the functions.

  • In python: do we want to mention if __name__ == '__main__':? This may be too in the weeds, but it is useful.

I'm going to take a crack at this, at least to get started, following the project organization content.

Going to assign you to #25 as well @jashapiro

In python: do we want to mention if __name__ == '__main__':? This may be too in the weeds, but it is useful.

As I understood it, the python components are more of a "here's the python version for you to look at later," so as long as there is enough explanation of what __name__ == '__main__' does and why it is there, it seems reasonable to me. It's also really common to come across this (cryptically) in the Stack Overflow universe. This is as far as I'd go into python double-underscore land.
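A minimal sketch of the kind of explanation that might accompany it, assuming a hypothetical file named greet.py:

```python
def greet(name):
    """Return a greeting for name."""
    return f"Hello, {name}!"


def main():
    """Run the script's top-level steps."""
    print(greet("workshop"))


# __name__ equals "__main__" only when this file is run directly
# (python greet.py). When another script does `import greet`,
# __name__ is "greet" instead, so main() is skipped and only the
# function definitions are loaded.
if __name__ == "__main__":
    main()
```

This is what lets the same file work both as a script and as a source of importable functions, which ties in with putting functions at the top.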