jonjoncardoso / lse-ds105-week09-experiment

Week 09 Workshop

DS105L - Week 09 live experiments

This repository is part of LSE DS105L 2022/23, for a lecture entitled "🔀 Merge operations & 📦 practical tips for code organisation".

The major focus will be on how to work effectively as a group using GitHub, based on the feedback I received from Shuyu and general interactions with students over Slack/Office Hours.

I could have taken a passive approach and just demonstrated things to you, but I would rather turn this into a workshop where you can learn while practising.

Here is how it's going to work:

Part ONE

βš™οΈ Setup

  1. I will create a repository from the jonjoncardoso/data-science-workflow template and I will edit the README.md to remove the template-related text.

  2. I will add a Jupyter Notebook with some web scraping code that does not make good use of pandas in the way we have been learning in this course.

📋 Create an issue

  1. I will create a GitHub Issue with a feature request to optimise the code.
  2. Anyone in the audience will be welcome to comment on this GitHub issue with suggestions for code optimisation.
  3. Once we have found a solution that we're happy with, we will be ready to close the issue. But I won't close it straightaway!

🌴 Branching

Instead of modifying it directly in my notebook, I will demonstrate how groups can work in parallel on GitHub.

  1. I will open a separate branch, dedicated to that issue, make my changes there and git push them.
  2. Then, I will open a Pull Request and ask some of you to validate my changes.
  3. Once we have your approval, I will git merge the changes into main.
  4. We will look at the git tree.
  5. I will tell you about a common practice of using a develop vs a main branch.

Together, these steps form a more professional way of using Git, commonly known as the Gitflow workflow.
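The branching steps above can be tried out safely with a sketch like the one below. It assumes a scratch directory, a local bare repository standing in for GitHub, and hypothetical names (the branch `1-optimise-scraping`, the file `notes.md`, issue number 1). The Pull Request review itself happens on GitHub and is only marked by a comment.

```shell
# Scratch sandbox (assumption): a local bare repo plays the role of GitHub.
sandbox="$(mktemp -d)"
git init -q --bare "$sandbox/remote.git"
git init -q "$sandbox/work"
cd "$sandbox/work"
git symbolic-ref HEAD refs/heads/main        # name the first branch "main"
git config user.name "Demo User"             # placeholder identity for the sandbox
git config user.email "demo@example.com"
git remote add origin "$sandbox/remote.git"

# A first commit on main, so there is something to branch from.
echo "# Week 09 demo" > README.md
git add README.md
git commit -q -m "Initial commit"
git push -q -u origin main

# 1. Open a branch dedicated to the issue, commit there, and push.
git switch -q -c 1-optimise-scraping         # hypothetical branch name
echo "Use pandas instead of manual loops" > notes.md
git add notes.md
git commit -q -m "Optimise scraping code (issue #1)"
git push -q -u origin 1-optimise-scraping

# 2. On GitHub: open a Pull Request and wait for approval (not shown here).

# 3. Merge the approved branch into main and publish the result.
git switch -q main
git merge -q --no-ff -m "Merge branch '1-optimise-scraping'" 1-optimise-scraping
git push -q origin main

# 4. Look at the commit tree.
git log --oneline --graph --all
```

The `--no-ff` flag is a choice made for this sketch: it forces a merge commit, so the branch stays visible in the graph printed by the last command.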

Part TWO

Now I will move the relevant code to a Python script and invoke it from the Jupyter notebook. I will explain why and when it is good to do so. Then, I will open a new issue with an exercise on data pre-processing. Everyone will then try to work out a solution to the exercise using Gitflow!

  1. Branch from develop and give it a meaningful name.
  2. Push your branch to GitHub.
  3. Now work on your changes, commit and push them as you like.
  4. Once ready, open a pull request from your branch to develop and tag me (@jonjoncardoso) as a reviewer.
  5. I will review a few and add feedback notes on the spot.
  6. Hopefully, some of the solutions will be merged!
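
As a rough sketch, the exercise steps above map onto the commands below. The branch name `feature/preprocessing`, the file `preprocess.py` and the local bare repository used in place of GitHub are assumptions for illustration. The optional `gh pr create` line assumes the GitHub CLI is installed and is left commented out.

```shell
# Scratch sandbox (assumption): a local bare repo stands in for the GitHub remote
# and is seeded with main and develop branches.
sandbox="$(mktemp -d)"
git init -q --bare "$sandbox/remote.git"
git init -q "$sandbox/work"
cd "$sandbox/work"
git symbolic-ref HEAD refs/heads/main        # name the first branch "main"
git config user.name "Demo User"             # placeholder identity for the sandbox
git config user.email "demo@example.com"
git remote add origin "$sandbox/remote.git"
git commit -q --allow-empty -m "Initial commit"
git branch develop
git push -q origin main develop

# 1. Branch from develop and give the branch a meaningful name.
git switch -q develop
git switch -q -c feature/preprocessing       # hypothetical branch name

# 2. Push your branch to GitHub straight away.
git push -q -u origin feature/preprocessing

# 3. Work on your changes, then commit and push as often as you like.
echo "# data pre-processing goes here" > preprocess.py
git add preprocess.py
git commit -q -m "Add pre-processing step"
git push -q

# 4. Open a pull request into develop, on the GitHub website or with the GitHub CLI:
# gh pr create --base develop --reviewer jonjoncardoso
```

In a real clone of the workshop repository, only the commands under steps 1 to 4 are needed; everything above them just builds the sandbox.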

Part THREE (time allowing)

🧰 Dev Setup

  1. Install Python 3.8 or higher on your computer.

  2. Install Anaconda or Miniconda (lighter) on your computer.

  3. Create a new conda environment:

    conda create -y -n=venv-ds105 python=3.10.8

  4. Activate the environment and make sure you have pip installed inside that environment:

    # the exact `activate` command will vary depending on your OS
    conda activate venv-ds105

💡 Remember to activate this particular conda environment whenever you reopen VSCode/the terminal.

  5. Install the required libraries:

    pip install -r requirements.txt

Now, whenever you open a Jupyter Notebook, you should see the venv-ds105 kernel available.

License: MIT License

