Ellis Brown
10/09/2023
see tutorial.ipynb!
Good tooling:
- Accelerates experimentation
- make more progress, faster
- more robust code
- Enables reproducibility
- Facilitates collaboration
Why I like VSCode:
- Lightweight and fast
- Extensions / Plugins -> scales to power-use
- Jupyter
- Copilot
- Git
- Remote SSH
- Debugging
- Wide adoption / community / support
- interactive notebook + code execution
- great for experimentation, data exploration, and visualization
- lightweight way to test/develop your library code, interactively
- all of the features of the VSCode editor, native to the notebook experience
- Copilot!
- compute clusters / HPC / Slurm:
- can start a Jupyter server (e.g., in a tmux session) & connect to it
- autoreload: reloads imported modules before executing code
- my default 1st cell in ~ every notebook:
%load_ext autoreload
%autoreload 2
- VSCode Jupyter plots matplotlib inline automatically :)
- organizing your notebooks by date is a great way to keep track of your work / progress
- you will thank yourself later!
- e.g., I nest my notebooks in a notebooks/ dir with the following naming convention: notebooks/YYYY-MM-DD-<name>.ipynb
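A minimal sketch of that naming convention. The helper name `dated_notebook_path` and the example notebook name are hypothetical, introduced only for illustration:

```python
from datetime import date
from pathlib import Path


def dated_notebook_path(name: str, day: date, root: str = "notebooks") -> Path:
    """Build <root>/YYYY-MM-DD-<name>.ipynb for the given day."""
    return Path(root) / f"{day.isoformat()}-{name}.ipynb"


# Example with a fixed date; pass date.today() when starting a new notebook.
example_path = dated_notebook_path("data-exploration", date(2023, 10, 9))
print(example_path.as_posix())
```

Because ISO dates sort lexicographically, a plain alphabetical file listing is also chronological.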
- GPT-3 powered Code Autocompletion (on steroids)
- potentially best application of current generations of LLMs?
Oct 2021 beta -> changed the way I approach coding
ChatGPT as a lens to understand Copilot
- context = prompt + history (in ChatGPT parlance)
- the better the context, the better the suggestions
Copilot Context: code + comments
- in current file
- especially current/previous line
- "Fill-In-the-Middle" paradigm
- code before & after the cursor
- from neighboring tabs
- Copilot attends significantly more to code if it is open in a tab
- from other files in the repo (scanned too, but much less likely to be added to the context)
https://github.blog/2023-05-17-how-github-copilot-is-getting-better-at-understanding-your-code/
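To make the "code before & after the cursor" idea concrete, here is a toy sketch of a fill-in-the-middle prompt. It is illustrative only: the function names and the `<PREFIX>`/`<SUFFIX>`/`<MIDDLE>` markers are made up here and are not Copilot's actual prompt format.

```python
from typing import Tuple


def split_at_cursor(source: str, cursor: int) -> Tuple[str, str]:
    """Return the text before and after a cursor offset."""
    return source[:cursor], source[cursor:]


def build_fim_prompt(prefix: str, suffix: str) -> str:
    # The marker strings are invented for illustration only.
    return f"<PREFIX>{prefix}<SUFFIX>{suffix}<MIDDLE>"


source = "def add(a, b):\n    \n\nprint(add(1, 2))\n"
cursor = source.index("    ") + 4  # cursor sits at the end of the empty, indented line
prefix, suffix = split_at_cursor(source, cursor)
prompt = build_fim_prompt(prefix, suffix)
print(prompt)
```

The model is asked to produce the missing middle, so both the function signature before the cursor and the call after it can inform the suggestion.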
Shortcuts:
- tab to accept suggestion
- opt+[ , opt+] to cycle through suggestions
def sqrt(x):
    """
    Compute square roots using the method of Heron of Alexandria.
    Args:
        x: The number for which the square root is to be computed.
    """
    if x < 0:
        raise ValueError(
            "Cannot compute square root of "
            f"negative number {x}"
        )
    guess = x
    i = 0
    while guess * guess != x and i < 20:
        guess = (guess + x / guess) / 2.0
        i += 1
    return guess
add documentation to the sqrt function
def sqrt(x):
    """
    Compute square roots using the method of Heron of Alexandria.
    Args:
        x: The number for which the square root is to be computed.
    Returns:
        The square root of x.
    """
    if x < 0:
        raise ValueError(
            "Cannot compute square root of "
            f"negative number {x}"
        )
    guess = x
    i = 0
    while guess * guess != x and i < 20:
        guess = (guess + x / guess) / 2.0
        i += 1
    return guess
# create a test for the sqrt function
def test_sqrt():
    assert sqrt(4) == 2
    assert sqrt(9) == 3
    assert sqrt(16) == 4
# run the test for the sqrt function
test_sqrt()
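Exact `==` comparisons on floats can be brittle for inputs that are not perfect squares. A hedged variant of the same test idea compares against `math.sqrt` with a tolerance. The names `heron_sqrt` and `test_heron_sqrt` are introduced here for illustration, and the function is a self-contained restatement of the sqrt example above, not a replacement for it:

```python
import math


def heron_sqrt(x: float, max_iter: int = 20) -> float:
    """Approximate sqrt(x) with Heron's method, mirroring the sqrt example above."""
    if x < 0:
        raise ValueError(f"Cannot compute square root of negative number {x}")
    guess = x
    for _ in range(max_iter):
        if guess * guess == x:
            break  # exact hit (also covers x == 0, avoiding division by zero)
        guess = (guess + x / guess) / 2.0
    return guess


def test_heron_sqrt():
    # Compare with a relative tolerance instead of exact equality.
    for value in (0.0, 2.0, 9.0, 10.0):
        assert math.isclose(heron_sqrt(value), math.sqrt(value), rel_tol=1e-9)


test_heron_sqrt()
```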
e.g., look up the args for the autoreload magic function
%load_ext autoreload
# Reload all modules every time before executing the Python code typed.
%autoreload 2
# what does the 2 parameter mean?
# ans: 2 means always reload all modules (except those excluded by %aimport) before executing the Python code typed.
# list options
# 1. %autoreload 0 -> disable automatic reloading
# 2. %autoreload 1 -> Reload all modules imported with %aimport every time before executing the Python code typed.
# 3. %autoreload 2 -> Reload all modules (except those excluded by %aimport) every time before executing the Python code typed.
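For intuition about what "reloading" means, here is a rough sketch using the standard library's `importlib.reload` on a throwaway module. `demo_settings` is a hypothetical module created only for this demo, and the sketch approximates the effect of reloading rather than how the autoreload extension is implemented:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Skip bytecode caching so the rewritten source is always re-read in this demo.
sys.dont_write_bytecode = True

# Create a throwaway module on disk and make it importable.
module_dir = Path(tempfile.mkdtemp())
module_file = module_dir / "demo_settings.py"
module_file.write_text("VALUE = 1\n")
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()

demo_settings = importlib.import_module("demo_settings")
before = demo_settings.VALUE

# Edit the file on disk: the already-imported module does not change by itself.
module_file.write_text("VALUE = 22\n")
stale = demo_settings.VALUE

# An explicit reload re-executes the module's source.
importlib.reload(demo_settings)
after = demo_settings.VALUE

print(before, stale, after)
```

With autoreload enabled, this explicit reload step is what you no longer have to do by hand between cell executions.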
e.g., plot the distribution of the iris dataset
"""
Load the iris dataset, and plot the first two features in a scatter plot.
"""
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt
iris = load_iris()
X = iris.data
y = iris.target
# make the plot smaller
plt.figure(figsize=(8, 6))
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired)
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1])
# add a title to the plot
plt.title('Iris Dataset')
# save the plot as a PDF
plt.savefig('iris-data.pdf')
e.g., regex
# write a regex to match all words surrounded by parentheses
pattern1 = r"\(.*?\)"
import re
# test code
print(re.findall(pattern1, '(I want to match this)'))
print(re.findall(pattern1, 'I want to (match) this'))
print(re.findall(pattern1, 'I want to (match) this and (this) and (this)'))
# (round 2) add examples where there are open / close parens w/ no matches
print(re.findall(pattern1, 'I want to (match this')) # no match, left paren not closed
print(re.findall(pattern1, 'I want to match this)')) # no match, right paren not opened
print(re.findall(pattern1, 'I want to match this')) # no match, no parens
# convert the above into a real test case
def test_parentheses():
    assert re.findall(pattern1, '(I want to match this)') == ['(I want to match this)']
    assert re.findall(pattern1, 'I want to (match) this') == ['(match)']
    assert re.findall(pattern1, 'I want to (match) this and (this) and (this)') == ['(match)', '(this)', '(this)']
    # (round 2 tests)
    assert re.findall(pattern1, 'I want to (match this') == []
    assert re.findall(pattern1, 'I want to match this)') == []
    assert re.findall(pattern1, 'I want to match this') == []
    print('Success!')
# run the test
test_parentheses()
['(I want to match this)']
['(match)']
['(match)', '(this)', '(this)']
[]
[]
[]
Success!
note: chatgpt is also great for this!
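One detail worth checking in a generated pattern like the one above is the lazy quantifier `.*?`. A small sketch comparing it with the greedy `.*` on a sample sentence chosen here for illustration:

```python
import re

text = "I want to (match) this and (this)"

lazy = re.findall(r"\(.*?\)", text)   # stops at the first closing paren
greedy = re.findall(r"\(.*\)", text)  # runs on to the last closing paren

print(lazy)
print(greedy)
```

The greedy version swallows everything between the first opening and the last closing parenthesis, which is why a test with several parenthesized groups is a useful case to include.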
"""
colors: red green blue yellow orange purple brown black white
animals: dog cat horse pig cow sheep goat chicken
python path/to/your/script.py --color red --animal dog
python path/to/your/script.py --color green --animal cat
python path/to/your/script.py --color blue --animal horse
python path/to/your/script.py --color yellow --animal pig
python path/to/your/script.py --color orange --animal cow
python path/to/your/script.py --color purple --animal sheep
python path/to/your/script.py --color brown --animal goat
"""
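The command pattern in the prompt above can also be generated programmatically. A minimal sketch, reusing the colors, animals, and placeholder script path from that prompt (pairing by position with `zip` is an assumption about the intended pattern):

```python
colors = "red green blue yellow orange purple brown black white".split()
animals = "dog cat horse pig cow sheep goat chicken".split()

# zip stops at the shorter list, so the unpaired last color is dropped.
commands = [
    f"python path/to/your/script.py --color {color} --animal {animal}"
    for color, animal in zip(colors, animals)
]
print("\n".join(commands))
```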
stolen from here: https://github.blog/2023-06-20-how-to-write-better-prompts-for-github-copilot/#3-best-practices-for-prompt-crafting-with-github-copilot
- top of file
- above a section
"""
Create a basic markdown editor in Next.js with the following features:
- Use react hooks
- Create state for markdown with default text "type markdown here"
- A text area where users can write markdown
- Show a live preview of the markdown text as I type
- Support for basic markdown syntax like headers, bold, italics
- Use React markdown npm package
- The markdown text and resulting HTML should be saved in the component's state and updated in real time
"""
- articulate the logic / steps for it to follow
- often will be able to auto-complete!
- start writing code to get more specific suggestions
Let GitHub Copilot generate the code after each step, rather than asking it to generate a bunch of code all at once.
n.b., look into this cell's comments to see examples of me prompting Copilot to display the above gif :0
- think prompting!
- can provide examples in a comment prompt
- paste in a desired output dict
# flatmap the data and return a list of names
data = [
    [
        { 'name': 'John', 'age': 25 },
        { 'name': 'Jane', 'age': 30 }
    ],
    [
        { 'name': 'Bob', 'age': 40 }
    ]
]
# expected output: 'John', 'Jane', 'Bob'
def get_names(data):
    return [person['name'] for group in data for person in group]
# test code
print(get_names(data))
# write a test case
def test_get_first_names():
    assert get_names(data) == ['John', 'Jane', 'Bob']
    print('Success!')
# run the test
test_get_first_names()
['John', 'Jane', 'Bob']
Success!
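For comparison, the same flatmap can be written with `itertools.chain.from_iterable` from the standard library. This is only an alternative sketch; the function name `get_names_chained` is introduced here for illustration:

```python
from itertools import chain

data = [
    [{"name": "John", "age": 25}, {"name": "Jane", "age": 30}],
    [{"name": "Bob", "age": 40}],
]


def get_names_chained(groups):
    """Flatten one level of nesting, then pull out each name."""
    return [person["name"] for person in chain.from_iterable(groups)]


print(get_names_chained(data))
```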
- descriptive variable names
- modularize code
- comment your code --> prompts!
- write docstrings (when appropriate)
- Copilot is great for writing documentation!
- but too much = hard to read code
- write tests!
- Copilot makes this easier AND more essential
- blindly accepting every suggestion = bugs!!!
- especially if you don't understand what it's doing --> much harder to find :(
Conda
- creates isolated, Python-version-specific envs
- e.g., some old libs only work with 3.7
- more flexible: manage Python & other dependencies together
- e.g., cudatoolkit, libopencv, ffmpeg, ...
- most portable / reproducible
- want your project setup to be able to work on Linux/Mac/Windows etc. with different hardware
- essential for collaboration
- has pip integrated
- can install pip packages in a conda env, but not vice versa
- always use mamba in place of conda
https://mamba.readthedocs.io/en/latest/index.html
- use your environment file (environment.yaml)
- add new dependencies to the file as needed
- prefer conda over pip dependencies when possible
- helpful aliases:
# conda activate
alias ca="conda activate"
# create conda environment from file
alias mcf="mamba env create --file"
# update conda environment from file
alias muf="mamba env update --file"
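As a rough sketch of what such an environment file can look like, the snippet below writes a minimal `environment.yaml`. Everything in it is a placeholder chosen for illustration: the environment name `my-project`, the `conda-forge` channel, the Python version, the package list, and the helper `write_environment_file`.

```python
from pathlib import Path

# Placeholder content: adapt the name, channel, versions, and packages to your project.
ENVIRONMENT_YAML = """\
name: my-project
channels:
  - conda-forge
dependencies:
  - python=3.10
  - numpy
  - ffmpeg
  - pip
  - pip:
      - some-pip-only-package
"""


def write_environment_file(directory: str = ".") -> Path:
    """Write the placeholder environment file into the given directory."""
    path = Path(directory) / "environment.yaml"
    path.write_text(ENVIRONMENT_YAML)
    return path
```

Conda packages are listed directly under `dependencies`, and anything only available from pip goes in the nested `pip:` list, which matches the "prefer conda over pip" advice above.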
make life easier for your future self
- prefer committing too often over not often enough
- horror-stories: losing days/weeks of work
- easier to find bugs in the git history
- take the 2 seconds to write descriptive commit messages
- e.g., "updates" --> "fixes bug in foo.py"
- use branches for large changes
- raise a Pull Request on GitHub to review all changes
- can squash small commits to keep it clean
.gitignore
- toptal: top google result for "gitignore generator"
- https://www.toptal.com/developers/gitignore?templates=linux,macos,python,jupyternotebooks,data
- VSCode Git plugin makes it easy!
- can stage/unstage lines of code:
- select lines of code
- right click -> "Stage Selected Ranges"
- right click -> "Unstage Selected Ranges"
- useful, easy to forget git functions
- undo last commit
- commit staged (amend) --> adds current staged changes to previous commit
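For reference, these two actions roughly correspond to the following git command lines. This is an assumption about equivalence, not a statement of what the VSCode plugin runs internally, and the `GIT_RECIPES` and `run_recipe` names are hypothetical helpers for illustration:

```python
import subprocess
from typing import Dict, List

GIT_RECIPES: Dict[str, List[str]] = {
    # Undo the last commit but keep its changes staged.
    "undo_last_commit": ["git", "reset", "--soft", "HEAD~1"],
    # Fold the currently staged changes into the previous commit, keeping its message.
    "amend_with_staged": ["git", "commit", "--amend", "--no-edit"],
}


def run_recipe(name: str, repo_dir: str = ".") -> None:
    """Run one of the recipes above inside repo_dir (this rewrites local history)."""
    subprocess.run(GIT_RECIPES[name], cwd=repo_dir, check=True)
```

Both commands rewrite history, so they are safest on commits that have not been pushed yet.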