First attempt at some command line utils to support Jupyter notebook workflows for OU course TM351.
To install:
pip3 install git+https://github.com/innovationOUtside/nb_workflow_tools
To upgrade a current installation to the latest repo version without updating dependencies:
pip3 install --upgrade --no-deps git+https://github.com/innovationOUtside/nb_workflow_tools
For other utility toolbelts, see for example:
A variety of tools are bundled as CLI commands published via the package or informally sketched in various Jupyter notebooks in the notebooks directory.
Tools for previewing the files contained in a zip file and creating new zip files.
Usage: tm351zipview [OPTIONS] [FILENAME]...
List the contents of one or more specified zipfiles.
Options:
--warnings / -w Display warnings
--help Show this message and exit.
The tm351zipview command reports four columns: file size, compressed file size, datetime and filename. If you select -w, various advisory notices will be displayed about the zip file contents (e.g. overlong filenames, large files, hidden files).
The warnings report takes the following form:
====== Zip file quality report: /Users/tonyhirst/Documents/GitHub/tm351-undercertainty/notebooks/tm351/test1.zip ======
ERROR: the filepath element "11.A SQL Data Investigation Worked Example (optional).ipynb" in "Part 11 Notebooks/11.A SQL Data Investigation Worked Example (optional).ipynb" is too long (max. 50 chars)
WARNING: "Part 11 Notebooks/.DS_Store" is a hidden file/directory (do you really need it in the zip file?)
ERROR: the filepath element "11.A SQL Data Investigation Worked Example (optional).ipynb" in "Part 11 Notebooks/11.A SQL Data Investigation Worked Example (optional).ipynb" is too long (max. 50 chars)
WARNING: "Part 11 Notebooks/.delme" is a hidden file/directory (do you really need it in the zip file?)
WARNING: "Part 11 Notebooks/.ipynb_checkpoints/" is a hidden file/directory (do you really need it in the zip file?)
WARNING: "Part 11 Notebooks/.ipynb_checkpoints/11.2 subqueries as value and set-checkpoint.ipynb" is a hidden file/directory (do you really need it in the zip file?)
WARNING: "Part 11 Notebooks/sql_movie_data/people.csv": looks quite large file (20.7 MB unzipped, 7.8 MB compressed)
WARNING: "Part 11 Notebooks/sql_movie_data/cast_members.csv": looks quite large file (9.5 MB unzipped, 3.5 MB compressed)
WARNING: "Part 11 Notebooks/sql_movie_data/people-clean-dates.csv": looks quite large file (20.9 MB unzipped, 7.8 MB compressed)
WARNING: "Part 11 Notebooks/sql_movie_data/movies.csv": looks quite large file (5.3 MB unzipped, 2.3 MB compressed)
WARNING: "Part 11 Notebooks/sql_movie_data/crew.csv": looks quite large file (10.2 MB unzipped, 2.3 MB compressed)
===========================
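The checks behind a report like the one above can be approximated with the standard library zipfile module. The following is only a minimal sketch, not the code used by tm351zipview: the function name zip_report and the LARGE_FILE_BYTES threshold are assumptions made for illustration, and only the 50 character limit is taken from the report itself.

```python
import io
import zipfile

MAX_NAME_LEN = 50                    # taken from the "max. 50 chars" message above
LARGE_FILE_BYTES = 5 * 1024 * 1024   # assumed threshold, not the tool's actual value


def zip_report(zf):
    """Return (rows, notices) for an open zipfile.ZipFile (hypothetical helper)."""
    rows, notices = [], []
    for info in zf.infolist():
        # The four listing columns: size, compressed size, datetime, filename.
        rows.append((info.file_size, info.compress_size, info.date_time, info.filename))
        parts = [p for p in info.filename.split("/") if p]
        if any(len(p) > MAX_NAME_LEN for p in parts):
            notices.append(f"ERROR: {info.filename!r} has an overlong path element")
        if any(p.startswith(".") for p in parts):
            notices.append(f"WARNING: {info.filename!r} is a hidden file/directory")
        if info.file_size > LARGE_FILE_BYTES:
            notices.append(f"WARNING: {info.filename!r} looks like quite a large file")
    return rows, notices


# Build a small in-memory archive to exercise the checks.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("Part 11 Notebooks/.DS_Store", "")
    zf.writestr("Part 11 Notebooks/" + "x" * 60 + ".ipynb", "{}")
    zf.writestr("Part 11 Notebooks/ok.ipynb", "{}")

with zipfile.ZipFile(buffer) as zf:
    rows, notices = zip_report(zf)

for notice in notices:
    print(notice)
```

Here the hidden .DS_Store entry and the 66 character filename each produce one notice, while ok.ipynb passes silently.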
Usage: tm351zip [OPTIONS] PATH ZIPFILE
Create a zip file from the contents of a specified directory.
The zipper can optionally run a notebook processor on notebooks before
zipping them to check that all cells are run or all cells are cleared.
Options:
-r, --file-processor [clearOutput|runWithErrors]
-H, --include-hiddenfiles Include hidden files
-X, --exclude-dir PATH Exclude specified directory
-x, --exclude-file PATH Exclude specified file
-a, --zip_append Add to existing zip file
--help Show this message and exit.
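As a rough illustration of what zipping a directory while skipping hidden files involves, here is a minimal sketch using os.walk and zipfile. It is not the tm351zip implementation: zip_directory and its include_hidden parameter are hypothetical names that loosely mirror the -H option, and the notebook processing step is left out.

```python
import os
import tempfile
import zipfile


def zip_directory(path, zipfile_path, include_hidden=False):
    """Zip the contents of `path`; skip dot-files and dot-directories unless asked."""
    with zipfile.ZipFile(zipfile_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(path):
            if not include_hidden:
                # Prune hidden directories in place so os.walk does not descend into them.
                dirs[:] = [d for d in dirs if not d.startswith(".")]
                files = [f for f in files if not f.startswith(".")]
            for name in sorted(files):
                full = os.path.join(root, name)
                zf.write(full, os.path.relpath(full, path))
    with zipfile.ZipFile(zipfile_path) as zf:
        return sorted(zf.namelist())


# Exercise the sketch on a throwaway directory; the zip is written elsewhere
# so that the archive never tries to include itself.
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as out:
    os.makedirs(os.path.join(src, ".ipynb_checkpoints"))
    for rel in ("demo.ipynb", ".DS_Store",
                os.path.join(".ipynb_checkpoints", "demo-checkpoint.ipynb")):
        with open(os.path.join(src, rel), "w") as fh:
            fh.write("{}")
    names = zip_directory(src, os.path.join(out, "demo.zip"))
    names_with_hidden = zip_directory(src, os.path.join(out, "demo_hidden.zip"),
                                      include_hidden=True)

print(names)
```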
Note: there is a gotcha when trying to connect to GitHub using Python on a Mac: https://stackoverflow.com/a/42098127/454773
Usage: tm351gitrepos [OPTIONS]
Download files from a specified branch in a particular git repository.
This command can download files from public GitHub repositories without authentication, although the API is heavily rate limited if you do not authenticate.
The download can also be limited to just the contents of a specified directory.
Don't worry that there look to be a lot of arguments - you will be
prompted for them if you just run: `tm351gitrepos --auth`
Options:
--github-user TEXT Your Github username.
--password TEXT
--repo TEXT Repository name
--branch TEXT Branch or tag to download
--directory TEXT Directory to download (or: all)
--savedir PATH Directory to download repo / repo dir into;
default is dir name
--file-processor [clearOutput|runWithErrors]
Optionally specify a file processor to be
run against downloaded notebooks.
--zip / --no-zip Optionally create a zip file of the
downloaded repository/directory with the
same name as the repository/directory.
--help Show this message and exit.
Notebooks are tested using the nbval package. Notebooks should contain pre-run cells whose outputs you want to test against. Running tm351nbtest reruns the notebooks in the environment you run the command in and compares the new cell outputs to the previously saved cell outputs. (So if you're testing a Docker containerised environment, install this package and run the test from the command line inside the container.)
Running tm351nbtest prints a list of cells where the outputs from the new run of the notebook do not match the original outputs. Note that you can “escape” cells that generate known errors by adding the cell tag raises-exception. You can also force cell outputs to be ignored by tagging cells with the nbval-ignore-output tag.
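Cell tags live in each cell's metadata under the tags key of the notebook JSON, so they can also be added programmatically. The sketch below works on a plain dictionary shaped like a notebook; add_tag is a hypothetical helper rather than part of this package, and in practice you would load and save the .ipynb file with json or nbformat.

```python
def add_tag(cell, tag):
    """Add `tag` to a cell's metadata tags if it is not already present."""
    tags = cell.setdefault("metadata", {}).setdefault("tags", [])
    if tag not in tags:
        tags.append(tag)
    return cell


# A toy two-cell notebook structure (only the fields used here are shown).
notebook = {
    "cells": [
        {"cell_type": "code", "source": "1 / 0", "metadata": {}, "outputs": []},
        {"cell_type": "code", "source": "import random; random.random()",
         "metadata": {}, "outputs": []},
    ]
}

add_tag(notebook["cells"][0], "raises-exception")     # known error, do not fail the test
add_tag(notebook["cells"][1], "nbval-ignore-output")  # output changes on every run
add_tag(notebook["cells"][1], "nbval-ignore-output")  # repeated call leaves tags unchanged

print([cell["metadata"]["tags"] for cell in notebook["cells"]])
```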
Usage: tm351nbtest [OPTIONS] [TESTITEMS]...
Test specified notebooks and/or the notebooks in a specified directory
or directories (`TESTITEMS`) using the `nbval` plugin for `py.test`.
Running `tm351nbtest` without
any specified directory or file will assemble tests recursively from the
current directory down.
Options:
-X, --exclude-dir PATH Do not recurse through specified directory when
assembling tests.
-o, --outfile PATH Output report file. Leave this blank to display
report on command line.
--help Show this message and exit.
Usage: tm351nbrun [OPTIONS] PATH
Directory processor for notebooks - allows the user to run nbconvert
operations on notebooks, such as running all cells or clearing all cells.
To run tests, use: tm351nbtest. To zip folders (with the option of running
notebook processors on zipped files), use: tm351zip
Options:
-r, --file-processor [clearOutput|runWithErrors]
File processor actions that can be applied
to notebooks using `nbconvert`
--outpath PATH path to output directory
--inplace / --no-inplace Run processors on notebooks inplace
-X, --exclude-dir PATH Exclude specified directory
-x, --exclude-file PATH Exclude specified file
--include-hidden / --no-include-hidden
Include hidden files
--rmdir / --no-rmdir Check the output directory is empty before
we use it
--currdir / --no-currdir Process files in current directory
--subdirs / --no-subdirs Process files in subdirectories
--reportlevel INTEGER Reporting level
--auth / --no-auth By default, run with auth (prompt for
credentials)
-t, --with-tests Run tests on notebooks after download
--help Show this message and exit.
Update tag styles used for empinken cells:
# Recurse on directory path rewriting .ipynb files with new tag style
upgrade_empinken_tags ./
Patches metadata for extension migration:
- collapsible headings (heading_collapsed -> jp-MarkdownHeadingCollapsed); also run `nb_cell_metadata_strip Part\ 07\ Notebooks hidden` to tidy other metadata sometimes used with collapsed cells in OU notebooks.
Usage: cnb_collapse_head_migrate [OPTIONS] PATH
Fix collapsible headings metadata.
Options:
--recursive / --no-recursive Recursive search of directories.
--hidden / --no-hidden Include hidden files and
directories.
--help Show this message and exit.
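The metadata change itself is a key rename on markdown cells. As a minimal sketch of that idea, assuming the notebook has already been loaded as a dictionary (migrate_collapsed_headings is a hypothetical name, and the real command also handles file traversal and saving):

```python
OLD_KEY = "heading_collapsed"            # classic notebook extension key
NEW_KEY = "jp-MarkdownHeadingCollapsed"  # JupyterLab / Notebook 7 key


def migrate_collapsed_headings(notebook):
    """Rename the collapsed-heading key on markdown cells; return the number changed."""
    changed = 0
    for cell in notebook.get("cells", []):
        metadata = cell.get("metadata", {})
        if cell.get("cell_type") == "markdown" and OLD_KEY in metadata:
            metadata[NEW_KEY] = metadata.pop(OLD_KEY)
            changed += 1
    return changed


notebook = {
    "cells": [
        {"cell_type": "markdown", "source": "## Answer",
         "metadata": {"heading_collapsed": True}},
        {"cell_type": "markdown", "source": "Some text", "metadata": {}},
        {"cell_type": "code", "source": "print(1)", "metadata": {}, "outputs": []},
    ]
}

changed = migrate_collapsed_headings(notebook)
print(changed, notebook["cells"][0]["metadata"])
```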
Simple tools to merge notebooks and split notebooks on a particular separator.
We can merge two or more notebooks with a command of the form: nb_merge FILENAME1 FILENAME2 ...
Usage: nb_merge [OPTIONS] [FILES]...
Merge two or more notebooks. Note that this may overwrite a pre-existing
file.
Options:
-o, --outfile PATH
--help Show this message and exit.
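Conceptually, merging amounts to concatenating the cell lists of the input notebooks. The sketch below assumes, purely for illustration, that notebook-level metadata is taken from the first notebook; merge_notebooks is a hypothetical function and nb_merge may behave differently.

```python
import copy


def merge_notebooks(notebooks):
    """Return a new notebook dict holding the cells of all inputs, in order."""
    merged = copy.deepcopy(notebooks[0])   # assumption: keep the first notebook's metadata
    for nb in notebooks[1:]:
        merged["cells"].extend(copy.deepcopy(nb["cells"]))
    return merged


nb1 = {"metadata": {"title": "first"}, "nbformat": 4, "nbformat_minor": 5,
       "cells": [{"cell_type": "markdown", "source": "# One", "metadata": {}},
                 {"cell_type": "code", "source": "x = 1", "metadata": {}, "outputs": []}]}
nb2 = {"metadata": {"title": "second"}, "nbformat": 4, "nbformat_minor": 5,
       "cells": [{"cell_type": "markdown", "source": "# Two", "metadata": {}}]}

merged = merge_notebooks([nb1, nb2])
print(len(merged["cells"]))
```

Copying the cells keeps the input notebooks untouched, which matters if the same objects are reused after the merge.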
We can split a notebook at one or more split points with a command of the form: nb_split FILENAME
By default, the split point is #--SPLITHERE-- or # --SPLITHERE--. It should appear as the only item in either a markdown cell or a code cell.
Usage: nb_split [OPTIONS] PATH
Split a notebook at a splitpoint marker. Note that this may overwrite
pre-existing files.
Options:
--splitter TEXT String to split on (default: #--SPLITHERE--)
--overwrite / --no-overwrite Overwrite pre-existing files
--help Show this message and exit.
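The splitting logic can be sketched as a single pass over the cell list that starts a new chunk whenever a marker cell is met. This is an illustration under assumptions rather than the nb_split source: split_cells is a hypothetical name, and dropping the marker cells and skipping empty chunks are choices made here for simplicity.

```python
def cell_text(cell):
    """Return a cell's source as one string (source may be a string or a list of lines)."""
    source = cell.get("source", "")
    return "".join(source) if isinstance(source, list) else source


def split_cells(cells, splitter="#--SPLITHERE--"):
    """Split a list of cells into chunks at cells containing only the marker."""
    # Accept both "#--SPLITHERE--" and "# --SPLITHERE--" for the default marker.
    markers = {splitter, splitter.replace("#", "# ", 1)}
    chunks, current = [], []
    for cell in cells:
        if cell_text(cell).strip() in markers:
            if current:
                chunks.append(current)
            current = []
        else:
            current.append(cell)
    if current:
        chunks.append(current)
    return chunks


cells = [
    {"cell_type": "markdown", "source": "# Part 1", "metadata": {}},
    {"cell_type": "code", "source": "#--SPLITHERE--", "metadata": {}, "outputs": []},
    {"cell_type": "code", "source": ["print(1)"], "metadata": {}, "outputs": []},
    {"cell_type": "markdown", "source": "# --SPLITHERE--", "metadata": {}},
    {"cell_type": "markdown", "source": "# Part 3", "metadata": {}},
]

chunks = split_cells(cells)
print([len(chunk) for chunk in chunks])
```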
Ensure that notebooks in a directory path have activity answers collapsed using the Collapsible Headings classic Jupyter notebook extension. This currently relies on heuristics to detect the answer header cell, or on the presence of a precollapse tag in a markdown cell that starts with a heading. Specifically, the answer header cell should be highlighted as a blue activity cell using the nb_extension_empinken activity tool (which adds the style-activity tag to a cell) and contain at least one of the following strings:
possibles = ["# Our solution", "# Answer", "click on the triangle symbol"]
Usage takes the form: nb_collapse_activities PATH and is recursive by default.
Usage: nb_collapse_activities [OPTIONS] PATH
Collapse activity answers.
Options:
--recursive / --no-recursive Recursive search of directories.
--cnb / --no-cnb Use classic notebook extension
metadata value (default: use no-cnb (JupyterLab/nb7) format).
--help Show this message and exit.
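The detection heuristic described above can be written as a small predicate over a cell dictionary. The following sketch is one reading of that description, not the package's code: is_answer_header and collapse_activities are hypothetical names, and the choice of metadata key follows the --cnb / --no-cnb option and the heading_collapsed -> jp-MarkdownHeadingCollapsed mapping mentioned earlier.

```python
POSSIBLES = ["# Our solution", "# Answer", "click on the triangle symbol"]


def is_answer_header(cell):
    """Guess whether a markdown cell is an activity answer header."""
    if cell.get("cell_type") != "markdown":
        return False
    source = cell.get("source", "")
    text = "".join(source) if isinstance(source, list) else source
    tags = cell.get("metadata", {}).get("tags", [])
    # An explicit precollapse tag on a cell that starts with a heading.
    if "precollapse" in tags and text.lstrip().startswith("#"):
        return True
    # Otherwise: a style-activity cell containing one of the known phrases.
    return "style-activity" in tags and any(p in text for p in POSSIBLES)


def collapse_activities(notebook, cnb=False):
    """Mark detected answer header cells as collapsed; return how many were marked."""
    key = "heading_collapsed" if cnb else "jp-MarkdownHeadingCollapsed"
    count = 0
    for cell in notebook.get("cells", []):
        if is_answer_header(cell):
            cell.setdefault("metadata", {})[key] = True
            count += 1
    return count


notebook = {"cells": [
    {"cell_type": "markdown", "source": "### Activity 1\nTry it.",
     "metadata": {"tags": ["style-activity"]}},
    {"cell_type": "markdown", "source": "#### Our solution\nSee below.",
     "metadata": {"tags": ["style-activity"]}},
    {"cell_type": "markdown", "source": "## Extra", "metadata": {"tags": ["precollapse"]}},
    {"cell_type": "code", "source": "# Answer", "metadata": {"tags": ["style-activity"]},
     "outputs": []},
]}

collapsed = collapse_activities(notebook)
print(collapsed)
```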
Ensure that notebooks have the tags toolbar view collapsed:
Usage: nb_collapse_tagstoolbar [OPTIONS] PATH
Collapse tags toolbar.
Options:
--recursive / --no-recursive Recursive search of directories.
--help Show this message and exit.
Remove a metadata object from cell metadata by key, e.g. nbdime-conflicts or scrolled:
Usage: nb_cell_metadata_strip [OPTIONS] PATH KEY
Clean metadata from cell.
Options:
--recursive / --no-recursive Recursive search of directories.
--help Show this message and exit.
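Stripping a key is a simple dictionary operation on each cell's metadata. A minimal sketch, assuming an already loaded notebook dictionary and using the hypothetical name strip_cell_metadata:

```python
def strip_cell_metadata(notebook, key):
    """Delete `key` from every cell's metadata; return the number of cells changed."""
    removed = 0
    for cell in notebook.get("cells", []):
        metadata = cell.get("metadata", {})
        if key in metadata:
            del metadata[key]
            removed += 1
    return removed


notebook = {"cells": [
    {"cell_type": "code", "source": "x = 1", "outputs": [],
     "metadata": {"scrolled": True, "tags": ["keep-me"]}},
    {"cell_type": "markdown", "source": "Text", "metadata": {"scrolled": False}},
    {"cell_type": "markdown", "source": "More text", "metadata": {}},
]}

removed = strip_cell_metadata(notebook, "scrolled")
print(removed, notebook["cells"][0]["metadata"])
```

Testing for the key with `in` rather than checking the popped value means entries whose value is False or None are removed as well.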
Autotag figure output code cells in pre-run notebooks (default tag: nbval-figure):
Usage: nb_cell_figure_tagger [OPTIONS] PATH
Autotag figure output cell.
Options:
-t, --tag TEXT Tag to label figure output cells.
--recursive / --no-recursive Recursive search of directories.
--help Show this message and exit.
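One plausible way to spot figure output cells is to look for an image MIME type in the stored outputs of a code cell. The sketch below takes that approach as an assumption, since the detection rule used by nb_cell_figure_tagger is not described here; has_figure_output and tag_figure_cells are hypothetical names.

```python
def has_figure_output(cell):
    """True if any stored output of the cell carries an image/* MIME bundle."""
    for output in cell.get("outputs", []):
        if any(mime.startswith("image/") for mime in output.get("data", {})):
            return True
    return False


def tag_figure_cells(notebook, tag="nbval-figure"):
    """Add `tag` to code cells with image outputs; return the number of cells tagged."""
    tagged = 0
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code" and has_figure_output(cell):
            tags = cell.setdefault("metadata", {}).setdefault("tags", [])
            if tag not in tags:
                tags.append(tag)
                tagged += 1
    return tagged


notebook = {"cells": [
    {"cell_type": "code", "source": "df.plot()", "metadata": {},
     "outputs": [{"output_type": "display_data", "metadata": {},
                  "data": {"image/png": "iVBORw0...", "text/plain": "<Figure>"}}]},
    {"cell_type": "code", "source": "print('hi')", "metadata": {},
     "outputs": [{"output_type": "stream", "name": "stdout", "text": "hi\n"}]},
    {"cell_type": "markdown", "source": "A figure follows.", "metadata": {}},
]}

tagged = tag_figure_cells(notebook)
print(tagged, notebook["cells"][0]["metadata"])
```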