innovationOUtside / nb_workflow_tools

Repository collating tools to support the processing of Jupyter notebooks

ou-tm351 - nb_workflow_tools

First attempt at some command line utils to support Jupyter notebook workflows for OU course TM351.

To install:

pip3 install git+https://github.com/innovationOUtside/nb_workflow_tools

To upgrade a current installation to the latest repo version without updating dependencies:

pip3 install --upgrade --no-deps git+https://github.com/innovationOUtside/nb_workflow_tools

For other utility toolbelts, see for example:

Tools

A variety of tools are bundled as CLI commands published via the package or informally sketched in various Jupyter notebooks in the notebooks directory.

Zipper

Tools for previewing the files contained in a zip file and creating new zip files.

Zip file contents preview

Usage: tm351zipview [OPTIONS] [FILENAME]...

  List the contents of one or more specified zipfiles.

Options:
  --warnings / -w   Display warnings
  --help            Show this message and exit.

The tm351zipview command reports four columns: file size, compressed file size, datetime and filename. If you pass -w, various advisory notices are also displayed about the zip file contents (e.g. overlong filenames, large files, hidden files).
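To make the four columns concrete, here is a minimal sketch of how such a listing could be produced with the Python standard library `zipfile` module. The function name `zip_listing` is hypothetical and this is not necessarily how tm351zipview is implemented.

```python
import zipfile
from datetime import datetime


def zip_listing(path):
    """Return (file_size, compress_size, datetime, filename) rows for a zip file."""
    rows = []
    with zipfile.ZipFile(path) as zf:
        for info in zf.infolist():
            rows.append(
                (
                    info.file_size,             # uncompressed size in bytes
                    info.compress_size,         # compressed size in bytes
                    datetime(*info.date_time),  # last modification time
                    info.filename,              # path inside the archive
                )
            )
    return rows
```

Each row can then be printed in whatever column layout is preferred.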

The warnings report takes the following form:

====== Zip file quality report: /Users/tonyhirst/Documents/GitHub/tm351-undercertainty/notebooks/tm351/test1.zip ======

ERROR: the filepath element "11.A SQL Data Investigation Worked Example (optional).ipynb" in "Part 11 Notebooks/11.A SQL Data Investigation Worked Example (optional).ipynb" is too long (max. 50 chars)
WARNING: "Part 11 Notebooks/.DS_Store" is a hidden file/directory (do you really need it in the zip file?)
ERROR: the filepath element "11.A SQL Data Investigation Worked Example (optional).ipynb" in "Part 11 Notebooks/11.A SQL Data Investigation Worked Example (optional).ipynb" is too long (max. 50 chars)
WARNING: "Part 11 Notebooks/.delme" is a hidden file/directory (do you really need it in the zip file?)
WARNING: "Part 11 Notebooks/.ipynb_checkpoints/" is a hidden file/directory (do you really need it in the zip file?)
WARNING: "Part 11 Notebooks/.ipynb_checkpoints/11.2 subqueries as value and set-checkpoint.ipynb" is a hidden file/directory (do you really need it in the zip file?)
WARNING: "Part 11 Notebooks/sql_movie_data/people.csv": looks quite large file (20.7 MB unzipped, 7.8 MB compressed)
WARNING: "Part 11 Notebooks/sql_movie_data/cast_members.csv": looks quite large file (9.5 MB unzipped, 3.5 MB compressed)
WARNING: "Part 11 Notebooks/sql_movie_data/people-clean-dates.csv": looks quite large file (20.9 MB unzipped, 7.8 MB compressed)
WARNING: "Part 11 Notebooks/sql_movie_data/movies.csv": looks quite large file (5.3 MB unzipped, 2.3 MB compressed)
WARNING: "Part 11 Notebooks/sql_movie_data/crew.csv": looks quite large file (10.2 MB unzipped, 2.3 MB compressed)

===========================
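The checks behind a report like this can be sketched as follows. The 50 character limit is taken from the report above; the large file threshold (`LARGE_FILE_BYTES`) and the function name `zip_quality_report` are assumptions made for illustration, not values taken from the tool.

```python
import zipfile

MAX_ELEMENT_CHARS = 50        # limit quoted in the report above
LARGE_FILE_BYTES = 1_000_000  # hypothetical threshold, not taken from the tool


def zip_quality_report(path):
    """Return a list of advisory messages about the members of a zip file."""
    messages = []
    with zipfile.ZipFile(path) as zf:
        for info in zf.infolist():
            # Split the archive path into its directory and file name elements
            elements = [e for e in info.filename.split("/") if e]
            for element in elements:
                if len(element) > MAX_ELEMENT_CHARS:
                    messages.append(
                        f'ERROR: the filepath element "{element}" in '
                        f'"{info.filename}" is too long '
                        f"(max. {MAX_ELEMENT_CHARS} chars)"
                    )
            if any(e.startswith(".") for e in elements):
                messages.append(
                    f'WARNING: "{info.filename}" is a hidden file/directory'
                )
            if info.file_size > LARGE_FILE_BYTES:
                messages.append(
                    f'WARNING: "{info.filename}": looks quite large file '
                    f"({info.file_size / 1e6:.1f} MB unzipped, "
                    f"{info.compress_size / 1e6:.1f} MB compressed)"
                )
    return messages
```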

Zip file creator

Usage: tm351zip [OPTIONS] PATH ZIPFILE

  Create a zip file from the contents of a specified directory.

  The zipper can optionally run a notebook processor on notebooks before
  zipping them to check that all cells are run or all cells are cleared.

Options:
  -r, --file-processor [clearOutput|runWithErrors]
  -H, --include-hiddenfiles       Include hidden files
  -X, --exclude-dir PATH          Exclude specified directory
  -x, --exclude-file PATH         Exclude specified file
  -a, --zip_append                Add to existing zip file
  --help                          Show this message and exit.
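For readers who want to see the idea in code, the following is a minimal sketch of zipping a directory while skipping hidden files and excluded directories. It uses only the standard library; the function name `zip_directory` and its parameters are hypothetical, and matching excluded directories by name is an assumption about how such an option might behave.

```python
import os
import zipfile


def zip_directory(path, zipfile_name, include_hidden=False, exclude_dirs=()):
    """Zip the contents of `path`, skipping hidden items unless requested."""
    with zipfile.ZipFile(zipfile_name, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(path):
            # Prune directories in place so os.walk does not descend into them
            dirs[:] = [
                d for d in dirs
                if d not in exclude_dirs
                and (include_hidden or not d.startswith("."))
            ]
            for name in files:
                if name.startswith(".") and not include_hidden:
                    continue
                full = os.path.join(root, name)
                # Store paths relative to the directory being zipped
                zf.write(full, arcname=os.path.relpath(full, path))
    return zipfile_name
```

Note that the output zip file should be written outside the directory being zipped, otherwise the sketch would try to add the archive to itself.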

Grab files from Github repo

Note: there is a gotcha when trying to connect to Github using Python on a Mac: https://stackoverflow.com/a/42098127/454773

Usage: tm351gitrepos [OPTIONS]

  Download files from a specified branch in a particular git repository.
  
  This command can download files from public Github repositories without authentication, although the API is heavily rate limited if you do not authenticate.

  The download can also be limited to just the contents of a specified directory.
  
  Don't worry that there look to be a lot of arguments - you will be 
  prompted for them if you just run: `tm351gitrepos --auth`

Options:
  --github-user TEXT              Your Github username.
  --password TEXT
  --repo TEXT                     Repository name
  --branch TEXT                   Branch or tag to download
  --directory TEXT                Directory to download (or: all)
  --savedir PATH                  Directory to download repo / repo dir into;
                                  default is dir name
  --file-processor [clearOutput|runWithErrors]
                                  Optionally specify a file processor to be
                                  run against downloaded notebooks.
  --zip / --no-zip                Optionally create a zip file of the
                                  downloaded repository/directory with the
                                  same name as the repository/directory.
  --help                          Show this message and exit.
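As an illustration of the unauthenticated case only, the sketch below fetches a zip archive of a branch or tag from a public repository using the GitHub REST API zipball endpoint and the standard library. This is an alternative technique shown for orientation, not the way tm351gitrepos itself works; the function names are hypothetical and authentication is deliberately left out.

```python
import urllib.request


def github_zipball_url(owner, repo, ref):
    """Build the GitHub REST API URL for a zip archive of a branch or tag."""
    return f"https://api.github.com/repos/{owner}/{repo}/zipball/{ref}"


def download_zipball(owner, repo, ref, dest):
    """Download the archive for `ref` to the local file `dest` (no auth)."""
    url = github_zipball_url(owner, repo, ref)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        out.write(response.read())
    return dest
```

Unauthenticated requests like this are subject to the rate limiting mentioned above.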

Testing notebooks

Notebooks are tested using the nbval package. Notebooks should contain pre-run cells whose outputs you want to test against. Running tm351nbtest reruns the notebooks in the environment you run the command in and compares the new cell outputs to the previously saved cell outputs. (So if you're testing a Docker containerised environment, install this package and run the test from the command line inside the container.)

Running tm351nbtest will print out a list of cells where the cell outputs from a new run of the notebook mismatch the original output. Note that you can “escape” cells that generate known errors by adding a cell tag raises-exception. You can also force cells to be ignored by tagging them with the nbval-ignore-output tag.

Usage: tm351nbtest [OPTIONS] [TESTITEMS]...

  Test specified notebooks and/or the notebooks in a specified directory 
  or directories (`TESTITEMS`) using the `nbval` plugin for `py.test`.
  
  Running `tm351nbtest` without
  any specified directory or file will assemble tests recursively from the
  current directory down.

Options:
  -X, --exclude-dir PATH  Do not recurse through specified directory when
                          assembling tests.
  -o, --outfile PATH      Output report file. Leave this blank to display
                          report on command line.
  --help                  Show this message and exit.
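As a rough illustration of what a test run amounts to, the sketch below builds a pytest command line that uses the nbval plugin's `--nbval` flag. The helper names are hypothetical, and mapping excluded directories onto pytest's `--ignore` option is an assumption rather than a description of how tm351nbtest does it.

```python
import subprocess
import sys


def nbval_command(testitems=(), exclude_dirs=()):
    """Build a pytest command line that runs nbval over the given items."""
    cmd = [sys.executable, "-m", "pytest", "--nbval"]
    for directory in exclude_dirs:
        cmd.append(f"--ignore={directory}")  # pytest option for skipping paths
    # With no items given, test from the current directory down
    cmd.extend(testitems or ["."])
    return cmd


def run_nbval(testitems=(), exclude_dirs=()):
    """Run the tests and return pytest's exit code (0 if all tests passed)."""
    return subprocess.run(nbval_command(testitems, exclude_dirs)).returncode
```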

Running notebooks and cleaning output cells

Usage: tm351nbrun [OPTIONS] PATH

  Directory processor for notebooks - allows the user to run nbconvert
  operations on notebooks, such as running all cells or clearing all cells.

  To run tests, use: tm351nbtest. To zip folders (with the option of running
  notebook processors on zipped files), use: tm351zip

Options:
  -r, --file-processor [clearOutput|runWithErrors]
                                  File processor actions that can be applied 
                                  to notebooks using `nbconvert`
  --outpath PATH                  path to output directory
  --inplace / --no-inplace        Run processors on notebooks inplace
  -X, --exclude-dir PATH          Exclude specified directory
  -x, --exclude-file PATH         Exclude specified file
  --include-hidden / --no-include-hidden
                                  Include hidden files
  --rmdir / --no-rmdir            Check the output directory is empty before
                                  we use it
  --currdir / --no-currdir        Process files in current directory
  --subdirs / --no-subdirs        Process files in subdirectories
  --reportlevel INTEGER           Reporting level
  --auth / --no-auth              By default, run with auth (prompt for
                                  credentials)
  -t, --with-tests                Run tests on notebooks after download
  --help                          Show this message and exit.
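To show what clearing output cells means at the file level, here is a minimal sketch that works directly on the notebook JSON rather than going through nbconvert. The function names are hypothetical and this is a stand-in for, not a copy of, the clearOutput processor.

```python
import json


def clear_outputs(nb):
    """Clear outputs and execution counts from every code cell of a notebook dict."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb


def clear_outputs_in_file(path):
    """Rewrite the notebook at `path` in place with all outputs cleared."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(clear_outputs(nb), f, indent=1)
```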

Empinken updater

Update tag styles used for empinken cells:

# Recurse on directory path rewriting .ipynb files with new tag style
upgrade_empinken_tags ./

Notebook metadata updater - classicnb2jl extension metadata

Patches metadata for extension migration:

  • collapsible headings (heading_collapsed -> jp-MarkdownHeadingCollapsed); also run nb_cell_metadata_strip Part\ 07\ Notebooks hidden to tidy other metadata sometimes used with collapsed cells in OU notebooks.

Usage: cnb_collapse_head_migrate [OPTIONS] PATH

  Fix collapsible headings metadata.

Options:
  --recursive / --no-recursive  Recursive search of directories.
  --hidden / --no-hidden        Include hidden files and
                                directories.
  --help                        Show this message and exit.
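The metadata patch described above can be sketched on a notebook loaded as a Python dict. The function name is hypothetical, and dropping the old key after copying its value is an assumption about the migration.

```python
def migrate_collapsed_headings(nb):
    """Rename heading_collapsed to jp-MarkdownHeadingCollapsed in cell metadata.

    Returns the number of cells that were changed.
    """
    changed = 0
    for cell in nb.get("cells", []):
        metadata = cell.setdefault("metadata", {})
        if "heading_collapsed" in metadata:
            # Carry the old value over under the new key, then drop the old key
            value = metadata.pop("heading_collapsed")
            metadata["jp-MarkdownHeadingCollapsed"] = bool(value)
            changed += 1
    return changed
```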

Notebook Split and Merge Utilities

Simple tools to merge notebooks and split notebooks on a particular separator.

We can merge two or more notebooks with a command of the form: nb_merge FILENAME1 FILENAME2 ...

Usage: nb_merge [OPTIONS] [FILES]...

  Merge two or more notebooks. Note that this may overwrite a pre-existing
  file.

Options:
  -o, --outfile PATH
  --help              Show this message and exit.
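Conceptually, merging amounts to concatenating the cell lists of the input notebooks. The sketch below does this on notebooks loaded as dicts; the function name is hypothetical, and keeping the notebook-level metadata of the first input is an assumption.

```python
import copy


def merge_notebooks(notebooks):
    """Merge notebook dicts by appending the cells of each to the first one."""
    if not notebooks:
        raise ValueError("at least one notebook is required")
    # Work on a copy so the input notebooks are left untouched
    merged = copy.deepcopy(notebooks[0])
    merged.setdefault("cells", [])
    for nb in notebooks[1:]:
        merged["cells"].extend(copy.deepcopy(nb.get("cells", [])))
    return merged
```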

We can split a notebook at one or more split points with a command of the form: nb_split FILENAME

By default, the split point is #--SPLITHERE-- or # --SPLITHERE--. It should appear as the only item in either a markdown cell or a code cell.

Usage: nb_split [OPTIONS] PATH

  Split a notebook at a splitpoint marker. Note that this may overwrite
  pre-existing files.

Options:
  --splitter TEXT               String to split on (default: #--SPLITHERE--)
  --overwrite / --no-overwrite  Overwrite pre-existing files
  --help                        Show this message and exit.
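The splitting rule can be sketched as a function over a notebook's list of cells. The names are hypothetical; ignoring whitespace when comparing a cell against the marker is an assumption chosen so that both #--SPLITHERE-- and # --SPLITHERE-- match.

```python
def cell_text(cell):
    """Return a cell's source as a single string (it may be stored as a list)."""
    source = cell.get("source", "")
    return "".join(source) if isinstance(source, list) else source


def split_cells(cells, splitter="#--SPLITHERE--"):
    """Split a list of cells into chunks at cells containing only the marker."""
    target = "".join(splitter.split())  # ignore whitespace inside the marker
    chunks, current = [], []
    for cell in cells:
        if "".join(cell_text(cell).split()) == target:
            # Marker cell: close the current chunk and drop the marker itself
            chunks.append(current)
            current = []
        else:
            current.append(cell)
    chunks.append(current)
    # Discard empty chunks, e.g. from a marker in the first or last cell
    return [chunk for chunk in chunks if chunk]
```

Each chunk can then be written out as the cell list of a new notebook.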

Ensure Activity Answer Cells Are Collapsed

Ensure that notebooks in a directory path have activity answers collapsed using the Collapsible Headings classic Jupyter notebook extension. This currently relies on heuristics to detect the answer header cell, or on the presence of a precollapse tag in a markdown cell that starts with a heading. Specifically, the answer header cell should be highlighted as a blue activity cell using the nb_extension_empinken activity tool (which adds the style-activity tag to a cell) and should contain at least some of the following text:

possibles = ["# Our solution", "# Answer", "click on the triangle symbol"]

Usage takes the form: nb_collapse_activities PATH and is recursive by default.

Usage: nb_collapse_activities [OPTIONS] PATH

  Collapse activity answers.

Options:
  --recursive / --no-recursive  Recursive search of directories.
  --cnb / --no-cnb              Use classic notebook extension
                                metadata value (default: use no-cnb (JupyterLab/nb7) format).
  --help                        Show this message and exit.
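The heuristic described in this section can be sketched as follows, working on a notebook loaded as a dict. The function names are hypothetical, and the metadata keys written for each format (heading_collapsed for the classic notebook, jp-MarkdownHeadingCollapsed otherwise) are an assumption based on the metadata migration described earlier.

```python
POSSIBLES = ["# Our solution", "# Answer", "click on the triangle symbol"]


def is_answer_header(cell):
    """Heuristically decide whether a cell is an activity answer header."""
    if cell.get("cell_type") != "markdown":
        return False
    source = cell.get("source", "")
    text = "".join(source) if isinstance(source, list) else source
    tags = cell.get("metadata", {}).get("tags", [])
    # Explicit opt-in: a precollapse tag on a cell that starts with a heading
    if "precollapse" in tags and text.lstrip().startswith("#"):
        return True
    # Otherwise: an activity-styled cell containing one of the known phrases
    return "style-activity" in tags and any(p in text for p in POSSIBLES)


def collapse_activity_answers(nb, cnb=False):
    """Mark answer header cells as collapsed; return how many were marked."""
    key = "heading_collapsed" if cnb else "jp-MarkdownHeadingCollapsed"
    count = 0
    for cell in nb.get("cells", []):
        if is_answer_header(cell):
            cell.setdefault("metadata", {})[key] = True
            count += 1
    return count
```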

Ensure Tags Toolbar is Collapsed

Ensure that notebooks have the tags toolbar view collapsed:

Usage: nb_collapse_tagstoolbar [OPTIONS] PATH

  Collapse tags toolbar.

Options:
  --recursive / --no-recursive  Recursive search of directories.
  --help                        Show this message and exit.

Clean Cell Metadata Tag

Remove a metadata object from cell metadata by key, e.g. nbdime-conflicts or scrolled.

Usage: nb_cell_metadata_strip [OPTIONS] PATH KEY

  Clean metadata from cell.

Options:
  --recursive / --no-recursive  Recursive search of directories.
  --help                        Show this message and exit.
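Stripping a key from cell metadata is a small operation on the notebook JSON. A minimal sketch, with a hypothetical function name, might look like this:

```python
def strip_cell_metadata(nb, key):
    """Remove `key` from the metadata of every cell; return the number removed."""
    removed = 0
    for cell in nb.get("cells", []):
        metadata = cell.get("metadata", {})
        if key in metadata:
            del metadata[key]
            removed += 1
    return removed
```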

Autotag Figure Cells

Autotag code cells that produce figure output in pre-run notebooks (default tag: nbval-figure).

Usage: nb_cell_figure_tagger [OPTIONS] PATH

  Autotag figure output cell.

Options:
  -t, --tag TEXT                Tag to label figure output cells.
  --recursive / --no-recursive  Recursive search of directories.
  --help                        Show this message and exit.
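One way to detect figure output cells is to look for image MIME types in the saved outputs of code cells. The sketch below takes that approach; the function name is hypothetical and the detection rule is an assumption, so the actual tool may use different criteria.

```python
def tag_figure_cells(nb, tag="nbval-figure"):
    """Add `tag` to code cells whose saved outputs include an image."""
    tagged = 0
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        # display_data and execute_result outputs carry a MIME bundle in "data"
        has_image = any(
            mime.startswith("image/")
            for output in cell.get("outputs", [])
            for mime in output.get("data", {})
        )
        if has_image:
            tags = cell.setdefault("metadata", {}).setdefault("tags", [])
            if tag not in tags:
                tags.append(tag)
                tagged += 1
    return tagged
```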

License: MIT License

