nteract / papermill

📚 Parameterize, execute, and analyze notebooks

Home Page: http://papermill.readthedocs.io/en/latest/

nbformat/nbformat_minor not correctly extracted by the HTTP handler

LetMeR00t opened this issue


πŸ› Bug

I'm currently trying to create a connector between Jupyter (using papermill) and another product named "Cortex" from the StrangeBee project.
I ran into an issue during development. I'm testing the HTTP handler by trying to execute a notebook located on a JupyterHub instance, which has a "demo" user for whom a "cortex_job" server is configured.

import papermill as pm

pm.execute_notebook(
    "",
    "",
    parameters = dict(var1 = "toto")
)

Retrieving the notebook works fine, but I get the following error:

ValidationError                           Traceback (most recent call last)
Cell In[1], line 3
      1 import papermill as pm
----> 3 pm.execute_notebook(
      4     "",
      5     "",
      6     parameters = dict(var1 = "toto")
      7 )

File /usr/local/lib/python3.10/dist-packages/papermill/execute.py:89, in execute_notebook(input_path, output_path, parameters, engine_name, request_save_on_cell_execute, prepare_only, kernel_name, language, progress_bar, log_output, stdout_file, stderr_file, start_timeout, report_mode, cwd, **engine_kwargs)
     86 if cwd is not None:
     87     logger.info("Working directory: {}".format(get_pretty_path(cwd)))
---> 89 nb = load_notebook_node(input_path)
     91 # Parameterize the Notebook.
     92 if parameters:

File /usr/local/lib/python3.10/dist-packages/papermill/iorw.py:512, in load_notebook_node(notebook_path)
    502 def load_notebook_node(notebook_path):
    503     """Returns a notebook object with papermill metadata loaded from the specified path.
    505     Args:
    511     """
--> 512     nb = nbformat.reads(papermill_io.read(notebook_path), as_version=4)
    513     nb_upgraded = nbformat.v4.upgrade(nb)
    514     if nb_upgraded is not None:

File /usr/local/lib/python3.10/dist-packages/nbformat/__init__.py:91, in reads(s, as_version, capture_validation_error, **kwargs)
     89 nb = reader.reads(s, **kwargs)
     90 if as_version is not NO_CONVERT:
---> 91     nb = convert(nb, as_version)
     92 try:
     93     validate(nb)

File /usr/local/lib/python3.10/dist-packages/nbformat/converter.py:62, in convert(nb, to_version)
     60 except AttributeError as e:
     61     msg = f"Notebook could not be converted from version {version} to version {step_version} because it's missing a key: {e}"
---> 62     raise ValidationError(msg) from None
     64 # Recursively convert until target version is reached.
     65 return convert(converted, to_version)

ValidationError: Notebook could not be converted from version 1 to version 2 because it's missing a key: cells

Looking at the code, we can see how the HTTP handler works: it returns the entire response content:


Which gives:

            "source":"# My title\n\n## My subtitle\n\nHello world!"
            "source":"var1 = 3\nvar2 = 5"
                  "text":"var1 is 3, var2 is 5\n"
            "source":"print(\"var1 is {0}, var2 is {1}\".format(var1,var2))"
            "display_name":"Python 3 (ipykernel)",

As you can see, the "nbformat" key is set to 4, but the notebook was detected as version 1 (the default value).

This assumption comes from the nbformat library, which is what reads the notebook:


The version is read from the root-level "nbformat" key instead of "content.nbformat", which causes the issue.
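To make the mismatch concrete, here is a small offline sketch. `detect_major_version` is a hypothetical, simplified stand-in for nbformat's version lookup, assuming (as observed above) that it reads the root-level "nbformat" key and falls back to 1 when that key is missing. The contents-API model below is a made-up example that keeps only a few of its fields.

```python
import json


def detect_major_version(nb: dict) -> int:
    # Hypothetical stand-in for nbformat's lookup (assumption): read the
    # root-level "nbformat" key and fall back to 1 when it is missing.
    return nb.get("nbformat", 1)


# A minimal v4 notebook, shaped like what the LocalHandler returns.
notebook = {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 5}

# Made-up contents-API response: the notebook is nested under "content".
contents_model = {
    "name": "demo.ipynb",
    "path": "demo.ipynb",
    "type": "notebook",
    "format": "json",
    "content": notebook,
}

# The HTTP handler currently forwards the whole response body.
body = json.loads(json.dumps(contents_model))
print(detect_major_version(body))             # 1: no root-level "nbformat" key
print(detect_major_version(body["content"]))  # 4: the nested notebook
```

The first call matches the traceback: version 1 is assumed, so a v1 to v2 conversion is attempted and fails because there is no "cells" key at the root of the response.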

Do you know whether this is a bug on your side or in the nbformat library? I tested it with the LocalHandler and it works fine, since the output is:

 "cells": [
   "cell_type": "markdown",
   "id": "e0882b67",
   "metadata": {},
   "source": [
    "# My title\n",
    "## My subtitle\n",
    "Hello world!"
   "cell_type": "code",
   "execution_count": 1,
   "id": "e92789a6",
   "metadata": {
    "tags": [
   "outputs": [],
   "source": [
    "var1 = 3\n",
    "var2 = 5"
   "cell_type": "code",
   "execution_count": 2,
   "id": "d49d5a2b",
   "metadata": {},
   "outputs": [
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "var1 is 3, var2 is 5\n"
   "source": [
    "print(\"var1 is {0}, var2 is {1}\".format(var1,var2))"
 "metadata": {
  "celltoolbar": "Tags",
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.6"
 "nbformat": 4,
 "nbformat_minor": 5

One solution could be to parse the JSON response and extract the "content" node before returning the result from the HTTP handler.

Thank you


Here is a fix that works on my side:


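import json

import requests
from papermill.exceptions import PapermillException


class HttpHandler(object):
    @classmethod
    def read(cls, path):
        # The Jupyter contents API wraps the notebook in a model; return only its "content" node
        response = requests.get(path, headers={'Accept': 'application/json'})
        response.raise_for_status()
        return json.dumps(response.json()["content"])

    @classmethod
    def listdir(cls, path):
        raise PapermillException('listdir is not supported by HttpHandler')

    @classmethod
    def write(cls, buf, path):
        # Wrap the notebook back into a contents-API model before uploading it
        payload = {"type": "notebook", "format": "json", "path": path}
        payload["content"] = json.loads(buf)
        result = requests.put(path, json=payload)
        result.raise_for_status()

    @classmethod
    def pretty_path(cls, path):
        return path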
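The fix always indexes ["content"], so it would raise a KeyError for URLs that serve a plain .ipynb file (for example a raw file on a static web server), which the current handler reads without trouble. Below is a minimal sketch of a more tolerant read step, assuming a contents-API model can be recognized by its "type" and "content" keys; `extract_notebook_text` is a hypothetical helper, not part of papermill.

```python
import json


def extract_notebook_text(body: str) -> str:
    # Hypothetical helper: accept either a raw .ipynb document or a Jupyter
    # contents-API model and return the notebook JSON text papermill expects.
    data = json.loads(body)
    # Assumption: a contents-API model has type == "notebook" and a "content"
    # key, while a raw notebook carries "cells" and "nbformat" at the root.
    if isinstance(data, dict) and data.get("type") == "notebook" and "content" in data:
        data = data["content"]
    return json.dumps(data)
```

Inside `HttpHandler.read`, this would replace the `["content"]` lookup, e.g. `return extract_notebook_text(response.text)`, so both kinds of URL keep working.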