NBFormat Does Not Recognize Indentation

Question

NBFormat Does Not Recognize Indentation

z3ht opened this issue 3 years ago · comments

Hello,

Thanks for the awesome tool! I would like to use it for a project I'm working on but have the following issue:

NBFormat does not recognize indentation levels people set for their notebook's JSON file (more info: https://jupyter-notebook.readthedocs.io/en/latest/frontend_config.html#example-changing-the-notebook-s-default-indentation). This is a problem because when a notebook with more than one space for indentation is formatted by NBFormat, all lines are rewritten to use single spaced indentation. As a result, the git diff is unreadable.

I propose the following two possible solutions (either would work):

NBFormat could parse how many spaces are used in a file and default to using that many spaces when writing JSON back to that file
NBFormat could accept an indent option within the nbformat.write function allowing users to specify how many spaces should be used

I suspect the second possible solution would be easier and therefore the better option.

Problem Walkthrough
Original nb JSON (note the double-space indent):

  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "nbgrader": {
          "grade": false,
          "locked": true,
          "schema_version": 3,
          "solution": false
        },
        "id": "peZlh40I73Ql"
      },
      "source": [
        "# Deep Learning\n",
        "\n",
        "In this exercise, you will use a deep neural network to predict the values of houses based on some provided input data. You will use keras to build the model. Below is a description of how the keras models are set up."
      ]
    },
    ...
  }

After a call to nbformat.write(nb=nb, fp=notebook_path, version=nbformat.NO_CONVERT), the following is the JSON output (note the single-space indent):

 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "peZlh40I73Ql",
    "nbgrader": {
     "grade": false,
     "locked": true,
     "schema_version": 3,
     "solution": false
    }
   },
   "source": [
    "# Deep Learning\n",
    "\n",
    "In this exercise, you will use a deep neural network to predict the values of houses based on some provided input data. You will use keras to build the model. Below is a description of how the keras models are set up."
   ]
  },
  ...
}

Best,
Andrew

Wes Turner · Answer 1 · Mon Aug 30 2021 21:07:53 GMT+0800 (China Standard Time)

IMO, the fixed indentation reduces line noise in diffs. Avoiding line noise in diffs justifies setting the indentation level to a fixed indent. If the/a JSON parser was extended to support sniffing the (first few?) indentation levels, I still don't think nbformat should support configurable indentation levels. If you need to reindent a JSON document to e.g. manually review the ipynb, `python -m json.tool --indent 2 nb.ipynb` may be helpful. https://docs.python.org/3/library/json.html#module-json.tool

…

On Sun, Aug 29, 2021, 22:44 Andrew Darling ***@***.***> wrote: Hello, Thanks for the awesome tool! I would like to use it for a project I'm working on but have the following issue: NBFormat does not recognize indentation levels people set for their notebook's JSON file (more info: https://jupyter-notebook.readthedocs.io/en/latest/frontend_config.html#example-changing-the-notebook-s-default-indentation). This is a problem because when a notebook with more than one space for indentation is formatted by NBFormat, all lines are rewritten to use single spaced indentation. As a result, the git diff is unreadable. I propose the following two possible solutions (either would work): - NBFormat could parse how many spaces are used in a file and default to using that many spaces when writing JSON back to that file - NBFormat could accept an indent option within the nbformat.write function allowing users to specify how many spaces should be used I suspect the second possible solution would be easier and therefore the better option. *Problem Walkthrough* Original nb JSON (note the double-space indent): "cells": [ { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "locked": true, "schema_version": 3, "solution": false }, "id": "peZlh40I73Ql" }, "source": [ "# Deep Learning\n", "\n", "In this exercise, you will use a deep neural network to predict the values of houses based on some provided input data. You will use keras to build the model. Below is a description of how the keras models are set up." ] }, ... } After a call to nbformat.write(nb=nb, fp=notebook_path, version=nbformat.NO_CONVERT), the following is the JSON output (note the single-space indent): "cells": [ { "cell_type": "markdown", "metadata": { "id": "peZlh40I73Ql", "nbgrader": { "grade": false, "locked": true, "schema_version": 3, "solution": false } }, "source": [ "# Deep Learning\n", "\n", "In this exercise, you will use a deep neural network to predict the values of houses based on some provided input data. You will use keras to build the model. Below is a description of how the keras models are set up." ] }, ... } Best, Andrew — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#228>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAAMNS5ZQ3JYHPPOYPA2PQ3T7LWADANCNFSM5DA5Z4KA> . Triage notifications on the go with GitHub Mobile for iOS <https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675> or Android <https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.

Andrew Darling · Answer 2 · Mon Aug 30 2021 22:24:16 GMT+0800 (China Standard Time)

IMO, the fixed indentation reduces line noise in diffs. Avoiding line noise in diffs justifies setting the indentation level to a fixed indent.

Hmm... Now that you say this, I'm realizing just how much line noise two different developers with two different indentation settings could create. I concede that the better solution for my problem is to enforce a uniform indentation level across our project.

If the/a JSON parser was extended to support sniffing the (first few?) indentation levels, I still don't think nbformat should support configurable indentation levels.

Although possibly not necessary, I think parsing just the first line would work fine. I know this invites the argument that there may be a hidden line with different indention than the rest of the file that would cause line noise when formatted but I'm really not concerned with it.

If you need to reindent a JSON document to e.g. manually review the ipynb, python -m json.tool --indent 2 nb.ipynb may be helpful. https://docs.python.org/3/library/json.html#module-json.tool

Thanks for pointing this out! If we decide to go with an indentation level other than 1 and NBFormat does not add custom-indention support, I'll be sure to look into using this.

Andrew Darling · Answer 3 · Mon Aug 30 2021 22:28:12 GMT+0800 (China Standard Time)

I'm closing this Issue because I no longer feel I need an indentation feature

Wes Turner · Answer 4 · Mon Aug 30 2021 23:14:20 GMT+0800 (China Standard Time)

Rog. For my nbformat with JSON-LD @context designs, perhaps others would feel differently about 2, maybe 3, 4 spaces instead of tabs?

The notebook-aware diffing support in nbdime may be helpful for your use case?

From markusschanta/awesome-jupyter#8 :

nbdime - visual diff tool for Jupyter notebooks
https://nbdime.readthedocs.io/en/stable/
(works as a merge driver and a diff driver)
https://nbdime.readthedocs.io/en/stable/vcs.html#git-integration

https://github.com/Yogayu/awesome-jupyterlab-extension#version-control