NBFormat Does Not Recognize Indentation
z3ht opened this issue
Hello,
Thanks for the awesome tool! I would like to use it for a project I'm working on but have the following issue:
NBFormat does not recognize the indentation level people set for their notebook's JSON file (more info: https://jupyter-notebook.readthedocs.io/en/latest/frontend_config.html#example-changing-the-notebook-s-default-indentation). This is a problem because when a notebook indented with more than one space is written by NBFormat, every line is rewritten with single-space indentation. As a result, the git diff is unreadable.
I propose the following two possible solutions (either would work):
- NBFormat could parse how many spaces are used in a file and default to using that many spaces when writing JSON back to that file
- NBFormat could accept an indent option within the nbformat.write function, allowing users to specify how many spaces should be used
I suspect the second possible solution would be easier and therefore the better option.
Problem Walkthrough
Original nb JSON (note the double-space indent):
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "nbgrader": {
          "grade": false,
          "locked": true,
          "schema_version": 3,
          "solution": false
        },
        "id": "peZlh40I73Ql"
      },
      "source": [
        "# Deep Learning\n",
        "\n",
        "In this exercise, you will use a deep neural network to predict the values of houses based on some provided input data. You will use keras to build the model. Below is a description of how the keras models are set up."
      ]
    },
    ...
}
After a call to nbformat.write(nb=nb, fp=notebook_path, version=nbformat.NO_CONVERT), the following is the JSON output (note the single-space indent):
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "peZlh40I73Ql",
    "nbgrader": {
     "grade": false,
     "locked": true,
     "schema_version": 3,
     "solution": false
    }
   },
   "source": [
    "# Deep Learning\n",
    "\n",
    "In this exercise, you will use a deep neural network to predict the values of houses based on some provided input data. You will use keras to build the model. Below is a description of how the keras models are set up."
   ]
  },
  ...
}
Best,
Andrew
IMO, the fixed indentation reduces line noise in diffs. Avoiding line noise in diffs justifies setting the indentation level to a fixed indent.
Hmm... Now that you say this, I'm realizing just how much line noise two different developers with two different indentation settings could create. I concede that the better solution for my problem is to enforce a uniform indentation level across our project.
Even if a JSON parser were extended to support sniffing the (first few?) indentation levels, I still don't think nbformat should support configurable indentation levels.
Although possibly not necessary, I think parsing just the first indented line would work fine. I know this invites the argument that a line somewhere in the file may be indented differently from the rest and would cause line noise when formatted, but I'm really not concerned about that.
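For illustration, the "parse just the first line" idea could be sketched with the standard library alone. This is only a rough sketch, not anything NBFormat does: sniff_indent and rewrite_preserving_indent are hypothetical helper names, and the sketch assumes space-indented JSON whose first indented line reflects the indentation used throughout the file.

```python
import json


def sniff_indent(text, default=1):
    """Return the number of leading spaces on the first indented line.

    Hypothetical helper: falls back to `default` when no line starts
    with a space (for example, single-line or tab-indented JSON).
    """
    for line in text.splitlines():
        stripped = line.lstrip(" ")
        if stripped and len(stripped) < len(line):
            return len(line) - len(stripped)
    return default


def rewrite_preserving_indent(text):
    """Parse JSON text and serialize it again with the sniffed indent."""
    indent = sniff_indent(text)
    # ensure_ascii=False keeps non-ASCII characters as they are.
    return json.dumps(json.loads(text), indent=indent, ensure_ascii=False) + "\n"


if __name__ == "__main__":
    original = '{\n  "cells": [],\n  "nbformat": 4\n}\n'
    print(sniff_indent(original))
    print(rewrite_preserving_indent(original) == original)
```

A file that mixes indentation widths would still be normalized to whatever the first indented line uses, which is the caveat mentioned above.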
If you need to reindent a JSON document, e.g. to manually review the ipynb, the following may be helpful:
python -m json.tool --indent 2 nb.ipynb
https://docs.python.org/3/library/json.html#module-json.tool
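If the re-indentation needs to happen from a script rather than the command line, roughly the same effect can be had with the json module directly. The following is a minimal sketch; reindent_file is a hypothetical helper name, and unlike the command above it overwrites the file in place and leaves non-ASCII characters unescaped, both of which are choices made for this example.

```python
import json
from pathlib import Path


def reindent_file(path, indent=2):
    """Rewrite the JSON file at `path` in place with the given indent.

    Hypothetical helper. Key order is preserved as found in the file.
    """
    path = Path(path)
    data = json.loads(path.read_text(encoding="utf-8"))
    text = json.dumps(data, indent=indent, ensure_ascii=False) + "\n"
    path.write_text(text, encoding="utf-8")
    return text


if __name__ == "__main__":
    import sys

    # Usage: python reindent.py nb.ipynb
    reindent_file(sys.argv[1], indent=2)
```

Note that this only changes the file on disk; a later write through nbformat would produce single-space indentation again, as shown in the walkthrough.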
Thanks for pointing this out! If we decide to go with an indentation level other than 1 and NBFormat does not add custom-indentation support, I'll be sure to look into using this.
I'm closing this issue because I no longer feel I need an indentation feature.
Rog. For my nbformat with JSON-LD @context designs, perhaps others would feel differently about 2, maybe 3, 4 spaces instead of tabs?
The notebook-aware diffing support in nbdime may be helpful for your use case?
From markusschanta/awesome-jupyter#8:
- nbdime - visual diff tool for Jupyter notebooks
https://nbdime.readthedocs.io/en/stable/
(works as a merge driver and a diff driver)
https://nbdime.readthedocs.io/en/stable/vcs.html#git-integration
https://github.com/Yogayu/awesome-jupyterlab-extension#version-control