jupyter / nbformat

Reference implementation of the Jupyter Notebook format

Home Page:http://nbformat.readthedocs.io/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Ambiguous warning about missing cell IDs

krassowski opened this issue · comments

#282 refactored normalisation of missing cell IDs, removing lines:

notebook_supports_cell_ids = ref is None and version >= 4 and version_minor >= 5
if notebook_supports_cell_ids and repair_duplicate_cell_ids:
    # Auto-generate cell ids for cells that are missing them.
    for cell in nbdict["cells"]:
        if "id" not in cell:
            # Generate cell ids if any are missing
            cell["id"] = generate_corpus_id()

and replacing them with:

if (version, version_minor) >= (4, 5):
    # if we support cell ids ensure default ids are provided
    for cell in nbdict["cells"]:
        if "id" not in cell:
            warnings.warn(
                "Code cell is missing an id field, this will become"
                " a hard error in future nbformat versions. You may want"
                " to use `normalize()` on your notebooks before validations"
                " (available since nbformat 5.1.4). Previous versions of nbformat"
                " are fixing this issue transparently, and will stop doing so"
                " in the future.",
                MissingIDFieldWarning,
                stacklevel=3,
            )
            # Generate cell ids if any are missing
            if repair_duplicate_cell_ids:
                cell["id"] = generate_corpus_id()
                changes += 1

However,

  • during validation repair_duplicate_cell_ids is set to False, and in general I do not understand why it is reusing repair_duplicate_cell_ids - was it meant to be a new variable for missing cell IDs?
  • Previous versions of nbformat are fixing this issue transparently, and will stop doing so in the future - this sentence is ambiguous; does it mean that current version still fixes this issue transparently or not?

It's complicated. it should fix most of the issue, the problem being that the way it was written the use of 'if isvalid(notebook)' was returning true and mutating notebook to most of the time be valid.

It should be considered an error to pass an invalid notebook, but currently it is not because other we would break the world.

note that a notebook is not trusted after reload BECAUSE nbformat fixes it, the reason being the signature is compute before any call to normalise/isvalid and therefore the mutated notebook that is saved has a different signature and is thus untrusted.

note that a notebook is not trusted after reload BECAUSE nbformat fixes it, the reason being the signature is compute before any call

Yes that is plausible. However, it could also be that there is another bug in Notebook v7 beta and that is the cause of signature mismatch.

Sorry because I still don't follow: repair_duplicate_cell_ids is set to True if it is not provided because it defaults to _deprecated:

def validate( # noqa
nbdict: Any = None,
ref: Optional[str] = None,
version: Optional[int] = None,
version_minor: Optional[int] = None,
relax_add_props: bool = False,
nbjson: Any = None,
repair_duplicate_cell_ids: bool = _deprecated, # type: ignore

if repair_duplicate_cell_ids is _deprecated:
repair_duplicate_cell_ids = True

but, when it is used in isvalid(), repair_duplicate_cell_ids is force-set to False hence my line of reasoning:

def isvalid(nbjson, ref=None, version=None, version_minor=None):
"""Checks whether the given notebook JSON conforms to the current
notebook format schema. Returns True if the JSON is valid, and
False otherwise.
To see the individual errors that were encountered, please use the
`validate` function instead.
"""
orig = deepcopy(nbjson)
try:
with warnings.catch_warnings():
warnings.filterwarnings("ignore", category=DeprecationWarning)
warnings.filterwarnings("ignore", category=MissingIDFieldWarning)
validate(nbjson, ref, version, version_minor, repair_duplicate_cell_ids=False)
except ValidationError:

Am I missing something here?

Also, the method which is used by jupyter-server to save notebooks also includes repair_duplicate_cell_ids=False:

def writes(nb, format="DEPRECATED", version=current_nbformat, **kwargs): # noqa
"""Write a notebook to a string in a given format in the current nbformat version.
This function always writes the notebook in the current nbformat version.
Parameters
----------
nb : NotebookNode
The notebook to write.
version : int
The nbformat version to write.
Used for downgrading notebooks.
Returns
-------
s : unicode
The notebook string.
"""
if format not in {"DEPRECATED", "json"}:
_warn_format()
nb = convert(nb, version)
try:
with warnings.catch_warnings():
warnings.filterwarnings("ignore", category=DeprecationWarning)
validate(nb, repair_duplicate_cell_ids=False)

And isvalid() call was removed from here in #282

Sure it is possible that there is another bug, and I think that having hard errors on nbformat would make it easier to detect.

In general it would be better to have a nbformat API that is difficult to miss-use.

I'll try to find some time to dive into the code and find the reasons. I don't quite remember all the API.

Is there already a way to make the nbformat API strict ? Or add warning filter during the notebook v7 beta that turn warnings into errors ?

and I think that having hard errors on nbformat would make it easier to detect. In general it would be better to have a nbformat API that is difficult to miss-use.

👍

Is there already a way to make the nbformat API strict ? Or add warning filter during the notebook v7 beta that turn warnings into errors ?

jupyter-server has a pre_save_hook which I was successfully using to debug trust issues. For the next beta of JupyterLab and Notebook one could add a pre-save hook that would validate the notebook JSON and error out if it was faulty.

What do you think?

I think a pre-save hook is fine. usually I think the stricter we are in beta by default the better it is as you can always relax for RC/releases.

The warning "Code cell is missing an id field..." should actually say " Cell is missing an id field.." since it not just 'code' cells but any type of cell. I have changed the offending cell's cell type to 'raw' and it still generates the warning,

I edited validator.py to show the cell index and cell type in the warning message to help track offending cell

if (version, version_minor) >= (4, 5):
    # if we support cell ids ensure default ids are provided
    for i,cell in enumerate(nbdict["cells"]):
        if "id" not in cell:
            warnings.warn(
                f"Cell index: {i} cell_type {cell.cell_type} {cell.source[0:50]} \n" ".Cell is missing an id field, this will become"
                " a hard error in future nbformat versions. You may want"
                " to use `normalize()` on your notebooks before validations"
                " (available since nbformat 5.1.4). Previous versions of nbformat"
                " are fixing this issue transparently, and will stop doing so"
                f" in the future. ",
                MissingIDFieldWarning,
                stacklevel=3,
            )

I miss the point in inserting this warning into the normalize() function.
I'm using normalize() to get rid of this warning but it's actually not possible due to this behavior.

normalize is a workaround – it was added to have a path forward – in general you should pass a valid notebook into those function and normalize should be no-op.

And it is in the doc of normalize:

def normalize(...):
    """
...

    You should in general not rely on this function and make sure the notebooks
    that reach nbformat are already in a normal form. If not you likely have a bug,
    and may have security issues.
    """

Normalize will try to fix your notebook, but still warn you if the notebook are invalid.