Create cell id based on its content (source)?

Question

lexnederbragt opened this issue 3 years ago

I work a lot with a markup language called DocOnce and its command line conversion tool (https://github.com/doconce/doconce). One of the possible output formats for doconce is Jupyter notebooks.

With the recently added cell ids, each time doconce generates a new version of a notebook from a doconce source file, all cell ids change. This is OK in itself; however, it causes a lot of 'noise' when the notebook is under version control and I look for differences between versions.

Would creating the cell id based on the cell's content be an option for doconce? In practice, this would mean generating a hash from the text in the (JSON) source field rather than a random hash. When generating a new notebook, cells whose doconce source did not change would get the same id again. Cells that changed would get a new id, which is fine when comparing (diffing) notebooks under version control.
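To make the idea concrete, here is a minimal sketch of a content-derived id. It is not doconce's actual code: the function name `content_cell_id`, the choice of SHA-256, and the 8-character truncation (picked to match the length of the random ids) are all assumptions for illustration.

```python
import hashlib


def content_cell_id(source, length=8):
    """Return a deterministic id derived from a cell's source text.

    Hypothetical helper: `source` may be a string or a list of lines,
    since the notebook JSON allows both forms for the `source` field.
    """
    text = "".join(source) if isinstance(source, list) else source
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    # Truncate the hex digest; a shorter id is easier to read in diffs
    # but more likely to collide.
    return digest[:length]


unchanged = content_cell_id("print('hello')\n")
regenerated = content_cell_id(["print('hello')\n"])  # same text, list form
edited = content_cell_id("print('hello, world')\n")
```

Here `unchanged` and `regenerated` are the same id because the text is the same, while `edited` gets a different one. Note that this alone does not make ids unique: two cells with the same source would receive the same id.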

My question is not whether it is technically possible on the doconce side, but whether it could lead to downstream problems...

Wes Turner · Answer 1 · Thu Sep 23 2021 23:15:57 GMT+0800 (China Standard Time)

#209
- jupyterlab/jupyterlab#9645 (comment)
  - jupyterlab/jupyterlab#10018
    - #217
      - #218
        
        https://github.com/jupyter/nbformat/blame/master/nbformat/corpus/words.py
        
        def generate_corpus_id():
            return uuid.uuid4().hex[:8]
        
        AFAIU, there is no further schema restriction on the cell.id field? I.e., nothing will, at runtime, restrict the value assigned to or found in the cell IDs of an nbformat .ipynb JSON document?
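As a point of reference for the question above: as far as I can tell, the nbformat 4.5 schema does constrain the id to a string of 1 to 64 characters drawn from letters, digits, `-` and `_`. The sketch below checks a candidate id against that assumed constraint. `is_valid_cell_id` is a hypothetical helper, not part of nbformat, and it says nothing about uniqueness within a notebook.

```python
import re
import uuid

# Assumed constraint from the nbformat 4.5 schema:
# 1 to 64 characters, each one of a-z, A-Z, 0-9, '-' or '_'.
CELL_ID_RE = re.compile(r"[a-zA-Z0-9_-]{1,64}")


def is_valid_cell_id(cell_id):
    """Check the format of a single id (hypothetical helper)."""
    return isinstance(cell_id, str) and CELL_ID_RE.fullmatch(cell_id) is not None


random_id = uuid.uuid4().hex[:8]  # same shape as the random ids quoted above
```

A truncated hex digest of the cell source has the same shape as `random_id`, so under this assumption a content-derived id would pass the format check.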

Vidar Tonaas Fauske · Answer 2 · Tue Dec 21 2021 23:22:58 GMT+0800 (China Standard Time)

If two cells have identical content, your proposal would lead them to have identical IDs, which would not be allowed (each cell's ID needs to be unique within the document).

Lex Nederbragt · Answer 3 · Mon Jan 03 2022 21:15:08 GMT+0800 (China Standard Time)

This is correct. My current implementation solves that by adding a running number to cells with identical IDs. See https://github.com/doconce/doconce/pull/223/files#diff-7f024362fe22e3d1f64babebb05a2819ef408b8bdaddf4a0f6527ca492b5856cR753
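The running-number approach could look roughly like the sketch below. This is an illustration only and not the code from the linked pull request: the function name `unique_content_ids`, the SHA-256 hash, and the `-1`, `-2` suffix format are assumptions.

```python
import hashlib
from collections import Counter


def unique_content_ids(sources, length=8):
    """Assign content-derived ids, suffixing repeats with a running number."""
    seen = Counter()
    ids = []
    for text in sources:
        base = hashlib.sha256(text.encode("utf-8")).hexdigest()[:length]
        seen[base] += 1
        # The first occurrence keeps the bare hash; later ones get -1, -2, ...
        # A bare hash contains no hyphen, so a suffixed id cannot clash with one.
        ids.append(base if seen[base] == 1 else f"{base}-{seen[base] - 1}")
    return ids


ids = unique_content_ids(["x = 1\n", "y = 2\n", "x = 1\n", "x = 1\n"])
```

In this example the first, third, and fourth cells share the same source, so they get the bare hash, the hash with `-1`, and the hash with `-2`. One consequence of this design: inserting or deleting a duplicate cell shifts the running numbers of the identical cells after it, so those cells still show up in a diff even though their content did not change.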