jupyter / nbformat

Reference implementation of the Jupyter Notebook format

Home Page: http://nbformat.readthedocs.io/

base64 content encodings are unusable in security-conscious organizations

jahess opened this issue:

Yes, integral base64 encodings for cell content are nice from the network, efficiency, and file-management perspectives.

Unfortunately, security-conscious organizations that have established review processes simply will not allow base64 content: it is too easy to hide content or malware from their established (read: unchangeable) processes that way.

For example, if an ipynb file has a matplotlib-produced PNG image, such an organization needs that image to be referenced by the .ipynb but stored in an external PNG file. This allows existing processes for approved file types to operate on the cell content/results.

The most standardized way to inline base64-encoded content with a MIME type in HTML is with data: URIs: https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/Data_URIs
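For readers unfamiliar with the format, here is a minimal sketch of how a data: URI wraps a base64 payload and its MIME type. The helper names `to_data_uri` and `from_data_uri` are hypothetical, and the eight-byte PNG signature stands in for a real image.

```python
import base64


def to_data_uri(payload: bytes, mime_type: str) -> str:
    """Return a data: URI carrying `payload` as base64."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def from_data_uri(uri: str) -> tuple:
    """Split a base64 data: URI back into (mime_type, raw bytes)."""
    header, _, encoded = uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data: URI")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(encoded)


# The PNG file signature is used as a tiny stand-in payload.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
uri = to_data_uri(PNG_SIGNATURE, "image/png")
print(uri)  # data:image/png;base64,iVBORw0KGgo=
```

The point for a reviewer is that the decoded bytes are opaque to any tool that only reads the surrounding text, which is the concern raised in this issue.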

nbformat specifies MIME bundles. Is there any backward-compatible way to change that part of the specification? Wouldn't we then be creating a new packaging/archive format with its own file extension that you'd carve out anyway?
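To make the MIME bundle point concrete: in a v4 notebook, `display_data` and `execute_result` outputs carry a `data` mapping keyed by MIME type, and binary types such as image/png are stored as base64 text. The sketch below lists those payloads from an already-parsed notebook dict. The function name `find_embedded_payloads` and the set of MIME types treated as base64 are assumptions for illustration, not part of nbformat.

```python
import base64

# MIME types assumed here to hold base64 text; extend for your own review policy.
BASE64_MIME_TYPES = {"image/png", "image/jpeg", "image/gif", "application/pdf"}


def find_embedded_payloads(notebook: dict) -> list:
    """List (cell index, output index, MIME type, decoded size) per payload."""
    found = []
    for cell_index, cell in enumerate(notebook.get("cells", [])):
        for output_index, output in enumerate(cell.get("outputs", [])):
            # Stream and error outputs have no "data" key and are skipped.
            for mime_type, value in output.get("data", {}).items():
                if mime_type not in BASE64_MIME_TYPES:
                    continue
                # The value may be a single string or a list of strings.
                text = "".join(value) if isinstance(value, list) else value
                size = len(base64.b64decode(text))
                found.append((cell_index, output_index, mime_type, size))
    return found


# A hand-written notebook dict with one embedded (stand-in) PNG output.
example = {
    "nbformat": 4, "nbformat_minor": 5, "metadata": {},
    "cells": [{
        "cell_type": "code", "execution_count": 1, "metadata": {}, "source": "fig",
        "outputs": [{
            "output_type": "display_data", "metadata": {},
            "data": {"text/plain": "<Figure>", "image/png": "iVBORw0KGgo="},
        }],
    }],
}
payloads = find_embedded_payloads(example)
print(payloads)  # [(0, 0, 'image/png', 8)]
```

For a file on disk, `json.load` on the .ipynb yields the same kind of dict. Note that this sketch only inspects cell outputs; Markdown cell attachments can also hold base64 data and would need a separate pass.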

It's possible to reference (file) URLs from notebooks, but there is no standardized, nbformat-specific package format (beyond what tools such as repo2docker support, e.g. for BinderHub), so simple relative URLs will work only until a notebook is moved to a new path. Jupyter Book may already support Sphinx's make linkcheck with MyST Markdown and .ipynb notebooks?

If notebook authors choose to reference files by relative paths in a git repo, that's great. Hopefully scanning tools also support scanning other file formats that can contain inlined data, such as .zip, .whl, .json, and .html.
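As a rough illustration of how relative references break when a notebook moves, the sketch below checks that relative image targets in Markdown cells still exist next to the notebook. It is not a substitute for a real link checker: the function name `missing_image_targets` is hypothetical, and only the `![alt](target)` syntax is handled, not HTML `<img>` tags.

```python
import re
import tempfile
from pathlib import Path

# Matches Markdown image syntax ![alt](target) and captures the target.
IMAGE_LINK = re.compile(r"!\[[^\]]*\]\(([^)\s]+)")
# Anything starting with "scheme:" (https:, data:, attachment:) is not a relative path.
HAS_SCHEME = re.compile(r"[a-zA-Z][a-zA-Z0-9+.-]*:")


def missing_image_targets(notebook: dict, notebook_dir: Path) -> list:
    """Return relative image targets in Markdown cells that are not on disk."""
    missing = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue
        source = cell.get("source", "")
        text = "".join(source) if isinstance(source, list) else source
        for target in IMAGE_LINK.findall(text):
            if HAS_SCHEME.match(target):
                continue
            if not (notebook_dir / target).is_file():
                missing.append(target)
    return missing


# Demo: one image sits beside the notebook, one was left behind in a move.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "example.png").write_bytes(b"\x89PNG\r\n\x1a\n")
    demo = {"cells": [{
        "cell_type": "markdown",
        "source": "![ok](example.png) ![gone](moved/plot.png) "
                  "![web](https://example.org/a.png)",
    }]}
    result = missing_image_targets(demo, root)

print(result)  # ['moved/plot.png']
```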

> For example, if an ipynb file has a matplotlib-produced PNG image, such an organization needs that image to be referenced by the .ipynb but stored in an external PNG file. This allows existing processes for approved file types to operate on the cell content/results.

Save a matplotlib chart to an image and then display that:

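import pandas as pd
from IPython.display import Image

# Plot the CSV data and keep a handle on the matplotlib figure.
df = pd.read_csv('./example.csv')
fig = df.plot().get_figure()

# Write the chart to an external PNG file that review tools can inspect.
fig.savefig('./example.png')

# Display the saved file. Note: Image(filename=...) embeds the file's bytes
# in the cell output by default; Image(url='./example.png') emits an <img>
# tag that references the file instead.
Image(filename='./example.png')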
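For notebooks that already contain embedded PNG outputs, a post-processing step could move them into external files. The following is a minimal sketch under stated assumptions, not an nbformat feature: the function name `externalize_png_outputs`, the file naming scheme, and the choice to leave a text/markdown reference in the output are all hypothetical, and whether a given front end resolves that relative path is worth checking.

```python
import base64
import json
import tempfile
from pathlib import Path


def externalize_png_outputs(notebook_path: Path) -> list:
    """Move image/png outputs of a v4 notebook into sibling .png files."""
    notebook = json.loads(notebook_path.read_text(encoding="utf-8"))
    written = []
    for cell_index, cell in enumerate(notebook.get("cells", [])):
        for output_index, output in enumerate(cell.get("outputs", [])):
            data = output.get("data", {})
            if "image/png" not in data:
                continue
            value = data.pop("image/png")
            text = "".join(value) if isinstance(value, list) else value
            # Hypothetical naming scheme: <notebook stem>_<cell>_<output>.png
            image_name = f"{notebook_path.stem}_{cell_index}_{output_index}.png"
            (notebook_path.parent / image_name).write_bytes(base64.b64decode(text))
            # Leave a Markdown reference so the output still points at the image.
            data["text/markdown"] = f"![{image_name}]({image_name})"
            written.append(image_name)
    # The notebook file is overwritten in place.
    notebook_path.write_text(json.dumps(notebook, indent=1), encoding="utf-8")
    return written


# Demo on a throwaway notebook whose only output is a stand-in PNG payload.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "report.ipynb"
    path.write_text(json.dumps({
        "nbformat": 4, "nbformat_minor": 5, "metadata": {},
        "cells": [{
            "cell_type": "code", "execution_count": 1, "metadata": {}, "source": "fig",
            "outputs": [{"output_type": "display_data", "metadata": {},
                         "data": {"image/png": "iVBORw0KGgo="}}],
        }],
    }), encoding="utf-8")
    names = externalize_png_outputs(path)
    png_bytes = (path.parent / names[0]).read_bytes()
    rewritten = json.loads(path.read_text(encoding="utf-8"))
```

Because the sketch rewrites the notebook in place, running it on a copy first is the safer choice. The extracted .png files can then go through the same review process as any other approved file type.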