nteract / papermill

📚 Parameterize, execute, and analyze notebooks

Home Page: http://papermill.readthedocs.io/en/latest/

Support complex parameter inputs

rchui opened this issue · comments

🚀 Feature

The current method for injecting parameters into Jupyter notebooks requires that each input value be a valid JSON type; anything else is coerced into its object.__repr__() representation. papermill should support more complex data types via another protocol (pickle, maybe?) for a better user experience.
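To make the limitation concrete, here is a rough illustration of the behavior described above. It is not papermill's actual code; `to_injectable` and `Threshold` are hypothetical names used only for this sketch.

```python
import json


class Threshold:
    """A user-defined type with no JSON representation (hypothetical)."""

    def __init__(self, low, high):
        self.low = low
        self.high = high


def to_injectable(value):
    """Mimic the described behavior: keep JSON-compatible values,
    fall back to repr() for everything else.

    This is an assumption-based sketch, not papermill's implementation.
    """
    try:
        json.dumps(value)
        return value
    except TypeError:
        return repr(value)


json_friendly = to_injectable({"alpha": 0.5, "names": ["a", "b"]})
coerced = to_injectable(Threshold(0.1, 0.9))

# The dict arrives intact; the Threshold arrives as a string such as
# "<...Threshold object at 0x...>", so the notebook cannot rebuild it.
print(type(json_friendly).__name__, type(coerced).__name__)
```

The default repr string carries no attribute values, so the information in the original object is lost by the time the notebook runs.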

Motivation

When calling papermill from the command line, JSON types are adequate because it is difficult for the user to represent anything more complex than that. However, when calling papermill programmatically, user-defined types and classes are much easier to create, yet they are not supported. Obviously there are workarounds, such as writing the values to a file and reading them back in on the other end, but it would be better if papermill could support something like this natively.
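For reference, the file-based workaround mentioned above might look roughly like this. It is a minimal sketch: `dump_param`, `load_param`, the file name, and the notebook paths are hypothetical, and the papermill call is left as a comment so the snippet runs on its own.

```python
import pickle
import tempfile
from pathlib import Path


def dump_param(value, directory):
    """Pickle `value` into `directory` and return the path as a plain string.

    The returned string is JSON-compatible, so it can be passed as an
    ordinary notebook parameter.
    """
    path = Path(directory) / "param.pkl"
    with open(path, "wb") as fh:
        pickle.dump(value, fh)
    return str(path)


def load_param(path):
    """Notebook side: read the pickled value back. Only for trusted files."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


original = {"thresholds": (0.1, 0.9), "tags": {"a", "b"}}

with tempfile.TemporaryDirectory() as tmp:
    config_path = dump_param(original, tmp)
    # Caller side, assuming papermill is installed and the notebooks exist:
    # papermill.execute_notebook(
    #     "input.ipynb", "output.ipynb",
    #     parameters={"config_path": config_path},
    # )
    # Inside the notebook, a cell would then run:
    restored = load_param(config_path)
```

The tuple and the set survive the round trip, which they would not as JSON, but the caller has to manage the temporary file and keep it alive until the notebook finishes.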

Hi @rchui,

The reason it doesn't do this by default is more a security problem than a capability one. If you don't know where your parameters come from, you open yourself up to arbitrary code injection from parameters supplied by an untrusted source. It's basically the same concern as SQL injection.

That being said, I'd be open to having an optional cloudpickle translator, registered in the OSS package, that a user could opt into. That would enable pandas DataFrame transport, assuming the writer and the notebook environment had the same version of cloudpickle and the inputs were trusted.
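As a rough sketch of what such an opt-in encoding could do, the snippet below serializes a value to a base64 string (which is JSON-safe) and decodes it on the other side. It uses the standard-library pickle in place of cloudpickle so it runs without extra dependencies; `encode_param` and `decode_param` are hypothetical helpers, not part of papermill.

```python
import base64
import datetime
import decimal
import pickle


def encode_param(value):
    """Writer side: pickle the value and wrap it in base64 text.

    The result is a plain ASCII string, so it passes through the
    JSON-only parameter path unchanged.
    """
    return base64.b64encode(pickle.dumps(value)).decode("ascii")


def decode_param(text):
    """Notebook side: reverse encode_param.

    Unpickling executes arbitrary code from the payload, so this must
    only ever be called on trusted input.
    """
    return pickle.loads(base64.b64decode(text))


# Values that have no native JSON representation.
payload = {
    "as_of": datetime.date(2020, 1, 31),
    "rate": decimal.Decimal("0.25"),
    "window": (7, 30),
}

encoded = encode_param(payload)
decoded = decode_param(encoded)
```

Swapping `pickle` for `cloudpickle` (which exposes the same `dumps`/`loads` functions) would extend this to more object types, subject to the version-matching and trust caveats above.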

@MSeal I definitely echo those security concerns. I agree that the latter would be very powerful and in line with what I'm looking for. We only use papermill as part of internal report generation, so the inputs would always be trusted.